Archive - July 2018

News

July 18, 2018

New TRSITM Faculty Member, Dr. M. Kargar, publishes research results in top-tier journal.

Congratulations to Dr. Kargar on becoming the first Ryerson Faculty Member to have a paper published in the prestigious VLDBJ (The International Journal on Very Large Data Bases). His paper, entitled "Effective and Complete Discovery of Bidirectional Order Dependencies via Set-based Axioms", was published in June 2018 and is the first submission from Canada for the year.

Abstract

In business intelligence and analytics, as well as in data management, integrity constraints (ICs) ensure accuracy and consistency of data in a business database. Formulating ICs manually, however, requires domain expertise, is prone to human error, and can be exceedingly time-consuming in the big data era. Thus, methods for automatic discovery have been developed for some classes of ICs, such as functional dependencies (FDs), and recently, order dependencies (ODs). An FD is a constraint between two sets of attributes in a database. For example, given a postal code, the city can be determined. Thus, postal code determines city. However, if two entities with the same postal code have different cities, the input data is erroneous and must be cleaned before it is used in any business analytics process. ODs properly subsume FDs and can express business rules involving order; e.g., an employee who pays higher taxes has a higher salary than another employee. If, for two given employees, one has a higher salary but pays less tax, the input data contains an error. We address the limitations of prior work on automatic OD discovery, which has factorial complexity, is incomplete, and is not concise. We present an efficient bidirectional OD discovery algorithm enabled by a novel polynomial mapping to a canonical form, and a sound and complete set of axioms for canonical bidirectional ODs to prune the search space. Our algorithm has exponential worst-case time complexity in the number of attributes and linear complexity in the number of tuples. We prove that it produces a complete and minimal set of bidirectional ODs, and we experimentally show orders of magnitude performance improvements over the prior state-of-the-art methodologies. For example, in one of the experiments, over a real dataset, our proposed algorithm found a set of ODs in about 1 second, while previous work did not terminate after 5 hours.
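
The two kinds of constraint described in the abstract can be illustrated with a small Python sketch. This is not the discovery algorithm from the paper; it is a naive check of one given FD and one given OD against a handful of rows. The function names (`fd_violations`, `od_violations`), the column names, and the sample records are all hypothetical, chosen only to mirror the postal code and tax examples above.

```python
from itertools import combinations

def fd_violations(rows, lhs, rhs):
    """Check the FD lhs -> rhs: rows that share an lhs value must share the rhs value.

    Each row is compared with the first row seen for the same lhs value.
    Returns a list of (earlier_index, later_index) pairs that disagree.
    """
    first_seen = {}
    violations = []
    for i, row in enumerate(rows):
        key = row[lhs]
        if key not in first_seen:
            first_seen[key] = i
        elif rows[first_seen[key]][rhs] != row[rhs]:
            violations.append((first_seen[key], i))
    return violations

def od_violations(rows, lhs, rhs):
    """Check the OD "lhs orders rhs" with a naive pairwise (quadratic) scan.

    A pair violates the OD if the two rows are ordered one way on lhs and the
    opposite way on rhs (a swap), or tie on lhs but differ on rhs (a split).
    """
    violations = []
    for (i, a), (j, b) in combinations(enumerate(rows), 2):
        swap = (a[lhs] < b[lhs] and a[rhs] > b[rhs]) or (
            a[lhs] > b[lhs] and a[rhs] < b[rhs]
        )
        split = a[lhs] == b[lhs] and a[rhs] != b[rhs]
        if swap or split:
            violations.append((i, j))
    return violations

# Hypothetical sample data, invented for illustration only.
addresses = [
    {"postal_code": "M5B 2K3", "city": "Toronto"},
    {"postal_code": "N2L 3G1", "city": "Waterloo"},
    {"postal_code": "M5B 2K3", "city": "Ottawa"},   # same postal code, different city
]
employees = [
    {"name": "A", "tax": 10000, "salary": 60000},
    {"name": "B", "tax": 15000, "salary": 80000},
    {"name": "C", "tax": 12000, "salary": 90000},   # pays less tax than B but earns more
]

fd_result = fd_violations(addresses, "postal_code", "city")
od_result = od_violations(employees, "tax", "salary")
print(fd_result)  # [(0, 2)]
print(od_result)  # [(1, 2)]
```

The pairwise scan is quadratic in the number of rows and checks only one candidate dependency at a time, so it says nothing about how candidate ODs are enumerated or pruned; it is meant only to make the notion of a violating pair concrete.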

The results of this research have been published in the prestigious VLDBJ (The International Journal on Very Large Data Bases), which is a top-tier journal in the field of data management and information systems. This is a collaborative research effort involving Jaroslaw Szlichta (University of Ontario), Parke Godfrey (York University), Lukasz Golab (University of Waterloo), Mehdi Kargar (Ted Rogers School of Management at Toronto Metropolitan University), and Divesh Srivastava (AT&T Labs-Research).

Read the paper online
