Query Rewriting Infrastructure for Big Data Exploration
T2015-314 A middleware layer that rewrites queries in the context of sampling and improves the efficiency of common statistical analyses.
The rise of data-driven approaches to business decision making has led to advances in data querying. However, the quantity of data collected has been increasing more rapidly than users' ability to develop queries for efficient data analysis. For example, calculating variance is a necessary step in sampling-based querying, but it is currently computationally expensive. Additionally, queries are normally considered in isolation, which increases response times. Therefore, methods to reduce query response times and improve the efficiency of statistical calculations are needed to improve business analytics.
Researchers at The Ohio State University, led by Dr. Arnab Nandi, have developed a middleware layer that sits between a business intelligence visualization tool and the backend database. This layer rewrites queries in the context of sampling, which is a necessary step for large-scale data analytics. The technology leverages three core ideas: (1) user-driven data exploration can be treated as a session of related queries rather than as queries in isolation; (2) VARIANCE() can be calculated by reusing results from previous computations in the session; and (3) queries can be speculatively executed and cached to improve response times.
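The idea of reusing earlier computations when calculating variance can be illustrated with a small sketch. It is not the researchers' implementation: it assumes the session keeps a count, mean, and sum of squared deviations for each group it has already scanned, and it combines them with the standard pairwise merge formula for variance. The class name `Moments` and its methods are hypothetical names chosen for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Moments:
    """Summary of one group of sampled values (hypothetical helper)."""
    n: int        # number of values seen
    mean: float   # running mean
    m2: float     # sum of squared deviations from the mean

    @staticmethod
    def from_values(values):
        # One pass over the raw values (Welford's update).
        n, mean, m2 = 0, 0.0, 0.0
        for x in values:
            n += 1
            delta = x - mean
            mean += delta / n
            m2 += delta * (x - mean)
        return Moments(n, mean, m2)

    def merge(self, other):
        # Combine two summaries without touching the raw values again.
        if self.n == 0:
            return other
        if other.n == 0:
            return self
        n = self.n + other.n
        delta = other.mean - self.mean
        mean = self.mean + delta * other.n / n
        m2 = self.m2 + other.m2 + delta * delta * self.n * other.n / n
        return Moments(n, mean, m2)

    def variance(self):
        # Sample variance; defined as 0.0 here when fewer than two values.
        return self.m2 / (self.n - 1) if self.n > 1 else 0.0
```

Under these assumptions, if an earlier query in the session summarized two regions separately, a later query over both regions could call `merge` on the two cached summaries and read `variance()` from the result, rather than rescanning the sampled rows.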
- Data Analytics
- Predicts and executes a set of speculative queries so the next user query can be answered faster
- Quantifies error from sampling datasets, improving statistical accuracy and speed
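Speculative execution and caching can be sketched as follows. This is an illustration under stated assumptions, not the technology's actual design: `run_query` stands in for the backend database call, `predict_next` stands in for whatever model guesses the user's likely follow-up queries, and `SpeculativeCache` is a hypothetical name. The sketch also assumes queries are issued from a single thread and that the cache never needs eviction.

```python
from concurrent.futures import ThreadPoolExecutor


class SpeculativeCache:
    """Answers queries and pre-executes predicted follow-ups (hypothetical)."""

    def __init__(self, run_query, predict_next, max_workers=2):
        self._run_query = run_query        # assumed: callable(query) -> result
        self._predict_next = predict_next  # assumed: callable(query) -> iterable of queries
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._futures = {}                 # query text -> Future holding its result

    def execute(self, query):
        # Reuse a cached or in-flight result if this query was already started.
        future = self._futures.get(query)
        if future is None:
            future = self._pool.submit(self._run_query, query)
            self._futures[query] = future
        result = future.result()

        # Start predicted follow-up queries in the background.
        for guess in self._predict_next(query):
            if guess not in self._futures:
                self._futures[guess] = self._pool.submit(self._run_query, guess)
        return result

    def close(self):
        # Wait for outstanding speculative work and release the threads.
        self._pool.shutdown(wait=True)
```

If the user's next query matches one of the predictions, `execute` returns the result that was already computed or is still in flight, instead of sending the query to the backend a second time.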