Please use this identifier to cite or link to this item:
http://hdl.handle.net/11718/25617
Title: | Real-time analytics for intelligent systems |
Authors: | Verma, Shikha |
Keywords: | Big data;Intelligent Systems;Real-time analytics |
Issue Date: | 17-Nov-2021 |
Publisher: | Indian Institute of Management Ahmedabad |
Series/Report no.: | TH;2022-12 |
Abstract: | Data streams are sequences of observations that arrive continuously and at a fast pace. They pose unique computational challenges, namely single-pass processing of incoming observations, huge storage requirements, and the need to account for concept drift. Concept drift is a phenomenon in which the characteristics of data evolve over time; it renders models built in a conventional setup outdated for predictions on current data. Predictive machine learning methods must account for these challenges when processing data streams, which have become ubiquitous owing to the pervasive presence of sensors in the Internet of Things era. The prevalence of information and communication technologies for sensor data collection, a rapid decrease in data storage cost, and the wide availability of computing power enable the analysis of “big data” for monitoring, planning, and operational purposes. The domain of ‘Intelligent Systems’ involves the use of advancements in communication and computation technologies to address challenges in data-driven systems. This leads to the production of high-velocity, information-rich data streams. These streams operate in dynamic environments and do not follow a (time-)stationary distribution, which is often an important requirement for the analysis of temporal data. In this dissertation, we aim to develop new prediction methodologies for data streams from sensors, with applications in the transportation and human activity recognition domains. We focus on the following problems: (1) Dynamic Concept Drift Detection in Data Streams with Limited Labeling and (2) Dynamic Demand Forecasting in Bikeshare Networks. For the first problem in a classification context, we use optimal transport theory to develop a novel algorithm for detecting concept drift in partially labeled data streams. We develop a summarization measure to reduce the storage requirements of a data stream.
We demonstrate the performance of the algorithm on synthetic benchmark datasets and real datasets containing sensor observations from the transportation domain. This approach can help transportation researchers develop adaptive systems for safer driving with minimal user feedback. It can also aid transportation planners in assessing changes in the mobility preferences of a population using sensor data. The key contributions of this approach are that, in addition to developing a novel algorithm for drift detection, we also propose a data-driven approach for estimating the threshold that critically determines the performance of a drift detection algorithm. As accuracy alone is an unsuitable metric for comparing drift detection algorithms in limited labeling setups, we propose a novel measure that accounts for both the predictive performance and the labeling requirements of a method. For the first problem in general predictive contexts, we develop a novel algorithm for detecting concept drift in partially labeled data streams. We devise a novel drift detection metric using theory from symbolic data analysis and statistical learning. We demonstrate the performance of the proposed algorithm on synthetic and real-life human activity recognition datasets. It can be applied to aid assisted living for the elderly, where a drift detected in real time could help update the predictive system to detect falls and injuries more accurately. The key contribution of this method is the development of a novel drift detection metric that is more sensitive to drifts in features with more predictive power, thus improving upon existing drift tracking metrics that are equally receptive to drifts in all features. This method is applicable to both regression and classification problems. For the second problem, we focus on demand forecasting in a bike-share system.
We devise an algorithm that uses spatial clustering to reduce the high dimensionality of the problem, followed by building time series models in a streaming setup. An accurate forecast can help the bikeshare authorities achieve timely rebalancing across stations to meet demand effectively. The key contribution of this method is the development of lightweight models for bike demand forecasting that are more suitable for edge computing environments with limited computing power than deep learning models with high computational overheads. The key contributions of this dissertation are the development of new algorithms with demonstrated applicability to real-world problems. We also offer insights into choosing the right algorithm based on the application context. The findings of this research contribute to the domains of streaming data, transportation, human activity recognition, and sensor data analysis. |
URI: | http://hdl.handle.net/11718/25617 |
Appears in Collections: | Thesis and Dissertations |
Files in This Item:
File | Description | Size | Format | |
---|---|---|---|---|
Shikha_Verma_Thesis.pdf (Restricted Access) | | 11.68 MB | Adobe PDF | View/Open; Request a copy |
Items in IIMA Institutional Repository are protected by copyright, with all rights reserved, unless otherwise indicated.