Streaming data: New models and methods with applications in the transportation industry
Streaming data, or data streams, can be defined as an unbounded, continuous inflow of data arriving at very high speed. Streaming data comes with its own set of challenges: a single pass over the data, unbounded volume, a very high rate of accumulation, limited memory, and concept drift, i.e., change in the distribution of the incoming data. Because of these challenges, conventional batch processing methods are not effective and new methodologies are needed. A GPS-enabled transport dispatch system is an excellent source of geospatial data streams. Although there is extensive work on GPS data mining in the literature, most of the methods developed are batch learning methods that take into account neither the continuous nature of GPS data nor its inherent concept drift. In this dissertation, we aim to develop new methodologies for geospatial data streams with applications in the transportation industry. We first address the destination and next pickup location prediction problem in a streaming data context, for which we propose four new methods that, to the best of our knowledge, have not been applied to this problem before. The performance of these methods is evaluated on several large datasets using suitably chosen metrics. Their suitability for the next pickup location problem is also examined using real-world datasets. For our second problem, travel time prediction for taxi GPS data streams, we propose a new incremental learning approach for different scenarios and compare it with four existing methods. An extensive performance evaluation is carried out using four real-life datasets. We also evaluate the performance of these methods for the continuous prediction of remaining travel time and the continuous updating of total travel time along the trajectory of a trip. Finally, for our third problem, the detection of taxi demand hotspots in a streaming data context, we develop an incremental learner using a spatial point process approach, which to the best of our knowledge has not been attempted in the literature before. We build models across different days of the week and times of the day to capture the changing hotspots, and validate the models through visualization. The key contributions of this dissertation are the development of new solutions to the above research problems, which have applications in the intelligent transportation industry, and the development of a framework for selecting suitable methods for different scenarios. The findings of this research have significant managerial implications.
- Thesis and Dissertations