Change point, prediction and classification with functional data
Abstract
Functional data consist of a collection of curves (or functions) defined on some interval and observed at a finite set of points in it. In this dissertation, we discuss change point, prediction and classification problems with functional data.
In the change point problem, assuming that the functional data are random sample paths of an underlying Gaussian process, we introduce a new method for change point detection based on the generalized likelihood ratio test. The covariance function is a suitably modified version of the powered exponential covariance function that accommodates the correlation between different seasons of the year, and the generalized likelihood ratio test statistic is derived in this functional setup. The method is applied to detect change points in the temperature records of nine Indian cities for the period 1961–2013. We further explore in detail the relation between the magnitude of temperature change and the geographical location of the cities. We find that the average temperature rose in all cities except one during this period. The magnitude of warming is not uniform but varies across the cities: cities at higher altitudes have warmed more than those in the plains, and warming has been greater in the winter season. The estimated change points for most of the cities lie within the period 1994–2001. These findings suggest that immediate policy measures may be required to prevent further warming in these cities.
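The abstract does not spell out the test statistic. As a minimal sketch under simplifying assumptions (independent yearly curves observed on a common grid, a single change in the mean curve only, and a plain powered exponential covariance with known parameters rather than the seasonally modified, estimated covariance used in the thesis), the generalized likelihood ratio for a change after curve k reduces to a weighted Mahalanobis distance between the before and after mean curves, maximized over k. The function names, grid and parameter values below are hypothetical.

```python
import numpy as np


def powered_exponential_cov(t, sigma2=1.0, theta=1.0, p=1.5):
    """Powered exponential covariance C(s, t) = sigma2 * exp(-(|s - t| / theta) ** p).

    Plain (unmodified) form with 0 < p <= 2; the seasonal modification
    described in the abstract is NOT reproduced here.
    """
    d = np.abs(t[:, None] - t[None, :])
    return sigma2 * np.exp(-(d / theta) ** p)


def glr_mean_change(curves, cov):
    """Scan for a single change in the mean curve of independent Gaussian curves.

    curves: (n, m) array, one curve (e.g. one year) per row on a common grid.
    cov:    (m, m) covariance, assumed known and identical for all curves.
    Returns (k_hat, stat): the change is estimated to occur after curve k_hat,
    and stat is the maximized -2 log likelihood ratio.
    """
    n = curves.shape[0]
    best_k, best_stat = None, -np.inf
    for k in range(1, n):
        # Difference between the mean curves before and after the candidate split
        diff = curves[:k].mean(axis=0) - curves[k:].mean(axis=0)
        # With known covariance, -2 log LR for a mean shift after k curves
        # equals k(n-k)/n times the Mahalanobis distance between the two means.
        stat = (k * (n - k) / n) * (diff @ np.linalg.solve(cov, diff))
        if stat > best_stat:
            best_k, best_stat = k, stat
    return best_k, float(best_stat)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t = np.arange(12.0)                                  # hypothetical monthly grid
    cov = powered_exponential_cov(t, theta=3.0, p=1.5)
    chol = np.linalg.cholesky(cov)
    curves = (chol @ rng.standard_normal((12, 40))).T    # 40 synthetic yearly curves
    curves[25:] += 1.0                                   # synthetic shift after curve 25
    print(glr_mean_change(curves, cov))
```

Before declaring a change point, the maximized statistic would still need a critical value (for example, by simulation under the no-change model); that calibration step is not shown.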
In the prediction problem, we propose new methods for predicting functional data, assuming, as before, that the data are random sample paths of an underlying Gaussian process. We propose two new predictors for such data, called the CE-Predictor and the k-NN Predictor. When the data come from a mixture of two Gaussian processes, we additionally propose two further predictors: the KM-Predictor and the FC-Predictor. These methods can be applied to forecast the remaining path of a partially observed curve. We apply them to three real-life datasets, namely growth curves of girls, annual temperature curves of Ahmedabad city and railway booking position curves. The KM-Predictor appears to perform well on all three datasets.
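The abstract does not define the CE-Predictor or the k-NN Predictor. Assuming, hypothetically, that "CE" refers to a conditional expectation, a natural Gaussian-process predictor of the remaining path is the conditional mean of the unobserved part given the observed part, with the mean and covariance estimated from complete training curves. A simple nearest-neighbour alternative averages the tails of the training curves closest on the observed part. Both functions below are illustrative sketches with hypothetical names, not the thesis's exact predictors, and the KM- and FC-Predictors for mixtures are not attempted.

```python
import numpy as np


def conditional_mean_predictor(train, x_obs):
    """Predict the unobserved tail of a curve by its Gaussian conditional mean.

    train: (n, m) complete training curves on a common grid (needs n > m0
           so that the estimated covariance block S11 is invertible).
    x_obs: (m0,) first m0 values of the partially observed curve, m0 < m.
    """
    m0 = len(x_obs)
    mu = train.mean(axis=0)
    cov = np.cov(train, rowvar=False)   # sample covariance across grid points
    s11 = cov[:m0, :m0]                 # observed-observed block
    s21 = cov[m0:, :m0]                 # unobserved-observed block
    # E[X_2 | X_1 = x_obs] = mu_2 + S21 S11^{-1} (x_obs - mu_1)
    return mu[m0:] + s21 @ np.linalg.solve(s11, x_obs - mu[:m0])


def knn_tail_predictor(train, x_obs, k=5):
    """Average the tails of the k training curves closest on the observed part."""
    m0 = len(x_obs)
    # Euclidean distance computed only over the observed grid points
    dist = np.sqrt(((train[:, :m0] - x_obs) ** 2).sum(axis=1))
    nearest = np.argsort(dist)[:k]
    return train[nearest, m0:].mean(axis=0)
```

The conditional-mean sketch relies on the Gaussian assumption stated in the abstract; the nearest-neighbour sketch makes no distributional assumption but depends on the choice of k and of the distance.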
For the classification problem with functional data, we compare five classifiers through various experiments: three are depth-based, one is neighborhood-based and one is centroid-based. We experiment with both balanced and unbalanced data, as well as with equally and unequally spaced data, to check the robustness of the classification performance of these methods. The method based on the Fraiman and Muniz depth appears to perform best in most of the experiments, followed by the method based on the h-modal depth.
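The Fraiman and Muniz depth integrates, over the domain, the univariate depth 1 - |1/2 - F_t(x(t))| of each curve value, where F_t is the empirical distribution of the sample at t. A minimal sketch, assuming a common (possibly unequally spaced) grid and the simple maximum-depth rule (assign a curve to the class in which it is deepest); the classifiers compared in the thesis may use depth differently, and the function names are hypothetical.

```python
import numpy as np


def fraiman_muniz_depth(x, sample, t):
    """Fraiman-Muniz (integrated) depth of curve x relative to a sample of curves.

    x: (m,) curve; sample: (n, m) curves; t: (m,) increasing grid, possibly
    unequally spaced. The integral of 1 - |1/2 - F_t(x(t))| over t is
    approximated by the trapezoidal rule and divided by the domain length.
    """
    F = (sample <= x).mean(axis=0)       # empirical cdf of the sample at x(t), per grid point
    integrand = 1.0 - np.abs(0.5 - F)
    area = np.sum(0.5 * (integrand[:-1] + integrand[1:]) * np.diff(t))
    return float(area / (t[-1] - t[0]))


def max_depth_classify(x, class_samples, t):
    """Assign x to the class label whose training sample gives it maximal depth."""
    depths = {label: fraiman_muniz_depth(x, s, t) for label, s in class_samples.items()}
    return max(depths, key=depths.get)
```

Weighting by the trapezoidal rule rather than averaging grid values keeps the depth comparable when the observation points are unequally spaced, one of the settings the experiments examine.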