Please use this identifier to cite or link to this item: http://hdl.handle.net/11718/14070
Title: Extract and relate Human Sentiments using Text Mining : A Bayesian Learning
Authors: Banerjee, Chandramouli
Keywords: Exchangeable Processes;Infinite Divisibility;Stick Breaking Dirichlet Hierarchies;Sentiment Analysis;Topic Modeling to extract Sentiments
Issue Date: 2015
Publisher: Indian Institute of Management, Ahmedabad
Citation: Banerjee, C. (2015). Extract and relate Human Sentiments using Text Mining : A Bayesian Learning. 4th IIMA International Conference on Advanced Data Analysis, Business Analytics and Intelligence. Indian Institute of Management, Ahmedabad
Series/Report no.: IC 15;128
Abstract: A huge amount of text data is available from sources across the internet, where users write reviews about the features of products and services. The text representation is a function of the inherent topics, which reflect the reviewers' perceptions. The aim is therefore to discover the hidden sentiments in these reviews, classify them, and predict the performance of future reviews through rank ordering between and within the positive and negative sentiments. In addition to these latent structures, the data also depend on a variety of nuisance parameters that are irrelevant to the task, including the unknown characteristics of the medium or platform through which the reviews are recorded and their interplay with the topic distributions across the documents containing the reviews. The central theme of this paper is therefore to explore the design and analysis of text representations that unveil the patterns of hidden sentiments in the corpus. The methodology adopted is Bayesian aspect mining of text data using a mixture of stick-breaking process representations, leading to hierarchical aspect-sentiment modeling through Dirichlet processes embedded with a recursive Chinese Restaurant Process. The approach uses nested stick-breaking processes to allow for trees of unbounded width and depth, where data can live at any node and are infinitely exchangeable. One can view the model as providing infinite mixtures whose components have a dependency structure corresponding to an evolutionary diffusion down a tree. With a stick-breaking approach, Markov chain Monte Carlo methods based on slice sampling can be applied to perform Bayesian inference and simulate from the posterior distribution on trees. This can be extended to infinitely exchangeable mixture processes.
Optimal representation of the text data can then be explored and analyzed through sufficient statistics, exploiting the structure of maximal invariance in the above setup. Maximal invariance mappings defined on the trees can exploit the nested structures to eliminate the nuisance parameters, allowing the MCMC algorithms to converge faster.
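The two building blocks named in the abstract, the stick-breaking construction and the Chinese Restaurant Process, can be illustrated with a minimal sketch. This is not the paper's nested tree model or its slice sampler; it only shows the flat, single-level versions. The function names, the concentration parameter `alpha`, and the truncation level `num_sticks` are assumptions introduced for illustration.

```python
import random


def stick_breaking_weights(alpha, num_sticks, rng):
    """Truncated stick-breaking: break a unit-length stick num_sticks times.

    Each break takes a Beta(1, alpha) fraction of whatever stick remains.
    Returns the piece lengths and the leftover (unbroken) length.
    """
    weights = []
    remaining = 1.0
    for _ in range(num_sticks):
        fraction = rng.betavariate(1.0, alpha)
        weights.append(remaining * fraction)
        remaining *= 1.0 - fraction
    return weights, remaining


def chinese_restaurant_process(alpha, num_customers, rng):
    """Seat customers one at a time.

    Customer n (0-based) joins existing table k with probability
    size_k / (n + alpha) and opens a new table with probability
    alpha / (n + alpha).
    """
    table_sizes = []
    assignments = []
    for n in range(num_customers):
        draw = rng.uniform(0.0, n + alpha)
        cumulative = 0.0
        chosen = len(table_sizes)  # default: open a new table
        for k, size in enumerate(table_sizes):
            cumulative += size
            if draw < cumulative:
                chosen = k
                break
        if chosen == len(table_sizes):
            table_sizes.append(0)
        table_sizes[chosen] += 1
        assignments.append(chosen)
    return assignments, table_sizes


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed only to make runs repeatable
    weights, leftover = stick_breaking_weights(alpha=2.0, num_sticks=20, rng=rng)
    print("largest mixture weights:", sorted(weights, reverse=True)[:5])
    print("mass left beyond truncation:", leftover)
    seats, sizes = chinese_restaurant_process(alpha=2.0, num_customers=100, rng=rng)
    print("number of occupied tables (clusters):", len(sizes))
```

In a review-mining setting, each table or stick piece would play the role of a latent topic or sentiment cluster, so the number of clusters is not fixed in advance. A smaller `alpha` concentrates mass on fewer clusters.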
URI: http://hdl.handle.net/11718/14070
Appears in Collections: 4th IIMA International Conference on Advanced Data Analysis, Business Analytics and Intelligence

Files in This Item:
File: IC 15-128.pdf (Restricted Access)
Size: 544.53 kB
Format: Adobe PDF


Items in IIMA Institutional Repository are protected by copyright, with all rights reserved, unless otherwise indicated.