Extract and relate Human Sentiments using Text Mining: A Bayesian Learning
Abstract
A huge amount of text data is available from sources across the internet, where
users write reviews of product and service features. The text representation is a
function of the inherent topics, which reflect the reviewers' perceptions. The aim is
therefore to discover the hidden sentiments in these reviews, classify them, and
predict the performance of future reviews through rank ordering between and within the
positive and negative sentiments. In addition to these latent structures, the data also
depend on a variety of nuisance parameters that are irrelevant to the task, including
the unknown characteristics of the medium or platform through which the reviews are
recorded and their interplay with the topic distributions across the documents
containing the reviews. The central theme of this paper is therefore to explore the
design and analysis of text representations that unveil the patterns of hidden
sentiments in the corpus.
The methodology adopted is Bayesian aspect mining of text data using a mixture of
stick-breaking process representations, leading to hierarchical aspect-sentiment
modeling through Dirichlet processes embedded with a recursive Chinese Restaurant
Process. The approach uses nested stick-breaking processes to allow for trees of
unbounded width and depth, where data can live at any node and are infinitely
exchangeable. One can view the model as providing infinite mixtures where the
components have a dependency structure corresponding to an evolutionary diffusion down
a tree. By using a stick-breaking approach, Markov chain Monte Carlo (MCMC) methods
based on slice sampling can be applied to perform Bayesian inference and simulate from
the posterior distribution on trees. This can be extended to infinitely exchangeable
mixture processes. An optimal representation of the text data can then be explored and
analyzed through sufficient statistics, exploiting the structure of maximal invariance
in the above setup. Maximal invariance mappings defined on the trees can exploit the
nested structures to eliminate the nuisance parameters, thereby making the MCMC
algorithms converge faster.
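
As a minimal illustration of the stick-breaking construction referred to in this abstract, the Python sketch below draws truncated mixture weights for a single Dirichlet process. It is only a sketch under stated assumptions: the function name, the concentration parameter alpha, and the truncation level num_sticks are hypothetical choices made for illustration, and the code shows one flat stick-breaking process rather than the nested, tree-structured model described here.

```python
import random


def stick_breaking_weights(alpha, num_sticks, rng):
    """Draw truncated stick-breaking weights for a Dirichlet process.

    alpha      -- concentration parameter (assumed value, for illustration)
    num_sticks -- truncation level (assumed; the true process is infinite)
    rng        -- a random.Random instance, passed in for reproducibility
    The last entry of the returned list is the unbroken remainder of the
    stick, so the weights always sum to one.
    """
    weights = []
    remaining = 1.0
    for _ in range(num_sticks):
        # Break off a Beta(1, alpha) fraction of what is left of the stick.
        fraction = rng.betavariate(1.0, alpha)
        weights.append(remaining * fraction)
        remaining *= 1.0 - fraction
    weights.append(remaining)
    return weights


rng = random.Random(0)
weights = stick_breaking_weights(alpha=2.0, num_sticks=10, rng=rng)
```

Smaller values of alpha concentrate the mass on the first few components, while larger values spread it over many. A nested version would, roughly speaking, repeat this kind of construction at every node of the tree to decide both where data stop and which child branch they follow.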