Text mining for financial and non-financial information from SEC filings, and textual analysis for predictive models and risk assessment
Abstract
Recent developments in information technology, such as text mining tools and search engines like Google, are changing not only the way we gather information and conduct research but also the kinds of questions researchers ask and answer. Alongside these technological developments, individual researchers’ programming skills also affect how, and what kinds of, information they can gather. Textual analysis is becoming more popular in accounting and finance research. For example, Loughran and McDonald (2011) performed textual analysis of a large sample of annual reports (10-Ks) of US public companies for 1994-2008 and demonstrated a link between their word lists in various categories (positive, uncertain, litigious, strong modal, and weak modal) and 10-K filing returns, trading volume, subsequent return volatility, fraud, material weakness, and unexpected earnings. More recently, Liu and Moffitt (2016) conducted a textual analysis of SEC comment letters and developed a measure of intensity based on the modality of the letters; they observed that this intensity is positively associated with the probability of a restatement of the reviewed 10-K filings. Moreover, textual analysis and text mining techniques provide information about companies’ performance that is not otherwise available. Elaborating on the value of textual analysis, Li (2010, p. 144) states, “As a communication vehicle for management, textual disclosures can provide a means for researchers to assess managers’ behavioral biases and understand firm behavior.” Tetlock, Saar-Tsechansky, and Macskassy (2008) examine the use of a simple quantitative measure of language to predict individual firms’ accounting earnings and stock returns. Lee, Churyk, and Clinton (2013) develop a fraud detection model based on textual analysis.
They state that “Conventional fraud detection measures using ratio analysis and other financial data were either unable to detect the fraud or unable to detect it soon enough to avoid catastrophic outcomes”. Li, Lundholm, and Minnis (2013) develop a model of management's perception of the intensity of competition using textual analysis of firms’ 10-K filings. Developing individual expertise in programming, especially in Perl and Python, is time-consuming and, in fact, a waste of time and resources. An intelligent search engine, SeekiNF (https://www.seekedgar.com), a cloud-based technology developed at The University of Kansas, provides an extensive set of tools to gather financial and non-financial information from SEC filings, perform textual analysis with its built-in features, and develop analytical predictive models for assessing risks such as financial risk, litigation risk, and fraud risk. Currently, SeekiNF provides access to 17 million US SEC filings and 33 million documents and returns search results in a matter of seconds using cloud technology. The presentation will focus on a live demonstration of SeekiNF's features for gathering information by querying the system.
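For readers unfamiliar with the word-list approach to textual analysis mentioned above, the following is a minimal Python sketch of the general idea: count how often words from each category appear in a text, as a share of all words. The word lists here are tiny, hypothetical placeholders and are not the published Loughran and McDonald lists; the function name `category_shares` and the sample sentence are likewise illustrative assumptions, and the sketch does not represent how SeekiNF works internally.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny word lists for illustration only.
# They are NOT the published Loughran and McDonald (2011) lists.
WORD_LISTS = {
    "positive": {"improve", "gain", "success"},
    "uncertain": {"may", "uncertain", "approximately"},
    "litigious": {"lawsuit", "plaintiff", "litigation"},
}


def category_shares(text):
    """Return, per category, the share of tokens that appear in its word list."""
    # Lowercase the text and keep alphabetic runs as tokens.
    tokens = re.findall(r"[a-z]+", text.lower())
    total = len(tokens)
    counts = Counter(tokens)
    shares = {}
    for category, words in WORD_LISTS.items():
        hits = sum(counts[w] for w in words)
        # Guard against empty input to avoid division by zero.
        shares[category] = hits / total if total else 0.0
    return shares


# Hypothetical sentence standing in for a passage from a filing.
sample = "The company may face litigation, and results may improve."
shares = category_shares(sample)
for category, share in shares.items():
    print(f"{category}: {share:.3f}")
```

In research practice, shares like these would be computed per filing and then related to outcomes such as returns or restatements; that modeling step is outside the scope of this sketch.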
Collections
- R & P Seminar