1 July 2015 CFA Institute Journal Review

The Use of Word Lists in Textual Analysis (Digest Summary)

Daniel J. Larocco

Textual analysis is the process of analyzing the content of a document in order to assess its tone. After analyzing more than 77,000 10-Ks, the authors conclude that Diction (the software platform used in prior literature) is inappropriate for this purpose. They argue that the Loughran–McDonald dictionary is superior to Diction for capturing the tone of business news releases.

What’s Inside?

In financial research, textual analysis is applied to the news releases of businesses (such as annual reports and earnings announcements) to assess whether their tone is positive, negative, or other. Although the software platform Diction was not designed specifically for this purpose, prior literature has relied on its word lists to conduct such analyses. The authors argue that Diction is inappropriate for analyzing business news releases and propose the Loughran–McDonald (LM) dictionary as an alternative. They test this claim by examining each dictionary’s ability to predict future stock price volatility.

How Is This Research Useful to Practitioners?

The authors demonstrate that the optimistic and pessimistic words identified in Diction are poorly suited to business documents. For example, such words as vice, not, gross, and none may generally be considered negative but do not carry that meaning in 10-K reports. For the 10-K reports in the sample, the authors find that the LM dictionary has a statistically significant ability to predict future stock price volatility. A reliable signal of future volatility is valuable to practitioners, and those who seek to benefit from changes in stock price volatility in particular may benefit from a careful consideration of this research.

How Did the Authors Conduct This Research?

The authors gather a sample of 77,158 10-K reports from 1994 to 2012 that are drawn from the SEC’s EDGAR database. They construct a vocabulary of optimistic and pessimistic words based on the Diction dictionary. Using this construct, the authors create the variable “% Diction optimism,” which is the percentage of words in the 10-K that Diction defines as optimistic. A comparable variable is created for Diction’s pessimistic words. They also create comparison variables based on the LM dictionary, as well as a tone variable based on the difference in the percentage of LM positive and negative vocabularies.
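The variable construction described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the tiny word lists, the tokenizer, and the function names are all hypothetical, and the sign convention for tone (percent positive minus percent negative) is an assumption, since the summary says only that tone is based on the difference between the two percentages.

```python
import re

# Hypothetical miniature word lists for illustration only; the actual
# Diction and LM lists are much larger and are not reproduced here.
POSITIVE = {"achieve", "gain", "improve"}
NEGATIVE = {"loss", "impairment", "decline"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def pct_in_list(tokens, word_list):
    """Return the percentage of tokens that appear in word_list."""
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in word_list)
    return 100.0 * hits / len(tokens)


def tone(tokens, positive, negative):
    """Assumed convention: percent positive minus percent negative."""
    return pct_in_list(tokens, positive) - pct_in_list(tokens, negative)


# Made-up sentence standing in for the text of a 10-K.
sample = "We expect to improve margins despite the loss and a decline in sales"
tokens = tokenize(sample)
print(f"% positive: {pct_in_list(tokens, POSITIVE):.2f}")
print(f"% negative: {pct_in_list(tokens, NEGATIVE):.2f}")
print(f"tone:       {tone(tokens, POSITIVE, NEGATIVE):.2f}")
```

Swapping in a different pair of word lists yields the comparable variables for the other dictionary, which is what allows the two vocabularies to be compared on the same documents.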

Overall, the Diction vocabulary yields a higher percentage of optimistic words and a lower percentage of pessimistic words than the LM vocabulary, although both indicate a negative tone. Interestingly, the research indicates a slight negative correlation between the Diction optimism and LM positive variables, suggesting that the two vocabularies are not capturing the same sentiment. In contrast, the two negative variables are highly positively correlated.

To test the effectiveness of the two vocabularies, the authors conduct a regression analysis to determine the extent to which each approach predicts stock return volatility after 10-K filings. Although both negative variables have a high degree of correlation with future stock volatility, only the LM tone variable has a statistically significant correlation with volatility. The authors, therefore, conclude that the LM word lists are superior for assessing tone in business documents.
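As a rough illustration of this kind of test, the sketch below fits a one-variable ordinary least squares regression of post-filing volatility on a tone score and reports the slope with its t-statistic. It is a deliberate simplification: the summary does not describe the authors' actual specification or control variables, the function name is hypothetical, and the input numbers are made up for illustration rather than taken from the study.

```python
import math


def ols_slope_tstat(x, y):
    """Fit y = a + b*x by ordinary least squares.

    Returns the slope b and its t-statistic. Assumes at least three
    observations and residuals that are not all exactly zero.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    # Residual variance uses n - 2 degrees of freedom (two fitted parameters).
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    se_b = math.sqrt(sse / (n - 2) / sxx)
    return b, b / se_b


# Made-up illustrative inputs, NOT data from the study.
tone_scores = [-1.2, -0.8, -0.5, -0.3, 0.1, 0.4]
post_filing_vol = [0.042, 0.037, 0.034, 0.030, 0.029, 0.024]

slope, t_stat = ols_slope_tstat(tone_scores, post_filing_vol)
print(f"slope = {slope:.4f}, t-statistic = {t_stat:.2f}")
```

A t-statistic that is large in absolute value relative to the usual critical values would indicate that the tone measure is significantly related to subsequent volatility in that sample; running the same regression with each dictionary's variables is one way to compare them.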

Abstractor’s Viewpoint

The Form 10-K has changed over time, and word usage can shift as well. Although this research offers an interesting and potentially useful tool, practitioners should remain mindful that its predictive power may not hold out of sample.
