A text-based analysis of 10-K management discussion and analysis (MD&A) disclosures shows that firms committing fraud produce abnormal verbal disclosures that can help identify fraudulent practices. Fraudulent managers tend to disclose fewer details about sources of performance, overemphasize positive aspects of firm performance, and provide less information on the managerial team itself.
How Is This Research Useful to Practitioners?
The authors’ findings will be of primary interest to investors, analysts, and regulators. Using text-based analysis, the authors review management discussion and analysis (MD&A) disclosures in 10-Ks in an attempt to identify fraud. Most fraud allegations are based on manipulations of revenue and/or expenses. The authors focus on the MD&A because it is where managers are required to discuss both revenue and expenses as part of their discussion of annual performance.
When the SEC takes enforcement action against a firm for accounting or auditing misconduct, it issues an Accounting and Auditing Enforcement Release (AAER); the authors use these releases to identify fraud. Although the rate of detected fraud is generally low (approximately 1% a year, on average, over the sample period), the authors suggest that the actual rate of fraud, including undetected cases, may be substantially higher.
They present three key findings of particular relevance to investors. First, firms committing fraud tend to broadly underdisclose details that explain their performance. Second, fraudulent firms grandstand, overemphasizing a topic that touts their revenue growth. Third, when attributing performance, fraudulent managers are less likely to refer to themselves in the MD&A, possibly to insulate themselves from any fallout from the fraud.
The authors also find that fraudulent firms underdisclose issues related to firm liquidity, that incidents of fraud are more likely after periods of poor market liquidity, and that managers involved in expense fraud tend to overdiscuss R&D-related expenses. They find no link between more complex disclosure text and fraudulent reporting.
How Did the Authors Conduct This Research?
The sample period runs from 1997 (the first full year of electronic 10-K filings) to 2010. The authors' primary data sources are Compustat, the text of the MD&A section of firms' annual 10-Ks (extracted using software from meta Heuristica LLC), and AAER data from the Center for Financial Reporting and Management. The sample includes all companies in the CRSP database with positive sales, at least $1 million in assets, and non-missing operating income. Excluding financial firms, whose disclosures differ from those of other industries, yields a sample of more than 68,000 observations. The authors control for standard variables such as firm size, age, and industry.
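The sample screens described above can be sketched as a simple filter over firm-year records. This is a minimal illustration, not the authors' code: the `FirmYear` record and its field names are hypothetical (they are not Compustat or CRSP mnemonics), amounts are assumed to be in $ millions, and "financial firms" is assumed to mean SIC codes 6000 to 6999, a common convention that the summary does not confirm.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FirmYear:
    # Hypothetical record layout; field names are illustrative only.
    firm_id: str
    year: int
    sic: int                            # four-digit SIC industry code
    sales: float                        # $ millions
    total_assets: float                 # $ millions
    operating_income: Optional[float]   # None when missing


def is_financial(sic: int) -> bool:
    # Assumption: financial firms are SIC 6000-6999.
    return 6000 <= sic <= 6999


def in_sample(obs: FirmYear) -> bool:
    return (
        1997 <= obs.year <= 2010               # sample period
        and obs.sales > 0                      # positive sales
        and obs.total_assets >= 1.0            # at least $1 million in assets
        and obs.operating_income is not None   # non-missing operating income
        and not is_financial(obs.sic)          # exclude financial firms
    )


def build_sample(rows: List[FirmYear]) -> List[FirmYear]:
    return [r for r in rows if in_sample(r)]


rows = [
    FirmYear("A", 2005, 3571, 120.0, 80.0, 10.0),   # kept
    FirmYear("B", 2005, 6021, 500.0, 900.0, 50.0),  # bank: excluded
    FirmYear("C", 1995, 3571, 50.0, 40.0, 5.0),     # before 1997: excluded
    FirmYear("D", 2008, 2834, 0.0, 30.0, -2.0),     # zero sales: excluded
    FirmYear("E", 2008, 2834, 5.0, 0.5, 1.0),       # assets < $1M: excluded
    FirmYear("F", 2009, 2834, 5.0, 2.0, None),      # missing op. income: excluded
]
sample = build_sample(rows)
print([r.firm_id for r in sample])
```

Each screen is a separate clause so that individual criteria can be inspected or relaxed when checking how sensitive the sample size is to them.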
To conduct their research, the authors use latent Dirichlet allocation, or LDA (Blei, Ng, and Jordan, Journal of Machine Learning Research 2003). LDA rests on the idea that each document can be represented as a mixture of topics. It is an unsupervised model from computational linguistics that treats each topic as a probability distribution over words, so documents can be grouped by their content without predefined categories. The authors use LDA to compare the verbal topics of firms involved in AAERs with those of peers not involved in AAERs.
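To make the mechanics concrete, the sketch below fits LDA with collapsed Gibbs sampling, one standard estimation method (the summary does not say which estimator or software the authors used), and then compares average topic shares between flagged and unflagged documents. The toy documents, the `flagged` indicator, and the helper names are hypothetical; the simple difference in means stands in for, and is much cruder than, the authors' analysis with controls.

```python
import random


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Collapsed Gibbs sampler for LDA. docs is a list of token lists.

    Returns (theta, phi, vocab): theta[d][k] is document d's share of topic k,
    phi[k][w] is the probability of vocab[w] under topic k.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics
    n_dk = [[0] * K for _ in docs]      # tokens in doc d assigned to topic k
    n_kw = [[0] * V for _ in range(K)]  # times word w is assigned to topic k
    n_k = [0] * K                       # total tokens assigned to topic k
    z = []                              # current topic of every token

    # Random initial assignment of each token to a topic.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(K)
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][index[w]] += 1
            n_k[k] += 1
        z.append(z_d)

    topics = list(range(K))
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wi, k = index[w], z[d][i]
                # Remove the token's current assignment from the counts.
                n_dk[d][k] -= 1
                n_kw[k][wi] -= 1
                n_k[k] -= 1
                # Full conditional: p(k) ∝ (n_dk + alpha) * (n_kw + beta) / (n_k + V * beta)
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][wi] + beta) / (n_k[t] + V * beta)
                    for t in topics
                ]
                k = rng.choices(topics, weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][wi] += 1
                n_k[k] += 1

    theta = [[(n_dk[d][k] + alpha) / (len(doc) + K * alpha) for k in topics]
             for d, doc in enumerate(docs)]
    phi = [[(n_kw[k][w] + beta) / (n_k[k] + V * beta) for w in range(V)]
           for k in topics]
    return theta, phi, vocab


def topic_share_gap(theta, flagged):
    """Mean topic share of flagged (e.g., AAER) documents minus that of the rest."""
    fraud = [row for row, f in zip(theta, flagged) if f]
    peers = [row for row, f in zip(theta, flagged) if not f]
    return [
        sum(r[k] for r in fraud) / len(fraud) - sum(r[k] for r in peers) / len(peers)
        for k in range(len(theta[0]))
    ]


# Toy MD&A-like snippets, already tokenized (illustrative only).
docs = [
    "revenue growth record revenue growth".split(),
    "revenue growth strong sales growth".split(),
    "cash liquidity credit facility cash".split(),
    "liquidity cash covenant credit".split(),
]
flagged = [True, False, False, True]  # hypothetical AAER indicator
theta, phi, vocab = lda_gibbs(docs, num_topics=2)
gap = topic_share_gap(theta, flagged)
print([round(g, 3) for g in gap])
```

A positive entry in `gap` means flagged documents devote more of their text to that topic than peers do (overemphasis); a negative entry suggests underdisclosure. Real MD&A text would first need tokenization, stop-word removal, and far more documents and topics.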
In using LDA, the authors rely on two approaches: The first is a list of the most frequent key phrases associated with each topic, and the second is a representative nonfraudulent paragraph that can be used to determine whether fraudulent firms are underreporting on a particular topic. The benefits of these two approaches are that the calculations require few inputs, are fully automated and replicable, and are not subject to researcher bias; other methods would require more arbitrary choices by researchers.
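Both approaches can be read off a fitted topic model with little extra computation. The sketch below assumes a topic-word matrix `phi` and a paragraph-topic matrix `theta` like those LDA produces; it ranks single words rather than multiword key phrases, and it picks as "representative" the nonflagged paragraph with the highest share of the topic. Both simplifications, and the function names, are hypothetical illustrations rather than the authors' exact procedure.

```python
from typing import List, Optional


def top_terms(phi_row: List[float], vocab: List[str], n: int = 5) -> List[str]:
    """Return the n highest-probability terms for one topic (ties broken alphabetically)."""
    ranked = sorted(range(len(vocab)), key=lambda w: (-phi_row[w], vocab[w]))
    return [vocab[w] for w in ranked[:n]]


def representative_paragraph(paragraphs: List[str], theta: List[List[float]],
                             flagged: List[bool], topic: int) -> Optional[str]:
    """Return the nonflagged paragraph with the largest share of the given topic."""
    candidates = [(theta[i][topic], i) for i in range(len(paragraphs)) if not flagged[i]]
    if not candidates:
        return None
    _, best = max(candidates)
    return paragraphs[best]


# Illustrative inputs (not estimates from the paper).
vocab = ["cash", "growth", "liquidity", "revenue"]
phi = [[0.05, 0.40, 0.05, 0.50],   # topic 0: revenue growth
       [0.45, 0.05, 0.45, 0.05]]   # topic 1: liquidity
paragraphs = [
    "Revenue grew 12% on higher unit volume.",
    "Cash and available credit remained adequate to fund operations.",
    "Record revenue growth again this year.",
]
theta = [[0.90, 0.10], [0.20, 0.80], [0.95, 0.05]]
flagged = [False, False, True]     # hypothetical AAER indicator

print(top_terms(phi[0], vocab, n=2))
print(representative_paragraph(paragraphs, theta, flagged, topic=0))
```

Excluding flagged paragraphs when choosing the exemplar keeps the benchmark for "normal" disclosure on a topic uncontaminated by the firms under study.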
Abstractor’s Viewpoint
Fraud is a concern for all investment managers, analysts, and regulators. Findings that help identify companies more likely to engage in suspicious activity are a welcome addition to the literature. Although fraud will remain relatively rare even among firms that share these traits, knowing what to look for gives practitioners and regulators a heightened awareness when reviewing corporate financial results.
In their conclusion, the authors highlight a 2 percentage point difference in the incidence of fraud between the lowest-decile and highest-decile groupings (0.4% versus 2.4%). Although this difference is statistically significant, it is probably too small to be of practical benefit to most investors. When combined with prior research, however, the findings become more useful. Including the authors' fraud score yields an average improvement of 25% in the success rate and can increase the overall likelihood of identification approximately fivefold relative to the broader universe. It would be interesting to see the approach extended beyond the MD&A to other types of disclosure to further improve accuracy.
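The decile figures can be put in perspective with a few lines of arithmetic. This sketch uses only the 0.4% and 2.4% decile rates and the roughly 1% annual base rate of detected fraud reported for the sample; treating that 1% as the relevant benchmark is an assumption, and the fivefold figure for the combined models is not reproduced here. "False positive share" counts top-decile firms without a detected AAER, some of which may be undetected frauds.

```python
def decile_summary(low_rate: float, high_rate: float, base_rate: float) -> dict:
    """Compare detected-fraud incidence in the lowest and highest fraud-score deciles."""
    return {
        # Absolute gap between deciles, in percentage points.
        "gap_pp": (high_rate - low_rate) * 100,
        # Relative incidence: top decile vs. bottom decile.
        "top_vs_bottom": high_rate / low_rate,
        # Relative incidence: top decile vs. the overall base rate.
        "top_vs_base": high_rate / base_rate,
        # Share of top-decile firms with no detected fraud if all are flagged.
        "false_positive_share": 1 - high_rate,
    }


summary = decile_summary(low_rate=0.004, high_rate=0.024, base_rate=0.01)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

This prints a gap of 2.000 percentage points, a top-to-bottom ratio of 6.000, a top-to-base ratio of 2.400, and a false positive share of 0.976. A sixfold relative difference can coexist with a small absolute one: even in the top decile, roughly 98 of every 100 flagged firms have no detected fraud.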
Combining this work with other authors' findings increases its relevance, making it more useful to regulators and investors in identifying potential red flags. This approach, however, is still likely to produce a frustratingly high percentage of false positives for many users of the data.