Fraud and Deception Detection: Five Language Fingerprints

Jason Voss

11 March 2021 Enterprising Investor Blog

Last month, I described how computer-aided text-based analysis can help uncover fraud and deception in company communications. But what other insights can we glean from this research into scandal companies?

We used Deception And Truth Analysis (D.A.T.A.) to examine 10 of the largest corporate scandals in recent history and found that the average lead time between our textual identification of deception and the public recognition of possible scandal was more than six years.

Corporate Scandals: Time between Textual Evidence and Public Recognition

Ticker	Company	Size (US$ Millions)	Scandal Year	Average Alert Score in Lead-Up	Average Alert Score Pre-Scandal	Years Warning
ACC	Adelphia	$2,300	2002	-46%	-44.8%	2
AIG	AIG	$3,900	2005	-30.6%	-52.4%	12
CUC	Cendant	$640	1998	-37.9%	-48.8%	3
ENRN	Enron	$74,000	2001	-87.4%	-76.3%	8
HLS	HealthSouth	$1,400	2003	-42.2%	-27.1%	9
LEH	Lehman Bros.	$50,000	2008	-37.2%	-3.8%	13
SAY	Satyam	$1,400	2009	-28.9%	-38.4%	6
TYC	Tyco International	$600	2002	-77.1%	-81.7%	7
WCOM	WorldCom	$3,800	2001	-33.9%	-47.9%	4
WM	Waste Management	$6,000	1997	-39.4%	-41.1%	2
	Total	$144,290		Average	-40.3%	6.6

The obvious question is why. Why does it take regulators and markets so long to recognize these scandals? And a follow-up question: What insights from text-based analysis can we use to better identify these scandals earlier? Let's take these in turn.

Theory: It's the Behavior

Why does D.A.T.A. detect deception faster than acutely interested investors and regulators? After thinking about this for a while, we developed a theory, and it boils down to 86.5%. That is the percentage of financial information that is expressed in text, not in numbers, in annual reports. Text communications reveal the behavior of corporate management teams, and that behavior leads to the outcome that is expressed in numerical performance.

So the 6.6 years between the initial indication of deception and the moment the scandal breaks is the average length of time that a poorly behaving firm can fake it before it can no longer massage the numbers.

What is interesting is that the two scandals that took over a decade to recognize both involved financial companies: AIG and Lehman Brothers. Their annual reports ran to hundreds of pages, and the velocity of money cycling through their balance sheets and income and cash flow statements was very, very high. Thus, it took considerable time for their poor behaviors and choices (the inputs) to show up in the numbers (the outputs).
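The lead-time figures quoted above can be checked against the Years Warning column of the table. The following is a minimal Python sketch; the variable names are illustrative and are not part of D.A.T.A.

```python
from statistics import mean

# Years of warning per scandal, copied from the "Years Warning" column above.
years_warning = {
    "Adelphia": 2,
    "AIG": 12,
    "Cendant": 3,
    "Enron": 8,
    "HealthSouth": 9,
    "Lehman Bros.": 13,
    "Satyam": 6,
    "Tyco International": 7,
    "WorldCom": 4,
    "Waste Management": 2,
}

# Average lead time across the 10 scandals.
average_lead_time = mean(list(years_warning.values()))

# Scandals that took more than a decade to surface, longest first.
over_a_decade = sorted(
    (name for name, years in years_warning.items() if years > 10),
    key=years_warning.get,
    reverse=True,
)

print(round(average_lead_time, 1))
print(over_a_decade)
```

Run as written, this should give the 6.6-year average and single out the two financial firms.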

If this theory is a valid explanation for that lead time, then scandal ought to have language fingerprints that investors can dust for as either an early warning system or as a second opinion on the normal fundamental work that investment research teams conduct.

Language that Reveals Possible Scandal

After examining the 10 scandals above as well as Wirecard and other more recent controversies, we identified five textual fingerprints that differ from those of more truthful companies by more than 50%.

Scandal Words and Company Communications

Language Fingerprint	Incidence Relative to the Mean
Words Indicating Friendship	+56.1%
Words Indicating Risk	+55.9%
Impersonal Pronouns	+54.1%
Words That Indicate Differences	-53.6%
Words That Negate a Statement	+50.4%
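The article does not spell out how "incidence relative to the mean" is calculated. One plausible reading, and it is only an assumption, is the percentage difference between how often scandal firms use a word category and how often the average firm does. The sketch below uses that reading; `relative_incidence` and the sample rates are hypothetical, while the five percentages are taken from the table above.

```python
def relative_incidence(scandal_rate: float, mean_rate: float) -> float:
    """Percentage difference of scandal_rate from mean_rate (assumed definition)."""
    if mean_rate <= 0:
        raise ValueError("mean_rate must be positive")
    return (scandal_rate - mean_rate) / mean_rate * 100.0


# Hypothetical rates (occurrences per 1,000 words), for illustration only.
example_up = relative_incidence(1.5, 1.0)    # used more often than the mean
example_down = relative_incidence(0.5, 1.0)  # used less often than the mean

# Percentages reported in the table above.
FINGERPRINTS = {
    "Words Indicating Friendship": 56.1,
    "Words Indicating Risk": 55.9,
    "Impersonal Pronouns": 54.1,
    "Words That Indicate Differences": -53.6,
    "Words That Negate a Statement": 50.4,
}

# A fingerprint qualifies when it differs from the mean by more than 50%
# in either direction, so the comparison uses the absolute value.
exceeds_threshold = [
    name for name, pct in FINGERPRINTS.items() if abs(pct) > 50.0
]

print(example_up, example_down)
print(exceeds_threshold)
```

Using the absolute value matters because one of the five fingerprints, words that indicate differences, is a shortfall rather than an excess.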

In addition to text-based analysis, we also conducted one-on-one conversations to better distinguish deception from truth and to identify some of the more pan-cultural deceptive behaviors people engage in. Our findings aligned with what previous lie detection researchers had uncovered: Each of the five potential deception indicators that surface in text-based analysis also occurs in person-to-person interviews.

So let’s drill a bit deeper into each of them.

1. Words Indicating Friendship

Lie detection researchers have shown that deceivers often employ obfuscation to create confusion. One way they do this is by using words that imply friendship more often than the norm in business communications. Deceptive companies employ such terms 56.1% more than the average, according to our analysis. So if an annual report includes a number of ingratiating terms, it may be evidence of obfuscation and deception.

But a distinction is crucial here: Words that indicate friendship — “friend,” “pal,” “neighbor,” and “gang,” for example — are different from friendly words.

2. Risky Words

Scandal firms use words that indicate risk far more often than the average company: 55.9% more often, according to our analysis. These include such terms as “averse,” “avoid,” “concern,” “difficulty,” “prevent,” and “stopped.” Such words already tend to raise securities researchers’ hackles, and as we pointed out in the last piece, firms are proactively excising these kinds of “red flag” words from their annual reports.

3. Impersonal Pronouns

“Another,” “everybody,” “someone,” and “whichever” are the sort of impersonal pronouns that dishonest firms employ to a much greater extent — 54.1% more often — than their truthful peers. Why do they prefer to be impersonal in their communications? Researchers theorize that they are trying to create emotional space between themselves and those they wish to mislead.

4. Words That Indicate Difference

Lying is cognitively demanding. One manifestation of this is that, during the act of deception, liars are often unable to make distinctions among competing points of view in their communications and so are less likely to draw comparisons. The use of words that suggest difference is therefore an indication of truthfulness, and scandal firms use them 53.6% less often than the mean. Constructions that present contrasting viewpoints, such as “as compared with other years . . . ,” are examples of this.

Deceivers also have an agenda: to convince their target to believe their preferred narrative. They are unlikely to draw distinctions between that narrative and others and will tend to focus on the one they prefer.

5. Words That Negate a Statement

Research also indicates that liars often employ more negative terms than truth tellers. This is why we drew the distinction between words indicating friendship and words that are friendly.

But researchers do not always find that deceivers are more negative than the truthful. Our analysis of dishonest firm communications suggests, however, that they use such words as “not,” “never,” “should not,” “does not,” and “must not” 50.4% more often than the average.
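To make the idea of dusting for these fingerprints concrete, here is a minimal Python sketch of a dictionary-based count. It is not the D.A.T.A. method, whose dictionaries and scoring are not described in this post. The word lists contain only the examples quoted in the sections above, the `category_rates` helper is hypothetical, and the difference category is left out because the article gives a phrase rather than single-word examples for it.

```python
import re
from collections import Counter

# Example words quoted in the article. These short lists are stand-ins,
# not the full dictionaries a production tool would need.
CATEGORIES = {
    "friendship": {"friend", "pal", "neighbor", "gang"},
    "risk": {"averse", "avoid", "concern", "difficulty", "prevent", "stopped"},
    "impersonal_pronouns": {"another", "everybody", "someone", "whichever"},
    # "should not", "does not", and "must not" are caught via "not".
    "negation": {"not", "never"},
}


def category_rates(text: str) -> dict:
    """Return occurrences per 1,000 words for each category."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return {name: 0.0 for name in CATEGORIES}
    counts = Counter(tokens)
    return {
        name: 1000.0 * sum(counts[word] for word in words) / len(tokens)
        for name, words in CATEGORIES.items()
    }


# Made-up sentence pair, used only to exercise the function.
sample = (
    "Someone should not worry. "
    "We never avoid a concern raised by another friend."
)
rates = category_rates(sample)
for name, rate in rates.items():
    print(f"{name}: {rate:.1f} per 1,000 words")
```

On a real annual report, the resulting rates would then be compared with the same rates computed across a broad sample of companies. Exact matching on single words is a deliberate simplification: it misses inflected forms such as “friends” and ignores context.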

Bonus

So what is by far the strongest indicator of deception? The number of swear words in an annual report. Though they are rarities, swear words occur in scandal company annual reports a whopping 277.1% more frequently than the mean.

All posts are the opinion of the author. As such, they should not be construed as investment advice, nor do the opinions expressed necessarily reflect the views of CFA Institute or the author's employer.

Publisher Information

CFA Institute

13 Comments

Ivan (not verified)

12th March 2021 | 5:19am

Great job. Thanks.

Jason A. VOSS (not verified)

13th March 2021 | 8:30pm

Hello Ivan,

Thank you, much appreciated.

With smiles,

Jason

Areeb Shujaat (not verified)

14th March 2021 | 12:03am

That's pretty interesting. Thanks for sharing the insights. It looks like you took the "ad" criticism on your last piece seriously. Appreciated!

It would be a lot more interesting if you could give examples of the swear words in the annual reports.

A question has absorbed me after reading this. Will the behavior of writers vary when English is their second language, as in several Asian countries where English is the language of offices but not that of the land? Will they drop different clues for deception detection?

Jason Voss, CFA (not verified)

15th March 2021 | 12:38pm

Hello Areeb,

Thanks for taking the time to comment and to share your thoughts. In answer to some of your questions...

Research on the way deceivers use language has been conducted in multiple other cultures and in different languages. The results indicate that liars tend to behave very similarly across the globe. That said, my colleagues at Orbit Financial Technology and I are currently in the midst of verifying some of these assumptions using Mandarin.

As for examples of swear words . . . I am aware of only a handful. Because these words occur so rarely, it is hard to prescribe which ones to look for specifically. More important is that swear words are indicative of a kind of attitude on the part of management that is to be avoided.

With smiles,

Jason

Areeb (not verified)

15th March 2021 | 9:36pm

Hi Jason,

Thank you for responding. This kind of research surely deserves to be part of the CFA Program curriculum, or, say, the CFA Program curriculum deserves to be enriched with this type of research. However, I will be sad to see that special competitive edge lost when the knowledge is made available to a larger domain. So this needs to be ongoing as a chicken-and-egg cycle. Looks like you have a lot of evolutionary work ahead :)

All the best,
Areeb

Kaon (not verified)

14th March 2021 | 11:00am

This is spot on and I'd love to see this integrated into the CFA Program as part of the level II curriculum. Learning ratios and other financial statement analysis techniques is important, but so is mitigating substantial losses due to fraud.

Jason Voss, CFA (not verified)

15th March 2021 | 12:40pm

Hello Kaon,

That kind of work added to the CFA curriculum would be remarkable and demonstrate a significant advance in the exam authors' thinking. As disclosed in this article and the first one in this series, 86.5% of information in an annual report is text-based, but the CFA program has (to my knowledge) no techniques for assessing textual information.

With smiles,

Jason

Phillip Soares (not verified)

15th March 2021 | 9:55am

Great work! Congrats on the results! I'd love to apply these concepts here in Brazil.

Jason Voss, CFA (not verified)

15th March 2021 | 12:41pm

Hello Phillip,

Feel free to reach out to me via the website, www.deceptionandtruthanalysis.com, and we can set up a time to talk about your use cases and needs.

And thank you for your kind words.

With smiles,

Jason

Dinesh da Costa (not verified)

16th March 2021 | 6:08am

We perceive the world through language; it's only right that signs of fraud show up in language first.

Thanks, Jason. Insightful as always.
