Often financial analysts are presented with statistical charts that purport to demonstrate an important — and, of course, investable — relationship between data points. These charts are supposed to be worth a thousand words and thousands of shares traded. But invariably these charts do not have an r-squared for the data displayed, or any other descriptive statistical data; just the seductive image. What is needed to help the (often) beleaguered analyst is an r-squared taxonomy, or catalog.
Take-aways:
- A better sense of what different r-squareds actually look like.
- How radically different-looking charts generate similar r-squareds.
- Why it is crucial to use multiple tools, including charts, when analyzing data.
A Better Sense of What Different R-Squareds Actually Look Like
As an introduction take a look at the following chart:
While the graph shows a hypothetical performance for a hypothetical stock index and for a hypothetical sovereign 10-year Treasury note, I think you will agree with me that it is typical of a finance industry chart.
Take a look at how the data are only roughly related to one another between January 2008 and April 2010, after which they appear to track each other very closely. I could continue using flowery language and a successful analyst's pedigree to try to convince you to trade with my firm. Sound familiar?
Would it surprise you to learn that the r-squared for the above chart is a lowly 2.18%?! To better educate you as to what different r-squareds look like, here is an r-squared taxonomy compiled using a random chart generator, whose output is made to resemble real-world data, after thousands of trials. [Keep at it, too; there is more analysis at the bottom of the post.]
R-Squared = 0.00%
How can it be that this chart has an r-squared of 0.00% when, between July 2009 and January 2011, it looks as if there is so much similarity? Remember that r-squared is a summary measure, calculated as 1 − (sum of squared errors ÷ total sum of squares). Consequently, deviations can cancel one another out and move the calculation in either direction.
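For readers who want to check the arithmetic, here is a minimal sketch of that formula with made-up numbers (not the data behind the charts), along with the equivalence to the squared Pearson correlation, which is what Excel's RSQ function returns for two series:

```python
import numpy as np

def r_squared(x, y):
    """R-squared of a simple linear regression of y on x:
    1 - (sum of squared errors / total sum of squares)."""
    slope, intercept = np.polyfit(x, y, 1)
    predicted = slope * x + intercept
    sse = np.sum((y - predicted) ** 2)   # sum of squared errors
    sst = np.sum((y - np.mean(y)) ** 2)  # total sum of squares
    return 1.0 - sse / sst

# Illustrative made-up series, not taken from any chart in the post
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

rsq = r_squared(x, y)
corr = np.corrcoef(x, y)[0, 1]
# For a simple linear regression, 1 - SSE/SST equals the
# squared Pearson correlation, so these two values match.
print(rsq, corr ** 2)
```

The equivalence only holds for a simple (one-variable) linear fit, which is exactly the case Excel's RSQ covers.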
R-Squared = 10.00%
R-Squared = 20.01%
R-Squared = 29.97%
R-Squared = 40.01%
R-Squared = 49.99%
R-Squared = 60.00%
R-Squared = 70.04%
R-Squared = 80.03%
R-Squared = 90.00%
How Radically Different-Looking Charts Generate Similar R-Squareds
Most of the time when financial analysts think of r-squared, they think of similarity rather than relatedness or causality. The preceding charts show that the higher the r-squared, the more closely the lines tend to track one another. But this is very dangerous thinking! In the thousands of trials run to create this post, the highest r-squared randomly generated was a whopping 93.37%. But take a look at its chart below.
R-Squared = 93.37%
I bet you are surprised by the above result because, as I said, analysts tend to think of r-squared as similarity. The chart above, however, demonstrates very high negative correlation of −95.29%. It is a type of chart that recurred throughout the r-squared random trials: a scissors pattern. Count me among those educated by this experiment, as I had never looked for scissors patterns when sifting through charts for causal relationships.
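The arithmetic behind the scissors result is simple: RSQ is the square of the correlation coefficient, so squaring wipes out the sign, and a strongly negative correlation produces a high r-squared. A minimal sketch with made-up numbers:

```python
import numpy as np

# Made-up 'prices': series b is an exact mirror image of series a,
# the scissors pattern taken to its extreme
a = np.array([100.0, 104.0, 99.0, 107.0, 103.0, 110.0])
b = 220.0 - a  # moves in exactly the opposite direction

corr = np.corrcoef(a, b)[0, 1]  # perfectly negative: -1
rsq = corr ** 2                 # what Excel's RSQ reports: a 'perfect' 1.0

print(corr, rsq)
```

Two series that move in perfect opposition score just as high on r-squared as two that move in perfect lockstep.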
Take a look at various other manifestations of the scissors pattern.
R-Squared = 50.07%
Interestingly, look at the difference between the 50.07% and the 49.99% chart from before. While separated by only 0.08%, the two charts could hardly look more different.
R-Squared = 69.99%
Again, compare the 69.99% and the 70.04% r-squared charts, separated by just 0.05%. Last, compare the 90.81% r-squared graph below with the 90.00% chart above. What a dramatic difference.
R-Squared = 90.81%
Like everything in finance, reading charts is more complicated than memorizing several heuristics, like "be on the lookout for the scissors pattern." For example, look at these very different-looking charts with similar r-squareds that do not adhere to the scissors pattern.
R-Squared = 50.30%
To me, the above chart "looks like" it would have a lower r-squared than the preceding 49.99% r-squared chart; yet it is higher! Or what about the 92.31% r-squared below, which looks to have a lower r-squared than the 90.00% chart:
R-Squared = 92.31%
For another interesting comparison, look at the original 2.18% chart and compare it to the 10.00% r-squared chart. To further demonstrate how exactly the same r-squared can look radically different, compare these three very different ways of generating a theoretical 100.00% r-squared.
R-Squared = 100.00%, Identically Similar Movement
Here both data series move identically with one another; so much so, in fact, that you cannot distinguish the movement of the hypothetical stock market from the movement of the 10-Year Treasury Note Yield. [Note: For the skeptics, the presence of the left-hand scale indicates that the stock market close time series is present, just “underneath” the 10-year Treasury Note Yield series]
R-Squared = 100.00%, Scissors Movement (i.e., Negative Correlation)
R-Squared = 100.00%, Negative Correlation
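All three 100% cases come down to the same fact: any exact linear relationship between two series, whether identical, rescaled, or mirrored, yields an r-squared of 100%. A minimal sketch with a made-up series:

```python
import numpy as np

base = np.array([3.2, 3.5, 3.1, 3.8, 3.6, 4.0])  # made-up 'yield' series

identical = base.copy()    # identically similar movement
scaled = 2.0 * base + 1.0  # positive linear rescaling, same shape
mirrored = 8.0 - base      # scissors movement (negative correlation)

# Each pairing is an exact linear relationship, so r-squared is 1.0
# in every case, no matter how different the charts would look.
rsqs = [np.corrcoef(base, other)[0, 1] ** 2
        for other in (identical, scaled, mirrored)]
print(rsqs)
```

R-squared only measures how well one series is explained by a straight-line function of the other; it says nothing about whether the two lines on a chart overlap, diverge, or cross like scissors.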
Why It Is Crucial to Use Multiple Tools, Including Charts, When Analyzing Data
Hopefully I have demonstrated the futility of trusting your eyes when looking at chart data — seeing is not believing! It behooves analysts to study the r-squared taxonomy and develop a feel for what actual relationships of particular degrees look like. Chartists should broaden their scope to include data that demonstrate a scissors pattern/negative correlation, not just charts that track one another like dancers on a dance floor. Going forward, understanding data well clearly requires a combination of visuals and statistical measures.
Please note that the content of this site should not be construed as investment advice, nor do the opinions expressed necessarily reflect the views of CFA Institute.
28 Comments
Numbers and data do not lie. But they can be misinterpreted or manipulated to "fit the model".
Hello capitalistic,
Thank you for your above comment - yes, they can be (and often are) manipulated. I will point out that there are also several other shortcomings you didn't highlight:
* Choosing what method and what numbers to look at can lead to a 'truth' that is meaningless.
* Not everything worth evaluating can be measured by numbers.
With smiles!
Jason
Hello,
First, context: the post is not about r-squared; it is about the questionable wisdom of relying on charts, with no other statistical data provided, to make decisions.
So, with that in mind:
I simply used Excel's built-in RSQ function on the values, not on log returns. But however I calculated it, the point would be the same: do not rely upon visuals.
Thanks for your comment!
Jason
Hi Jason. Thanks for bringing this to your followers' attention - it's a massive irritation to mathematically enlightened people to realize just how prolific this phenomenon of incorrect interpretation of pictures is, using statistically dubious rationale. Just another thought: methods like R^2 or correlation only work if the observations fulfil the requisite statistical conditions - like independence, etc. Time series do not. Log-returns typically will. That should be a first point of departure for any educational treatise on this issue. Forgetting that, and focusing on the multiplicity of patterns that are supported by a range of R^2 values, is repeating the Platonic cave allegory, surely?
Hi Adam,
Thank you for your comments. It's my opinion that abuse of charts goes in both directions. Many analysts produce charts that purport to demonstrate a compelling picture, and with very little other data used to support some sort of trade recommendation or intellectual point. Yet, on the other side, analysts being asked to evaluate a trade or point frequently do not insist on more information beyond the image and the opinions of the analyst. A poor state of affairs.
Thank you also for reminding folks that r-squared is only useful if the conditions for normality are present.
With smiles!
Jason
Hi Jason. You don't have to post this if you don't want to. I believe you missed my point. Entirely. Your illustrations of different R-squareds fitted to different pictures are entirely spurious, and miss the primary mechanism for such random noise. You would not put diesel in a petrol car and expect to take it for a ride. Similarly, you should not use R^2 or correlation on price series. You can, but you commit the same error that you are trying to illuminate for your readers, hence falling into the same trap as everyone else. Read the papers that your reader Emlyn suggests above - I fear you don't understand the statistical issues here. Kind regards, Adam.
Hi Adam,
Uh, well, maybe. I'm not sure I've fallen into any trap here. The point of my post was to create charts. It could have been frog leaps in the spring vs. the autumn. All of the charts in the above post are completely random, just to create the appearance of two data series graphed relative to one another; then an r-squared is calculated. They happen to be labeled 10-year Treasury and Stock Market Close - but those are inventions.
The point of the post is not the superiority of r-squared or even how to apply it. The point of the post is that charts deceive and folks who use them should do a deeper dive into the data. Hence, the subtitle, "Seeing is not believing." Most often when charts are posted in the investment biz there is no accompanying information - just some analyst's (usually sell-side) opinion as to the next appropriate action based on the "appearance" of causality. I used r-squared in the above post only to demonstrate that what "looks like" something of significance is a fiction.
With smiles,
Jason
Hi Jason. Thanks for your patience with me.
Consider the following arguments. The interpretation of two price series will often lead to spurious statistical conclusions. These conclusions are spurious because price series are auto-correlated, each observation within a series being dependent on the observation before it. It is impossible to infer the strength of correlation or dependency *visually* between accumulating series for this very reason. You are 100% correct there – and kudos for pointing this out in a public forum.
So, you turn to statistics to prove the same point.
Using any statistical metric, say R-squared, to highlight the nature of this error only works if the assumptions of the metric are valid. Price series are auto-correlated (I think I've mentioned that somewhere). R^2 works on independent observations. An R^2 applied to accumulating series is equally spurious. Had you taken the log-returns of the series in each and every example above, the R^2 would have been around zero all of the time. Had you done this, your point would have hit home beautifully. Rather, you apply R^2 to the series themselves, therefore committing the same error – not visually, but now inferentially, using an assumed safer approach. But it's not safer. It's equally wrong, arguably more so, since you are leaning on statistical architecture that carries more credence than subjective visual interpretation.
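The contrast can be sketched with invented data (the seed, series lengths, and return parameters below are arbitrary): two independent random-return series are accumulated into price-like levels, and the r-squared is computed both on the levels and on the log returns. The level r-squared is whatever chance delivers, while the log-return r-squared of independent series sits near zero.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two independent made-up 'price' series built by accumulating random returns
returns_a = rng.normal(0.0005, 0.01, 1000)
returns_b = rng.normal(0.0005, 0.01, 1000)
prices_a = 100.0 * np.exp(np.cumsum(returns_a))
prices_b = 100.0 * np.exp(np.cumsum(returns_b))

def rsq(x, y):
    return np.corrcoef(x, y)[0, 1] ** 2  # what Excel's RSQ computes

# R-squared on the auto-correlated price levels: can be large by chance
rsq_levels = rsq(prices_a, prices_b)

# R-squared on the (independent) log returns: hovers near zero
rsq_returns = rsq(np.diff(np.log(prices_a)), np.diff(np.log(prices_b)))

print(rsq_levels, rsq_returns)
```

Re-running with different seeds scatters the level r-squared widely while the log-return r-squared stays close to zero, which is the auto-correlation point in miniature.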
I think this dialogue of yours highlights just how deep this issue goes, and how treacherous it is. CFA charterholders should not be making the same mistake that would fail a freshman in school. But they do. Even those who are aware that something is amiss miss the mechanism. I suspect that even following this posting, most readers won't get it. And that's not pure arrogance - it's empirical fact - that's just what we see in the professional capital markets the world over.
Yours
Adam
Hi Adam,
There was no need to take the log returns of these series as they are not time series data. These are random inventions. And I am very familiar with the problem of auto-correlation. It seems that you are reading too much into the labels I gave the 'data.'
Thanks Adam,
Jason
Thanks Adam and Jason - Jason, I appreciate your raising the issue and being transparent; Adam, thanks for the solid back-to-basics advice.