Correlations between stock prices of different companies


I tried to reproduce the affiliation of companies to industry sectors by looking only at the correlations of their share price curves, and got the following graph (both pictures contain the same graph, but use different layout algorithms). I selected companies based on their market capitalization, but threw out companies that didn't exist before 2015 under their current stock symbol (because companies with a short track record tend to be highly correlated with everything and thus tend to become the centers of giant meaningless clusters) or for which I couldn't download price data. The correlation is not computed on the prices of companies X and Y directly, but after the following adjustments:

  • use 1 data point per day (the closing price)
  • keep only data points that are known for both companies (based on the timestamp of that data point)
  • then apply the natural logarithm to all data points
  • then chop up the data into subsequences of 250 consecutive trading days (around a trading year), with consecutive subsequences starting 125 trading days apart (so neighbouring subsequences overlap by half)
  • then calculate the usual correlation for each pair of subsequences (i.e. always 1 subsequence from X and 1 from Y)
  • finally average these correlations.
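
The steps above can be sketched in a few lines of Python. This is only a minimal illustration, not the code used for the graph: the function names, the dict-based input format (date key mapped to closing price), dropping a trailing partial window and skipping windows with a constant price are all assumptions made for the example.

```python
import math

def pearson(xs, ys):
    """Usual (Pearson) correlation of two equally long sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return float("nan")  # undefined if one window is constant
    return cov / math.sqrt(vx * vy)

def avg_window_correlation(closes_x, closes_y, window=250, step=125):
    """closes_x / closes_y: dict mapping a sortable date key to the closing price."""
    # keep only the days that are known for both companies
    common = sorted(set(closes_x) & set(closes_y))
    # apply the natural logarithm to all data points
    log_x = [math.log(closes_x[d]) for d in common]
    log_y = [math.log(closes_y[d]) for d in common]
    corrs = []
    # windows of `window` days whose starting days are `step` days apart
    # (assumption: a trailing partial window is dropped)
    for start in range(0, len(common) - window + 1, step):
        c = pearson(log_x[start:start + window], log_y[start:start + window])
        if not math.isnan(c):  # assumption: undefined windows are skipped
            corrs.append(c)
    if not corrs:
        return float("nan")  # fewer than `window` common days
    return sum(corrs) / len(corrs)
```

With ISO date strings such as "2015-01-02" as keys, sorting the keys also sorts the days chronologically.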
Based on these average correlations, I then calculated a minimum spanning tree, using 1.0 minus the correlation as the distance. The graph passes an obvious sanity check: "GOOG" & "GOOGL" are directly connected, and so are "AIR_PA" & "AIR_DEX" - each pair is the same company trading under 2 different stock symbols. The idea largely achieved its goal: companies in the same industry usually form a tight cluster:
  • there is an "oil & gas"-cluster around "SU"
  • a "semiconductor"-cluster around "AMAT"
  • a "retail banking"-cluster around "ING"
  • a "railway"-cluster around "CSX"
  • a "consumer defensive"-cluster around "PEP"
  • a "healthcare & pharma"-cluster around "UNH"
  • an "energy/utilities"-cluster around "AEP"
  • an "IT"-cluster around "GOOG"
  • an "alcohol"-cluster around "BUD"
But not all companies cluster where you'd expect them to. E.g. "TSLA" clusters with "healthcare & pharma". And the "groceries & home improvement"-cluster containing "WM", "DG", "DLTR", "TGT", "HD", "LOW" does not contain "KR" or "COST" (both of which are in fact 8 steps, i.e. edges of the tree, away from the nearest cluster member). Also, the "groceries & home improvement"-cluster is 7 steps away from the "consumer defensive"-cluster, and there is no "consumer cyclical"-cluster at all - "consumer cyclical"-companies are scattered all over the graph.
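
For readers who want to play with the tree construction and the "steps away" measure, here is a small sketch using Kruskal's algorithm and a breadth-first search. The choice of Kruskal's algorithm, the function names and the input format are assumptions for illustration, and the symbols "A" to "D" and their correlation values are invented toy numbers, not real data.

```python
from collections import deque

def minimum_spanning_tree(symbols, corr):
    """Kruskal's algorithm; corr maps frozenset({a, b}) to the average correlation."""
    parent = {s: s for s in symbols}

    def find(s):
        # union-find lookup with path halving
        while parent[s] != s:
            parent[s] = parent[parent[s]]
            s = parent[s]
        return s

    # distance = 1.0 - correlation, so highly correlated pairs are tried first
    edges = sorted((1.0 - c, *sorted(pair)) for pair, c in corr.items())
    tree = []
    for dist, a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:  # edge joins two separate components, so keep it
            parent[ra] = rb
            tree.append((a, b, dist))
    return tree

def steps_between(tree, start, goal):
    """Number of tree edges on the path from start to goal (None if unreachable)."""
    adj = {}
    for a, b, _ in tree:
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return seen[node]
        for nxt in adj.get(node, []):
            if nxt not in seen:
                seen[nxt] = seen[node] + 1
                queue.append(nxt)
    return None

# invented toy correlations between four placeholder symbols
toy = {
    frozenset({"A", "B"}): 0.9, frozenset({"B", "C"}): 0.8,
    frozenset({"C", "D"}): 0.7, frozenset({"A", "C"}): 0.5,
    frozenset({"B", "D"}): 0.2, frozenset({"A", "D"}): 0.1,
}
toy_tree = minimum_spanning_tree("ABCD", toy)
print(steps_between(toy_tree, "A", "D"))  # prints 3 (the tree is the chain A-B-C-D)
```

Because a spanning tree always links every node to every other node, the interesting question is never whether two companies are connected, but how many steps lie between them.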
New version with >3000 companies (careful, the linked hi-res version of the image is HUGE):

Written by the author; Date 14.01.2026; © 2026 spinningsphinx.com
