If You Know Statistics, the Truth Will Follow.

Introduction

Marketers, journalists, and a great many people in general throw statistics around like eggs on Halloween. Those commercials on how coal is “the future” and how it is clean come to mind. In fact, the conservatively dressed woman hired to convince the public, sound bite after witless sound bite, leans on an empty and unsubstantiated statistic when she states that “most Americans” agree that coal is good for the environment. The coal lobby then throws an unlabeled pie chart in the viewer’s face that features magnified, smiling, “average-looking” people standing in the largest slice to drive home the point that you should support them because a majority of Americans believe in the alleged benefits of coal as an energy source.

Statistics serve no other purpose than to draw inferences about the truth, a truth that depends both on the statistician’s ability to study a population and on the size of the population itself. When parameters (values that describe an entire population) are much too difficult to obtain, the statistician redirects her efforts to deriving a statistic instead. She is left with the option of taking a “good” sample and drawing inferences about the population it represents. This week’s article focuses on tips for not getting bamboozled by the vague statistics we find in the media and at work.

What the… I Thought This Was a Website on Sports Business?

It is, though success in all types of business depends on a thorough understanding of statistics and what they mean. If you are not familiar with the fine art of inference that is Statistics, it would certainly help to have someone on your team who is. Imagine for a second that you manage a tennis league and are looking for the right shoe sponsor. The world’s largest shoe company tells you, “Hey, we have a great shoe that boasts a median life expectancy of 80 matches according to our latest tests. We’ll sell you each pair at cost if you put our logo on all of your promotional items.” Ah, fantastic! The largest shoe company in the world wants to sponsor your league! Life just couldn’t get any better, could it?

It’s not that simple. That’s when you ask how they arrived at the 80-match number. “Well, we tallied the number of matches each tested pair lasted before the sole completely wore out and listed them in order from least to greatest. We found the number in the middle and determined that we expect our shoes to last 80 matches,” says the shoe rep. Though the figure is impressive, one must always ask for the range and size of the sample when dealing with medians. After all, they could have tested only 11 pairs and had the following results (measured in matches): 40, 45, 45, 47, 73, 80, 81, 84, 84, 85, 85. Now the median is not so impressive, since this hypothetical example shows that the measurements are skewed: four of the 11 pairs wore out in 47 matches or fewer. Conveniently, you were told just a median, and had you not asked for at least the range of measurements (40-85) and the sample size (11), you would not have known that the statistic you were given was technically correct yet terribly misleading, which served the shoe rep’s purpose!
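For readers who like to check the arithmetic, here is a minimal Python sketch that runs the eleven hypothetical results from the shoe example through the standard library. The variable names are my own and purely illustrative.

```python
from statistics import mean, median

# Matches survived by each of the 11 hypothetical test pairs in the example.
matches = [40, 45, 45, 47, 73, 80, 81, 84, 84, 85, 85]

sample_size = len(matches)
mid = median(matches)                    # middle value of the sorted list
low, high = min(matches), max(matches)   # the range the rep left out
avg = mean(matches)

# Pairs that wore out well short of the advertised 80 matches.
short_lived = [m for m in matches if m < 50]

print(f"n = {sample_size}, median = {mid}, range = {low}-{high}")
print(f"mean = {avg:.1f}, pairs lasting under 50 matches = {len(short_lived)}")
```

The median really is 80, but the mean is only about 68, and more than a third of the pairs failed before 50 matches. A single summary number hid all of that.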

Averages are another story altogether. Let’s say that you work in the marketing department of your favorite local roller derby team and you are averaging the revenue the team generated in merchandise sales in the 2009 season. You calculate the mean (another word for average) and determine that the team raked in an average of $1950 per bout in merchandise sales. Your numbers could be thrown completely off if you included outliers (quantities way out of the norm… there is a way to determine which figures, if any, are outliers in your sample, by the way) that brought your numbers way up or way down. In this example, your team typically generated $1500 per bout, but because you included the one night that your staff sold $2900 in your calculation of the mean, your average now looks more like $1950 per bout. This leads to faulty measurements of your team’s performance and may cause you to form an inaccurate assessment of your sales initiatives (among other possible errors).
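One common way to flag outliers is Tukey's rule: anything more than 1.5 times the interquartile range beyond the quartiles gets a second look. The sketch below applies it to a made-up seven-bout season. The revenue figures are hypothetical and chosen only for illustration, so the inflated mean they produce differs from the $1950 in the example, and other quartile conventions would give slightly different fences.

```python
from statistics import mean, median, quantiles

# Hypothetical merchandise revenue per bout, in dollars (not real team data).
revenue = [1450, 1500, 1550, 1500, 1480, 1520, 2900]

# Tukey's rule: flag values more than 1.5 * IQR outside the quartiles.
q1, _, q3 = quantiles(revenue, n=4)
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

outliers = [r for r in revenue if r < lower_fence or r > upper_fence]
typical = [r for r in revenue if lower_fence <= r <= upper_fence]

print(f"mean with outlier:    {mean(revenue):.0f}")
print(f"mean without outlier: {mean(typical):.0f}")
print(f"median:               {median(revenue):.0f}")
print(f"flagged as outliers:  {outliers}")
```

With these numbers the one big night lifts the mean from 1500 to 1700, while the median stays at 1500. Whether to drop a flagged value is a judgment call. The point is to know it is there before you report the average.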

Now, after 10 years in roller derby you are hired by Hendrick Motorsports to make sure the garage always has enough spare parts, but not too many. Your first duty is to choose and buy the tires the team will use during practice and in every race next season. The manufacturer claims that no more than 2% of its tires will wear out before 50 laps. Ah, but you are skeptical. All those years of working long hours in obscurity have paid off, haven’t they? Good job. You tell the manufacturer that you would like more evidence. The day after, their sales rep sends you information on the number of tires they tested, the conditions they were tested in, and how much the measurements vary around their mean (otherwise known as the standard deviation). They even tell you how they managed to get a “good” sample and that the measurements are normally distributed (when graphed, they form a bell-shaped curve… that’s very good for your purposes, by the way). When you run those numbers through the Central Limit Theorem, which describes how the means of repeated samples cluster around the true population mean, you discover that their “2% or less” guarantee was a gross exaggeration. In fact, the probability that 7% or more will not last 50 laps is significantly larger than the probability that only 2% will wear out before their time. The team thanks you and you remember why you chose this career over creating the layout for each year’s swimsuit issue.
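The tire scenario does not include the manufacturer's actual numbers, so the sketch below invents some to show the kind of check involved. The mean, standard deviation, and number of tires tested are all assumptions of mine, and the calculation relies on the tire lives being normally distributed, as the rep said.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical figures standing in for the manufacturer's report.
mean_laps = 56.0   # assumed average tire life, in laps
sd_laps = 4.0      # assumed standard deviation, in laps
n_tested = 64      # assumed number of tires tested
threshold = 50     # laps each tire is supposed to last

tire_life = NormalDist(mu=mean_laps, sigma=sd_laps)

# Share of individual tires expected to wear out before the threshold.
share_failing = tire_life.cdf(threshold)

# Central Limit Theorem: the mean of n measurements varies with
# standard error sd / sqrt(n), far less than a single tire does.
standard_error = sd_laps / sqrt(n_tested)

# Average life needed, at this spread, for only 2% to fall short.
mean_needed = threshold - NormalDist().inv_cdf(0.02) * sd_laps

print(f"share failing before {threshold} laps: {share_failing:.1%}")
print(f"standard error of the sample mean: {standard_error:.2f} laps")
print(f"mean needed for a 2% failure rate: {mean_needed:.1f} laps")
```

Under these assumed figures, roughly 6.7% of tires would fall short of 50 laps, and the average life would need to be a little over 58 laps to back up a 2% claim. The small standard error says the tested average is probably close to the true one, which is exactly why the guarantee does not hold up.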

The Point

Statistics are used to draw inferences when the truth is inaccessible. The truth and related inferences are very powerful and, believe it or not, are often used incorrectly on many levels. The decision-making process depends on both intuition and empirical data; relying exclusively on one or the other can lead to undesired results in many cases. When you are presented with a statistic, always ask (at the very least) who or what was measured, how many were measured, why they were included in the study, and who performed the study. Often, we are fed vague, poorly calculated statistics that work only to further someone’s agenda.

Be very careful when basing an important decision on a statistic. When dealing with a survey, one simple question you can ask is whether or not the sample was self-selected (respondents chose to participate on their own), since such participants tend to hold extreme or biased positions on a subject and will compromise the sample’s ability to represent the entire population (e.g., when polling your fans, you do not want only the opinions of the die-hards if you wish to understand your entire fan base). It is also good to ask how many people participated in the survey or poll (i.e., the sample size). You could also ask if sampling was done with replacement (whoever was randomly chosen from a group to participate in a study is “replaced” in the pool and could be chosen again), as that could affect the variance of the answers you receive. Lastly, you could ask if participants were part of a systematic random sample in which, for example, every 50th passerby was chosen to participate in a survey.
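The sampling terms above are easier to tell apart side by side. Here is a minimal Python sketch using the standard `random` module. The fan list, the sample size of 20, and the seed are all hypothetical choices of mine.

```python
import random

# Hypothetical fan list; the IDs stand in for real people.
fans = list(range(1, 1001))
rng = random.Random(2009)  # fixed seed so the sketch is repeatable

# Sampling without replacement: nobody can be picked twice.
without_replacement = rng.sample(fans, k=20)

# Sampling with replacement: the same fan may appear more than once.
with_replacement = rng.choices(fans, k=20)

# Systematic random sample: random starting point, then every 50th fan.
step = 50
start = rng.randrange(step)
systematic = fans[start::step]

print(len(set(without_replacement)), "distinct fans out of", len(without_replacement))
print(len(set(with_replacement)), "distinct fans out of", len(with_replacement))
print("systematic sample size:", len(systematic))
```

None of these methods fixes a self-selected sample. They only help when the list you draw from already represents the whole fan base.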

So, the next time you hear someone make a claim based on a statistic or a probability, ask as many questions as you can. You could be reading a pie graph that tells you the respective proportions of general managers who felt that certain specific issues must be addressed by the Chicago Cubs’ new ownership before they return to the World Series, but without knowing which teams or leagues those general managers work for and whether they understand the sport and business of Major League Baseball, you probably should not attach too much importance to that graphic until you know more. Finally, the next time you watch that commercial by the coal lobby or anyone else who misuses a statistic, ask yourself how many people out there would believe such claims without asking for more evidence. It would not hurt for the source to be credible, either.

Basically, when dealing with a population much too large to study, we use a sample, and the statistics calculated from it, to draw inferences that will bring us closer to the truth. When interpreting a statistic, critical thinking is key.

Cam Suarez-Bitar.

Thank you for your readership.  By no means was this article a comprehensive analysis of either the complexities of statistics or the decision-making process in sports business.  It was meant to introduce the reader to the important role statistics play in the decision-making process and how to avoid being tricked by either careless or unethical advertisements or claims.  Hopefully, this week’s article will make the study of statistics seem more relevant.
