Seeing all the conflicting numbers out there about COVID-19, and the impact these statistics are having on our daily lives, I thought an article about statistics in general, and how they can be misused or misunderstood, would be worthwhile. Don’t take this to mean that I believe COVID statistics are being manipulated or misused, but it is important to realize that the same set of statistics can be used to tell two different stories. For example, stating a 0.4 percent death rate (four out of one thousand die) sounds worse than a 99.6 percent survival rate (996 out of a thousand survive), even though both describe exactly the same data.
Misleading statistics can be defined as the misuse, purposeful or not, of numerical data. Such numbers can lead an individual to make a decision based on the facts as presented, rather than on the whole truth. Let’s look at some of the oversights and examples of misleading statistics from modern sources.
First off, statistics can be a very reliable way to present the truth about anything that can be tracked with numbers. Any mathematician will tell you that the numbers do not lie, but they can be used to tell half-truths. This is commonly known as a “misuse of statistics.” There is a false belief that this type of misuse is limited to reports intended to gain profit by distorting the truth. In truth, most of the manipulation of statistics happens when amateur mathematicians attempt to find the truth in the numbers.
Most of the rest comes from manipulation of the input data. A 2009 meta-analysis of surveys by Dr. Daniele Fanelli of the University of Edinburgh reported that up to 33.7 percent of scientists surveyed admitted to questionable research practices. These included modifying experimental results to improve outcomes, interpreting data in a way that supported their results, withholding details of the analysis, and dropping observations because of “gut” feelings. It is clear that though numbers don’t lie, scientists are people and people do lie, sometimes unintentionally. So, how can statistics be manipulated?
Flawed correlations are the most common way to use statistics in a misleading manner: they treat correlation as if it proved causation. A great example is the false statement that 5G networks spread COVID-19. You do see a higher number of COVID-19 cases in areas with 5G network coverage. What is misleading is that areas with 5G coverage also have denser populations, and denser populations mean more cases because of higher rates of contact, which has nothing to do with 5G networks.
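To see how a hidden third factor can manufacture a correlation, here is a minimal Python sketch. Every number in it is invented for illustration: the towns, the density scale, and the case counts are hypothetical, and cases are generated from population density alone, never from 5G coverage.

```python
import random

def simulate_towns(n=2000, seed=0):
    """Hypothetical towns: density drives both 5G coverage and case counts."""
    rng = random.Random(seed)
    towns = []
    for _ in range(n):
        density = rng.uniform(0, 1)              # 0 = rural, 1 = dense city (made-up scale)
        has_5g = rng.random() < density          # denser towns are more likely to have 5G
        cases = 100 * density + rng.gauss(0, 5)  # cases depend on density only, not on 5G
        towns.append((density, has_5g, cases))
    return towns

def mean(values):
    return sum(values) / len(values)

def gap(towns):
    """Average cases in towns with 5G minus average cases in towns without."""
    with_5g = [cases for _, has_5g, cases in towns if has_5g]
    without = [cases for _, has_5g, cases in towns if not has_5g]
    return mean(with_5g) - mean(without)

towns = simulate_towns()
raw_gap = gap(towns)

# Hold the hidden factor roughly fixed: compare only towns of similar density.
similar = [t for t in towns if 0.45 <= t[0] < 0.55]
similar_gap = gap(similar)

print(f"Gap across all towns:            {raw_gap:.1f} cases")
print(f"Gap among similar-density towns: {similar_gap:.1f} cases")
```

Across all towns, the 5G group shows far more cases. Once only towns of similar density are compared, the gap nearly disappears, which is what you would expect if density, not 5G, is doing the work.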
Faulty polling can have a big effect on the results of a study; my favorite example is the 2016 presidential election. Many polls pointed to a comfortable victory for Hillary Clinton, yet the election results showed a win for Donald Trump. One explanation is that the polls leaned too heavily on people in large cities and densely populated areas, who tended to support Clinton, while under-representing rural residents.
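The effect of polling the wrong mix of people can be sketched in a few lines of Python. This is not real election data: the electorate below is hypothetical, and the urban share and support levels are numbers I made up purely to show the mechanism.

```python
import random

rng = random.Random(1)

# Hypothetical electorate: every figure here is invented for illustration.
URBAN_SHARE = 0.50                         # half of voters live in cities
SUPPORT = {"urban": 0.65, "rural": 0.40}   # share backing candidate A in each region

def draw_voter(urban_only=False):
    """Return True if a randomly drawn voter backs candidate A."""
    region = "urban" if urban_only or rng.random() < URBAN_SHARE else "rural"
    return rng.random() < SUPPORT[region]

def poll(n, urban_only=False):
    """Fraction of n sampled voters who back candidate A."""
    return sum(draw_voter(urban_only) for _ in range(n)) / n

true_support = URBAN_SHARE * SUPPORT["urban"] + (1 - URBAN_SHARE) * SUPPORT["rural"]
biased = poll(5000, urban_only=True)   # pollster only reaches city residents
representative = poll(5000)            # pollster samples the whole electorate

print(f"True support:        {true_support:.3f}")
print(f"City-only poll:      {biased:.3f}")
print(f"Representative poll: {representative:.3f}")
```

Both polls ask the same number of people, so sample size is not the problem. The city-only poll overstates support because of who was asked, not how many.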
Data fishing, or data dredging, uses data mining techniques, where extremely large volumes of data are analyzed to discover relationships between data points. Seeking the relationship is not misuse of the data, but doing so with a preformed idea of what the relationship should be is a self-serving technique, often employed to circumvent traditional mining techniques in order to find correlations that do not really exist. Papers published using these techniques are usually highly publicized due to their outlandish findings and are nearly always reversed by deeper research very shortly after release. This often leaves a confused public searching for answers regarding the reports.
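Here is a rough Python sketch of why dredging finds "relationships" that are not there. It compares one random outcome against 200 equally random, unrelated variables and counts how many look statistically significant anyway. The data are pure noise generated for this example, and the cutoff of 0.361 is the approximate textbook threshold for a two-sided 5 percent test with 30 observations.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

rng = random.Random(2)
N_OBS, N_VARS = 30, 200
R_CRITICAL = 0.361  # approximate two-sided 5% cutoff for |r| with 30 observations

# The "outcome" and every candidate variable are independent random noise.
outcome = [rng.gauss(0, 1) for _ in range(N_OBS)]

hits = 0
for _ in range(N_VARS):
    candidate = [rng.gauss(0, 1) for _ in range(N_OBS)]
    if abs(pearson(candidate, outcome)) > R_CRITICAL:
        hits += 1  # looks "significant" even though no real relationship exists

print(f"{hits} of {N_VARS} unrelated variables passed the 5% significance test")
```

With a 5 percent cutoff, roughly one in twenty pure-noise variables is expected to pass by chance. Test enough of them, report only the winners, and you have a headline that later research cannot reproduce.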
There are three more ways statistics can be misleading that I will cover next week. Just remember to check facts for yourself before believing what you read. Until next week, stay safe and learn something new.