Saturday, August 29, 2009

The lady tasting tea

Statistics is one of the cornerstones of modern science, the other three being Blood, Sweat and Tears.

First, it is important to distinguish between the different things that are called statistics. There is mathematical statistics, or so I have heard, and there is applied statistics, which is what we are generally talking about. In applied statistics there is descriptive statistics, which we call plotting, and finally inferential statistics.

Inferential statistics is the area that most (junior) scientists mean when they say that they hate statistics. It is the area of hypothesis testing, or finding significant differences, and what people hate about it is that it is never clear which test you should use.

You will have learnt that you should predefine the tests you are looking for and design the experiment after that. The problem lies in having too theoretical an understanding of statistics and not enough experience with whatever kind of data you possess. This means that you have not been able to predict which kind of analysis gives the most power with the data you end up with.

In the end, the first time you handle any type of dataset has to be considered an exploratory investigation. It is important to accept that, and to really get to know how the data responds to analysis and what kind of results you can get out of it. Then, next time, you can predetermine the correct type of analysis, and feel like you are on higher ground, theoretically.

So, how do you get to understand statistics, or at least start to understand it?

I have found no statistics textbooks that are worthwhile in the beginning. None. Many are useful once you have understood something.

The same goes for statistics courses. They are just not useful, especially not for understanding statistics. The most useful courses I have had in statistics were those that focused on a specific problem and a specific approach, e.g. "Using R and Bioconductor for the analysis of mRNA expression microarrays". Even with those, the real help with understanding comes from working with your own data.

I suggest two things:

  1. Read the wonderful book "The lady tasting tea" by David Salsburg, which describes the development of statistics as a science in a most delightful way. He manages to present many of the basic ideas of statistics without being technical. I can only recommend it, especially if you like the history of science.
  2. Start using a statistics program that can help you understand what you do. I suggest R. It's free, it's what all the statisticians use (so you can ask them how to do something), it includes a manual, and it includes original references to many of the methods.

Yours,

Michael Hultström
