Book Review: Data Analysis with Open Source Tools by Philipp Janert

This is a book about how to think about data analysis, not only how to perform it. Like a good data analysis, Janert's book is about insight and comprehension, not computation. Because of this, it belongs on any analyst's bookshelf, set apart from the books that merely teach tools and techniques.

The practice of data analysis can get a bad rap, especially from those who think that data analysis is only statistics. Most books on data analysis don't help, because they focus on the features of a particular tool, which leads to the view that data analysis means following a recipe from a cookbook. This book subverts that view by being principally about how to think about data analysis, and by providing examples using different tools (primarily R and Python, though he uses others as well).

Among other topics, Janert covers graphing, single and multi-variable analysis, probability, data modeling, statistics, simulation, component analysis, reporting, financial modeling and predictive analytics. In each section he starts by explaining the concepts, what each topic is for, and (just as important) what it is not. Working through the book, you get a sense not just of the what and how of the various tools and methods discussed, but also of why they are used and of some of the ways these techniques are misapplied.

Janert also illustrates the methods using several data analysis environments: principally R and Python (with NumPy, SciPy and Matplotlib), but also other tools such as Gnuplot and the GNU Scientific Library. What is helpful here is that the focus is on the techniques and capabilities needed in a tool, not on the tool itself. Instead of being a cheerleader for a particular tool, Janert discusses in his appendix the qualities that make environments such as MATLAB, R and Python good for data analysis. However, this focus means that he does not teach any particular tool. If you want to learn how to use a specific tool for data analysis, you are better off getting a book on R or Python (or MATLAB, Excel, etc.).

