Thinking with Data by Max Shron
My rating: 4 of 5 stars
Thinking with Data focuses not on how to do data analysis, but on the questions that one should be asking. It does so in two ways: first by providing an overall framework for looking at situations, then by working through a series of topics, using examples as plausible paths of decision making. In a fairly short book, it covers the framework, determining purpose, threats to validity, experimental design, and a few extended examples that illustrate both the concepts and deviations from them. It is a quick, big-picture book that is useful for those whose focus has been on the methods of data analysis, and for those who do not have a quantitative background but are faced with data questions and need to be able to work with data analysts.
The first part is probably the most rewarding. Max gives a framework for framing a data problem. It has four parts: Context (who is interested in the problem, what are their overall goals and why, what is the goal of the project), Need (the specific need that could be solved through the use of the data model), Vision (an understanding of what the results of data analysis would be like), and Outcome (an understanding of how the data analysis results would be used). The end of this framework would be a story that you can tell.
Next is a discussion of how the details of the problem could be fleshed out. The content is probably familiar to anyone who has had to work with stakeholders. The valuable portion here is the vignettes of working through this process on projects. In particular, the vignettes are not projects that necessarily go smoothly, so they do not have the idealized feel that many published vignettes do.
Next is a discussion of presenting the results. The focus here is that the results are not the output of the data analysis, but the use of data analysis methods to construct an argument. That argument is going to be presented to people who have backgrounds, prior beliefs, prejudices, and sometimes reasons to argue against your findings.
The way to address these disputes is by conducting experiments and testing alternative hypotheses. So a section of the book is on defining causality and designing experiments (interventions) to handle different types of alternative hypotheses.
What makes this useful is the framework and the vignettes. It is good for a quick introduction to this area. As others have noted, it is not tightly organized, so beyond the first chapter with the framework it is not useful as a reference, but it does help in focusing how to think.
I teach classes on working with data, and one idea that is difficult to get across is that the topic is a unified whole, not just a collection of separate techniques. I plan on using much of what is in this book to help provide that unified whole in my classes.
Disclaimer: I received a free electronic version of this book as part of the O'Reilly Bloggers program.