Sunday, May 11, 2014

Data Mining with R by Luís Torgo: Book review

Data Mining With R: Learning By Case Studies by Luís Torgo
My rating: 3 of 5 stars

Data Mining With R (DMwR) promotes itself as a book that introduces readers to R as a tool for data mining. It teaches this through a set of five case studies, each of which starts with data munging and manipulation, then introduces several data mining methods to apply to the problem, and ends with a section on model evaluation and selection. It fills a place in the literature because it devotes a lot of space to data manipulation before applying the various methods and to model evaluation afterwards. But it is hard going for people learning data mining, since it spreads the types of model throughout the book.

I used this as one of two texts to teach data science to people whose programming and data analysis skills were generally at a very low level. The big advantage of using a programming environment such as R for data mining is that you can do the data manipulation in the language and then apply the methods. Many of my students have taken machine learning elsewhere, but they always used prepared data sets, so this emphasis on data manipulation with several very disparate data sets is a unique feature.

The second big advantage of this book is the focus on model selection. In each chapter, the book goes through the exercise of comparing candidate models and diagnosing them to determine which one is appropriate and best for the problem. I especially appreciate that in some cases the book concludes, after model evaluation, that the method did not work for the problem and question at hand. Because most textbooks focus on demonstrating that you did find something, my students sometimes get confused when, in real problems, they do not find an effect.

Where the book falls short is that the methods are scattered across the case studies with minimal organization. While this is a result of the realities of the cases, the book would have benefited from a roadmap chapter or introduction that gave methodological context (i.e., which methodologies are used in the book and where they appear). This lack made it very difficult to use as a textbook, and by the time I was done using it I had essentially built the roadmap myself. This makes it a poor standalone textbook for such a course, but a very good one when paired with another text that gives an overview of the methodologies.
