Thursday, December 31, 2020

Book review: Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy by Cathy O'Neil

Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy by Cathy O'Neil
My rating: 5 of 5 stars

There is a popular diagram that describes data science as a combination of math and statistics, computer programming skills, and subject domain expertise, and describes the dangers of what happens when one of those three is missing. But among academics, there is an opposing line of thought that says that math and statistics methods are pure and subject independent. This book is firmly against the idea that algorithms are a defense against bias. The reason may be that while the mathematician or machine learning modeler may be naive, the setting of the implementation is not: the questions being asked and the data used to train the models are both affirmative choices where the analyst and customer have agency. Pretending to be a naive analyst leads to errors in the result that have real consequences.

O'Neil goes through a number of cases. But while many accounts only go into the "evils" of big data and machine learning, she also suggests good practices that can prevent the dangers. The first is evaluation of the model: it should be tested by actually looking at its predictions and seeing if they are true. In statistics this is done through control groups; in data science it is done through holdout test sets. In her case studies, she points out that this is not done. The next is to compare the model's input data set to the population to which the model will be applied. Again, she regularly points out where this is not done. A third is the well-known one: make sure that you are not using machine learning to perpetuate an undesirable status quo (though this argument is too easy).
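To make the first two practices concrete, here is a minimal Python sketch. It is my own illustration, not anything taken from the book. The data is synthetic, the "model" is a toy one-feature threshold, and every name in it (holdout_split, fit_threshold, standardized_mean_gap) is hypothetical. It holds out a test set the model never sees while fitting, checks predictions against the truth on that set, and then compares the training inputs to a deliberately shifted stand-in for the population the model would be applied to.

```python
import random
import statistics


def holdout_split(rows, test_fraction=0.25, seed=0):
    """Shuffle the rows and set aside a fraction that is never used for fitting."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)


def fit_threshold(train):
    """Toy model: predict 1 when x exceeds the midpoint of the two class means."""
    pos = [x for x, y in train if y == 1]
    neg = [x for x, y in train if y == 0]
    return (statistics.mean(pos) + statistics.mean(neg)) / 2


def accuracy(threshold, rows):
    """Share of rows where the prediction (x > threshold) matches the label."""
    hits = sum((x > threshold) == bool(y) for x, y in rows)
    return hits / len(rows)


def standardized_mean_gap(train_xs, population_xs):
    """Gap between population and training means, in training standard deviations."""
    gap = statistics.mean(population_xs) - statistics.mean(train_xs)
    return gap / statistics.pstdev(train_xs)


# Synthetic labelled data (an assumption for illustration only).
rng = random.Random(42)
data = []
for _ in range(400):
    y = rng.randint(0, 1)
    x = rng.gauss(1.0 if y == 1 else -1.0, 1.0)
    data.append((x, y))

# Practice 1: evaluate on data the model did not see.
train, test = holdout_split(data)
threshold = fit_threshold(train)
test_accuracy = accuracy(threshold, test)

# Practice 2: compare training inputs with the population the model will face.
# The population here is deliberately shifted so the gap is visible.
population_xs = [rng.gauss(0.8, 1.0) for _ in range(400)]
gap = standardized_mean_gap([x for x, _ in train], population_xs)

print(f"holdout accuracy: {test_accuracy:.2f}")
print(f"standardized mean gap, train vs. population: {gap:.2f}")
```

A large gap does not say the model is wrong, only that the holdout accuracy may not carry over to the people the model is actually used on, which is the kind of question the book pushes you to ask.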

I read this for a book club at work. It spurred a great discussion that has carried over into other conversations. Definitely recommended to those involved in data science and analytics where people are impacted, and to those in areas where data analysis is becoming a bigger part of life, so that as someone who works with the results of analysis you can ask good questions, both while the analysis is planned and performed and when understanding and questioning the results.
