Fernando Perez: "Literate computing" and computational reproducibi...: As "software eats the world" and we become awash in the flood of quantitative information denoted by the "Big Data" b...
I am teaching a course in logistics and supply chain. Because the course focuses on modeling supply chains, and the students generally use Excel (I actually did all of the homework solutions in Python), I used the discussion of Reinhart and Rogoff’s “Growth in a Time of Debt” as a mini-case. (For every class I find an article to discuss. Usually it focuses on business decisions, but this time I chose this topic.)
The class discussion focused on how you look for errors. Because Excel makes this nearly impossible, the real question was how you focus your time.
But the real answer, as Perez mentions, is to use tools that make reproducible research simple. The homework assignments I gave resulted in spreadsheets that covered 4 tabs and ran to a couple hundred columns and thousands of rows. My solution in Python was about a page of commented code with a near one-to-one correspondence to the mathematical formulation of the model (plus a few lines to read and massage the data). The spreadsheets were pretty much un-auditable. My code had comments that would tell a reviewer what to look for. Even for smaller problems, my students' spreadsheets were fairly opaque, while my Python programs (I use Pweave) alternated between an explanation of each step and the calculation it described.
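To give a flavor of what "one-to-one correspondence with the mathematical formulation" can look like, here is a minimal sketch. It is not one of the homework solutions (those are not shown here); it assumes the textbook economic order quantity model, Q* = sqrt(2DS/H), and every input value and function name below is hypothetical and chosen only for illustration.

```python
from math import sqrt

# Hypothetical inputs, chosen only for illustration.
annual_demand = 12_000   # D: units demanded per year
order_cost = 50.0        # S: fixed cost of placing one order
holding_cost = 3.0       # H: cost of holding one unit for one year


def eoq(D, S, H):
    """Economic order quantity: Q* = sqrt(2 * D * S / H)."""
    return sqrt(2 * D * S / H)


def total_cost(Q, D, S, H):
    """Annual ordering plus holding cost: TC(Q) = (D / Q) * S + (Q / 2) * H."""
    return (D / Q) * S + (Q / 2) * H


# Each line below maps directly onto one formula in the docstrings above.
q_star = eoq(annual_demand, order_cost, holding_cost)
cost_at_q_star = total_cost(q_star, annual_demand, order_cost, holding_cost)

print(f"Order quantity: {q_star:.1f} units")
print(f"Annual cost at that quantity: {cost_at_q_star:.2f}")
```

A reviewer can check each function against the formula in its docstring, which is the kind of audit a wall of spreadsheet cells makes hard.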
The paper by Thomas Herndon, Michael Ash, and Robert Pollin included R code that demonstrated the effects of Reinhart and Rogoff's errors.
For fun, Vincent Arel-Bundock has converted the Herndon, Ash, and Pollin code into an IPython notebook, suitable for reading as well as for downloading and playing with the data to test any ideas you may have about the impact of various types of errors.
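As a rough idea of the kind of experiment such a notebook invites, the sketch below measures how an average shifts when a selection range accidentally leaves out some rows. The numbers are made up for illustration and are not the Reinhart and Rogoff data; the function and variable names are hypothetical as well.

```python
from statistics import mean

# Made-up growth figures for illustration only; NOT the Reinhart and Rogoff data.
growth_by_country = {"A": 2.5, "B": -0.5, "C": 3.0, "D": 1.0, "E": 4.0}


def average_growth(data, countries=None):
    """Mean growth over the selected countries (all of them if none are given)."""
    selected = list(data) if countries is None else countries
    return mean(data[name] for name in selected)


# The intended calculation uses every row.
full_average = average_growth(growth_by_country)

# Simulate a selection range that stops two rows short.
truncated_average = average_growth(growth_by_country, ["A", "B", "C"])

# The size of the error is now an explicit, inspectable number.
impact = truncated_average - full_average
print(f"Full: {full_average:.2f}  Truncated: {truncated_average:.2f}  Impact: {impact:+.2f}")
```

Because the excluded rows are named in the code rather than hidden in a cell range, the same function can be rerun with any other subset to see how sensitive the result is.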