First, the updates.
If you’re an avid follower or even a casual reader, you’ve probably noticed the formatting of this blog seems inconsistent, even amateur at times. At least, that’s how I see it. Maybe it’s me, but Blogger just doesn’t seem to correctly handle anything I want it to do. I’ve recently heard, too, that some folks have had trouble publishing comments. I don’t really know what’s going on; Blogger just sucks, I guess. I think I’m going to switch to WordPress in the future. When that day comes, I’ll let everyone know.
Right now, I’m writing this from Windows Live Writer. Hopefully, it does a better job than Blogger’s editor. We’ll see when this is posted.
Ok, now a book update.
My publisher told me I’ve sold 32 copies of my book on preorder. Holy Crap! Thanks guys. Some readers have told me Chapters 1 and 2 are already available if you preorder through the alpha program. I’ve received some really good feedback about them. Thanks to the folks who’ve read them and reached out to me. I’ll be adding a link to the book at the top of the blog. Unfortunately, because Blogger sucks, the mere addition of another link will mess up my layout. Then I’ll have to spend a few hours redoing everything. Therefore, it may be a while after I post this before the link appears. Just hold tight.
Is Excel to Blame?
If you’ve been following the news recently, two major events have focused attention on the use of Excel for modeling and research. The first is the London Whale incident, in which JPMorgan employed a spreadsheet-based model for their credit default swaps. According to JPMorgan’s internal report, “Spreadsheet-based calculations were conducted with insufficient controls and frequent formula and code changes were made.” Specifically,
…further errors were discovered in the Basel II.5 model, including, most significantly, an operational error in the calculation of the relative changes in hazard rates and correlation estimates. Specifically, after subtracting the old rate from the new rate, the spreadsheet divided by their sum instead of their average, as the modeler had intended. This error likely had the effect of muting volatility by a factor of two and of lowering the VaR, although it is unclear by exactly what amount, particularly given that it is unclear whether this error was present in the VaR calculation for every instrument, and that it would have been offset to some extent by correlation changes. It also remains unclear when this error was introduced in the calculation.
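To make the arithmetic of that mistake concrete, here’s a minimal sketch in Python, using made-up hazard rates (the actual model’s numbers aren’t public). The intended formula divides the change by the average of the old and new rates; the spreadsheet divided by their sum.

```python
# A minimal sketch of the error described in the report, using made-up rates.
old_rate, new_rate = 0.04, 0.06  # hypothetical hazard rates

# What the modeler intended: divide the change by the *average* of the rates.
intended = (new_rate - old_rate) / ((old_rate + new_rate) / 2)

# What the spreadsheet did: divide the change by their *sum*.
buggy = (new_rate - old_rate) / (old_rate + new_rate)

print(f"intended: {intended:.2f}")  # intended: 0.40
print(f"buggy:    {buggy:.2f}")     # buggy:    0.20, exactly half
```

Since the sum of two numbers is exactly twice their average, every computed change comes out half as large, which is consistent with the report’s statement that the error muted volatility by a factor of two.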
The other event is this critique of a paper called Growth in a Time of Debt by Harvard economists Reinhart and Rogoff. (I’m not an economist, by the way, so if I make a mistake in what follows, feel free to let me know.) From what I gather, Reinhart and Rogoff show that countries with incredibly high public debt (in excess of 90% of gross domestic product, I think) suffer from hampered economic growth. In the critique, authors Herndon et al. accuse Reinhart and Rogoff of selective exclusion and coding errors. But what’s really made headlines is the fact that Reinhart and Rogoff used Excel for their analysis.
The argument is that Excel just isn’t a good program to use for data analysis. My response to this has been, “It’s a poor craftsman who blames his [or her] tools.” To be fair, I actually stole this response from Jon Peltier. I really think it concisely describes how I feel. So kudos to Jon. I used the quote in a response on Andrew Gelman’s blog. I excerpt part of my response below because I think it’s a good lead-in to my overall point.
I remember in my elementary matrix algebra class some years ago, we had to perform Gaussian elimination by hand. After the first test, it became clear to me I wasn’t paying close enough attention to my operations. Sometimes, if I wrote down a number too quickly, my hand would “skip” on the sheet of paper and make it appear as if I had written a negative number. Imagine if I told my professor, “it’s not my fault, my paper isn’t rough enough to provide the adequate friction to perform these operations.” Or: “My pencil breaks off too much graphite, which makes my mistakes too hard to detect.” Or even: “My calculator’s buttons are too close. I can’t help fat fingering operations.” Even if I could show that the pencil, paper, and calculator actually contributed to my making mistakes, do you think my professor would (or should) care? Let me put it to you this way: What IF Reinhart and Rogoff had used only pen and paper? Should we blame BIC for making a pen unsuitable for analysis? Should we blame Mead for making unsuitable paper? I don’t think so. Reinhart and Rogoff’s analysis was faulty because they were inattentive. That’s not Excel’s fault.
JPMorgan, according to their internal review, knew the potential for error in the Excel model that ultimately became the London Whale scandal. Managers had even recommended that the model be automated and audited, but never followed through. Why not? Why wasn’t the model thoroughly tested when it provided results that appeared counterintuitive? Because they liked the model’s results and its potential for big returns.
Reinhart and Rogoff—what makes them so different? Are we expected to believe that Excel is responsible for their not rigorously unpacking and verifying their data and model? To me, a more reasonable answer is they really liked their results; and their reviewers really liked their results; and economists looking for more ammunition against government spending loved their results. Their results meant big impact.
That’s my thesis: there’s too much pressure to make an impression, to be different, to be sensational. We’re less likely to be skeptical of work that plays on our own confirmation biases, especially if it’s going to be huge. That same sensationalism, by the way, is behind the blame-Excel bandwagon. Here are some headlines: “Quote of the day: Excel error destroys the world” from Mother Jones; “Microsoft Excel: the ruiner of Global Economies?” from Ars Technica. To me, blaming Excel feels like more causation creep. Sure, Excel was used by both JPMorgan and Reinhart and Rogoff. But so what?
The Real Problem
The real problem concerns how we, as a society, treat data and research. For some reason, data is given a privileged status in our society. We aren’t as skeptical of it as we should be. We believe first and question later; that is, only if things go wrong. And, at some point, they will. When the news presents a study to us, we tend to interpret the results as hard facts and not as the output of a model that is potentially as flawed as the humans who devised it. We look to technology to solve our problems, and when technology “fails” to meet our expectations, we toss it out as if it were a consumable good, like an old cell phone. But this view of technology, as the great solver (or destroyer) of the world’s complex problems, is foolhardy at best and catastrophic at worst. It removes from our institutions and companies the responsibility to make something better with time, instead of simply making it newer. It removes from us the responsibility to apply rigor to our own work.
Sensationalism drives the backlash against Excel. And don’t get me wrong, there are many things to criticize Excel for. However, someone’s failure to apply the necessary rigor to their own work is not Excel’s fault. In fact, while everyone is arguing over whether Excel provides the necessary information assurance, they’ve missed a recent addition to Excel’s capabilities that should have everyone concerned.
Specifically, Microsoft’s recently unveiled GeoFlow add-in.
From a technology standpoint, the fluid-like movement of anything in Excel is likely to dazzle; it certainly feels impressive. But before you make a flyover tour of your data, consider that this type of visualization has lots of problems.
According to the write-up on the Office Blog, GeoFlow should allow you to:
- Map Data: Plot more than one million rows of data from an Excel workbook, including the Excel Data Model or PowerPivot, in 3D on Bing maps. Choose from columns, heat maps, and bubble visualizations.
- Discover Insights: Discover new insights by seeing your data in geographic space and seeing time-stamped data change over time. Annotate or compare data in a few clicks.
Discover new insights? No, I don’t think so. We’ve been able to place data on maps for as long as I can remember. And how data changes over time can probably be better presented using a plain time-series plot (see the sketch below). What Microsoft isn’t telling you is that plotting a column chart in a 3D environment makes values hard to discern and occludes data from view. Virtual environments are useful when the physical environment is critical to understanding the data within it (like, say, in a flight-simulation training program). But the data presented above is only hurt by the use of three dimensions. Truly, what do we gain by adding a horizon, which only obscures data in the distance as it would in real life? Why should we choose to see less and not more?
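For comparison, here’s a minimal sketch of the kind of two-dimensional time-series plot I mean, written in Python with matplotlib and using entirely made-up regional sales figures. Every series sits on a common axis, nothing hides behind a 3D column, and no horizon swallows the values in the distance.

```python
# A minimal sketch: time-stamped, per-region data of the sort GeoFlow
# animates in 3D, shown instead as an ordinary 2D time-series plot.
# All figures below are invented purely for illustration.
import matplotlib.pyplot as plt

months = range(1, 13)
sales = {  # hypothetical monthly sales by region
    "North": [12, 14, 13, 15, 17, 16, 18, 19, 18, 20, 21, 22],
    "South": [10, 10, 11, 12, 12, 13, 15, 14, 16, 15, 17, 18],
    "West":  [8, 9, 9, 11, 10, 12, 12, 13, 14, 14, 15, 16],
}

# One line per region; every value is directly readable against the axis.
for region, values in sales.items():
    plt.plot(months, values, marker="o", label=region)

plt.xlabel("Month")
plt.ylabel("Sales (units)")
plt.title("Monthly sales by region")
plt.legend()
plt.show()
```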
Neowin’s blog on GeoFlow best sums up Microsoft’s motivation:
Let's face it; Microsoft's Excel spreadsheet program is certainly useful but it's also kind of dull to look at a flat 2D row of numbers. Isn't there some way that someone could turn these dull strings of numbers into something more exciting? Microsoft thinks the answer to that question can be found in a newly launched Excel add-on, GeoFlow. [emphasis added]
Let’s be concerned about this. Let’s not argue over which technology best excuses a lack of rigor when we make a mistake in our analysis. Instead, let’s focus our concern on technologies that research shows cannot deliver on what they promise. In my book, I argue that vendors are trying to take the decision-making process away from us. They want us to trust their gimmicky programs, and they tell us that we humans are too error-prone to make decisions on our own. We need data visualization and dashboards to make the decisions for us, they argue. Things have just gotten too big, and too complex.
It’s true we humans are error-prone. But we can analyze our actions and intentions, and start anew. If our Excel analysis is wrong, we need the ability to unpack and analyze what’s going on. When something pretends to do the analysis for us, to show us new insights like never before, let’s be skeptical. What happens when we rely on a technology that does not allow us to apply rigor? What will happen when it doesn’t deliver? Should we stop using it and throw it away? I can see why people view that as the only option.
We should be happy that Excel doesn’t do the work for us. Because when we make mistakes, and when we learn from our mistakes, that’s progress. That’s making the world better, not just newer.