Open source software doesn’t necessarily mean we’ll have better stats
Let me tell you a little story about data analysis and peer review that’s given me some pause. Last year, I analyzed a dataset from a project that my lab did with a couple of students. I got a cool result. Since then, we’ve taken the work in a new, fruitful direction, while the paper sits in the queue*.
The most interesting findings were part of an analysis that I conducted in EstimateS, with some subsequent tests in JMP. If my understanding is correct, this is an opaque way of conducting the analyses that advocates of “open science” frown upon, because they can’t see all of the code that was used to conduct the analysis. (EstimateS tells you what it calculates and how, but the raw code isn’t available, I don’t think? And JMP doesn’t have open code. Also, if I’m painting “open science” with strokes that are too broad, please let me know, as I do recognize that “open science” is not one thing.) In this study, the responses to our treatments were unambiguously different, in an unexpected and robust manner.
My result was exciting. I shared it with my co-author, who asked for the data files so he could verify it in R. He couldn’t. He came back to me with results that seemed just as correct as the ones I got, but not nearly as interesting, and not showing any real differences among treatments. His version looked just as credible, and perhaps more so because it wasn’t exciting.
That was a head-scratcher because, after a quick look through my numbers, it seemed like I had gotten it right and there was no reason to think that I hadn’t. And quite reasonably, my collaborator thought the same thing about his.
So what did I do? Well, to be honest, I just put it on very low heat on the back burner. Didn’t even think about it or look at it. I had a bunch of grants, other manuscripts, and teaching to focus on. There were other projects, just as exciting, that were awaiting my attention. I thought I’d get to it someday.
Fast forward maybe half a year. My coauthor contacted me and said something like, “I went through that analysis again to see why our results were different, and I found the smallest error in my code. Now my results match yours.”
So, problem solved. One more year, and this manuscript is shifting to the front burner, now that I’m on sabbatical. When we write up the results, it won’t matter whether they came from JMP or an R package, because we got exactly the same thing. I guess I’ll say I used both, to cover my bases. Because reviewers have their biases. I imagine – or at least I hope – that the software I used to conduct the analyses won’t hinder me in review.
As I shift my stats to R over this next year, here’s a worry: is moving to an “open” platform actually going to result in better statistics? Isn’t it quite possible that we’ll see bigger statistical problems than we have in the past, as more people who aren’t programmers write their own code? You might want to say, “That’s why all scientists should be programmers,” but if you chat with a variety of early-career scientists, you’ll find many who are not professional coders and don’t have this as a high priority. Yes, we should be more literate in this respect, but if our expectations don’t reflect reality, then we might be fooling ourselves.
Consider my experience. Let’s say I had relied on my collaborator, who was using tried-and-trusted-and-peer-reviewed R packages. When the results came in, I’d have written them up, our paper would be out, and that would be it. And it would have been entirely wrong. Most journals don’t ask you to upload your code or to share it during the peer review process. But let’s say we did: what are the odds that this teensy coding error would have been caught during review? It’s quite possible that even thorough reviewers who know what they’re doing could miss a slight syntax mistake that produces incorrect but credible-looking results. We are not positioned to require peer reviewers to scrutinize every single line of code on GitHub associated with every manuscript in peer review.
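To make that concrete, here’s a hypothetical sketch in R – not our actual analysis, and not my collaborator’s actual error – of a one-line slip that runs cleanly, prints a tidy ANOVA table, and quietly erases a real treatment effect. The data here are simulated purely for illustration.

set.seed(1)

# Treatment assignments, one row per sample (made-up example data).
design <- data.frame(sample_id = 1:60,
                     treatment = rep(c("control", "low", "high"), each = 20))

# Measured richness keyed to the same sample IDs, with a genuine treatment
# effect baked in, but saved in a different row order.
counts <- data.frame(sample_id = design$sample_id,
                     richness  = rnorm(60, mean = rep(c(10, 14, 18), each = 20)))
counts <- counts[sample(nrow(counts)), ]

# Correct approach: merge by sample_id before testing for treatment differences.
ok <- merge(design, counts, by = "sample_id")
summary(aov(richness ~ treatment, data = ok))     # clear treatment effect

# The slip: cbind() silently assumes both tables share a row order. It runs,
# it prints an equally tidy ANOVA table, and the real differences vanish.
oops <- cbind(design, richness = counts$richness)
summary(aov(richness ~ treatment, data = oops))   # credible-looking, and wrong

Nothing in that second call throws an error or a warning. The only way to catch it is for someone to read the code closely with the data in hand, which is exactly what most peer review doesn’t do.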
I think the most forceful argument for “open software” is that transparency is required for quality peer review and for the ability to replicate findings. If you can’t see exactly how the analysis was conducted, the argument goes, then you can’t be confident in it. By that logic, providing only the data and the results, without the code behind them, would be a problem.
That’s a solid argument, but based on my own experience, I don’t know that the black-box scenario is the bigger problem. Sure, when we run analyses through (say) JMP or SPSS, it’s a black box. But it’s a mighty robust black box. While some folks have identified problems with statistical formulas in a popular spreadsheet (Excel) that were fixed a long time ago, I’m not aware of any errors in the results produced by widely used commercial statistical software. If you can’t trust that SAS runs a regression correctly because you can’t see the code, then you’ve got trust issues that transcend practicality. People can fail to understand what they’re doing, or misinterpret what they get, of course. But as far as I know, the math is correct.
So, what and whom do you trust more to get the complex math of statistical models right? The individual scientist and their peer reviewers, some or all of whom may not be professional statisticians or programmers, or the folks who are employed by SAS or SPSS or whomever who are selling (way overpriced) statistical software for a living?
What do I prefer? Well, that doesn’t matter, now does it? There are enough people who are happy to crap on me for not using R that I pretty much have to make the switch. But this also means that I’ll actually have less confidence in my results, because I’ll have no fricking clue whether I made a small error in my code that gives me the wrong thing. And I can’t really rely on my collaborators and peer reviewers to catch everything, either.
It’s a good thing that, when coding in R, you are essentially forced to understand the statistics that you’re doing. The lack of point-and-click functionality means that you aren’t just going to order up some arbitrary output and slap it into a manuscript without knowing what happened. That makes it a great teaching tool. So what’s worse? Using menu-driven black-box software and potentially reporting the wrong thing, or making uncaught coding errors and potentially reporting the wrong thing? I don’t know which problem is bigger. Speaking for my own work, I’m more concerned about the latter. For me, using R won’t up my statistical game one bit. It’ll just put me in line with emerging practices and let me conduct certain kinds of analyses that aren’t readily available to me otherwise.
If you disagree with this opinion, I hope you’ll bring some data, so we can avoid Battling Anecdote Syndrome.
*My lab has a bottleneck when it comes to getting these papers out. This project was done in collaboration with two rockstar undergrads from my university, and I was hoping that they could take the lead on the project. One promptly moved on to a PhD program, and in consultation with their PI, they don’t have the opportunity to take the lead. The other is not in a position to take the lead on this paper either (it’s complicated).