Drifting towards deadwood, or not: learning to use R (updated)
Update 15 May 2013: If you’re a newbie to R and want to know where to start, the comments on this post are now replete with (what I surmise to be) wonderful suggestions. Of course learning in the presence of those who know R is best, but this is a great set of suggested resources regardless of your environment.
–
I’m not that old, but I already feel myself getting a little stale.
How did this happen? Well, I guess it’s because I’m a professor and this is just the default rate of entropy.
When I was an undergrad, one of our introductory bio professors was a kindly man who was the archetype of deadwood. He had a separate slide carousel for every lecture in his course. When it was his time to teach, all he did to prepare was to pull the carousel off of the shelf. He didn’t have any idea what he was going to say until he saw the slide appear on the screen. Then, he would say the same thing he’d been saying for that slide over the past 20 years. It was just so obvious. One day, the slide projector broke. What did he do? He cancelled class.
This kind of thing is even more common now than it was back then, because few people had so many carousels at their disposal. It’s just done with PowerPoint.
I’ve worked hard to keep my teaching from becoming stale. And since I’m doing a lot of research, I can’t get stale at doing research either, right? If only that were true.
I imagine that molecular biologists all had to learn the ropes of PCR as machines and reagents became commercially available, and then relatively cheap and efficient. Nobody’s out there doing allozymes for population genetics after all, I would hope. And the same is true for RNAi, and now for next-gen sequencing approaches to genomics. In my flavor of work, there isn’t as much required to stay current, but nonetheless I’m still getting behind. If only I had the time to run just to stay in place.
At least I’ve diagnosed this condition and can fight the entropy. Just as I keep the dishes mostly clean in my house and have the oil changed in my car on time, I’ve got to stay fresh as a practicing scientist too. It isn’t easy.
This occurred to me, in part, when reading something that Joan Strassmann wrote (in the context of picking a good PhD advisor) that grad students are probably better at using R than their own advisor. I guess that’s the case in most labs, even if their advisors might have better statistical acumen.
If you’re a serious ecologist, nowadays, then R is an essential or near-essential tool. Here’s a confession: I’m useless with R. This is a problem. And it’s not a little problem, it’s a big problem.
I suspect that I’m not the only one in this boat, though I haven’t really heard anybody else admit to it. With every day that passes in which I still can’t use R, the less effectively I can collaborate, the more reliant I am on others, and the less able I am to apply the most current tools to the experiments I’m running. There is a single analysis that I should be able to do in R in an hour, and it’s keeping me from submitting a manuscript that otherwise is pretty much done. That’s a problem.
Now, I’m not a statistical dunderhead. (I teach our graduate biostatistics class, but obviously teaching a class in something doesn’t mean you’re an expert.) I design my experiments with specific tests in mind, and I choose ones that work, and I use model selection understanding the power and limitations of the approach. I understand frequentist vs. Bayesian perspectives even though I don’t choose to say anything that would start a disagreement. (If you read my stuff, you can decide for yourself if I know what I’m talking about.) I guess you’ll probably just take me at my word that I’m not stupid when it comes to stats.
But there are a few analyses that I just can’t run easily, like NMDS or a GLMM. This is because I mostly use a powerful menu-driven statistics package from SAS called JMP. It does nearly everything I want, and quite well. But the few analyses that I can’t run in JMP are becoming more and more relevant to the questions I’m asking in my lab.
How did I get into this situation? Well, when do people learn R? In grad school. When I was in grad school, R was not the standard tool. Before then, I used SPSS on a mainframe (NO, not with punchcards) and a variety of easy-to-use programs on a Mac. (StatView was unparalleled for simple exploratory data analyses on Macs, and it was bought up by SAS and orphaned so that people would use JMP instead. The world has moved on without it.) By the time I was finishing up grad school in the late ’90s, R wasn’t in widespread use but it was ramping up. None of my fellow grad students were using it at the time, so I wasn’t behind the curve.
A few years later, while I was starting on the tenure track in the early 2000s, I put aside a little time to figure out R. That was a disaster: I couldn’t even get it to read my files. I made a few halfhearted attempts, but I couldn’t find the time. I looked into taking a short course or getting a book, but I didn’t have the time to make it happen. At this point, it wasn’t a critical failing, but I saw that more and more people were using R, and that I wasn’t one of them.
My lack of R mojo isn’t a teaching problem. Even if I were an R pro, I don’t think I’d use it in my course, because the class is about understanding how statistics work and how to apply them, not how to use the software. I use JMP in the course because it is so easy to use, and I’m not going to waste instructional time on software tutorials. (We should have a separate class or seminar or experience that teaches students to use R, but it can’t fit in this class.) I’ve talked to people who teach with R in their courses, and they’ve reported that you either have to make it a course about learning stats, or learning R, but you can’t do both well with 45 hours of class time. Clearly, by using R you actually learn what you’re doing statistically, because that’s part of understanding coding. So I hear. But I’m not going to spend half of my time in class dealing with coding errors and stress when my students still don’t fundamentally understand probability, randomness and the actual nature of a null hypothesis.
While not a teaching problem, my lack of R mojo is a research problem. I am on it. I’ve been aware of this for a while, and I’ve found a way to deal with it.
For the last month, I’ve had sitting in my backpack wherever I go, what appears to be the exact resource I need: Beckerman and Petchey’s Getting Started with R: An Introduction for Biologists. From my quick browse, I feel mighty confident that using R like a pro is now only a matter of finding the time, and it doesn’t seem as insurmountable.
My hope is that, this summer, I find the time to actually remove the book from my bag and use it. This is the point in the narrative where I could explain everything I’ve done in the last month that would explain why I haven’t found the time to get to it, but you know the story. I won’t try to out-busy you.
This summer is already booked. Learning to use R to some degree of proficiency is going to take the amount of time that it would take to write a whole manuscript, or nearly write a whole grant. I have to decide which one of those things I’m not going to do to keep my skills sharp. Of course, I’ll be using R in the context of a manuscript. It’s just that this manuscript will take 2-3 times longer to write because of my R learning curve.
Maintenance isn’t optional. Learning R feels more like an engine replacement than an oil change, but I’ve got enough miles that I guess I’ve got to make the investment to avoid being sold for scrap.
Kodak stopped making the carousel projector less than ten years ago. I still have a carousel sitting around my lab, containing slides from the last talk I gave in this format. It wasn’t that long ago, really. (In the early 2000s, the Entomological Society of America hadn’t yet switched to accepting digital projection. That’s what’s still in the carousel.)
The world changes really quickly. As I’m doing my day-to-day faculty job, the world will be passing me by unless I actively work to keep pace. I always wondered how some people became deadwood. Now, I see how easy it is. It’s not about giving up, and it’s not about not caring. It’s about not strategically and systematically planning to keep up, which takes you away from immediate responsibilities. I’ve avoided this particular maintenance task for ten years, and just like when I go to get oil changed in the car, I’m not thrilled to spend my time that way. Of course, I’m glad that I can continue to drive a working car that will last a long while, and I’m glad that my soon-to-be-developed R mojo will keep me fresh for a good long while as well.