Statistics in the era of AI

How do we mentor, teach, and do stats when AI can do so much of the work?
ChatGPT is more than willing to write my abstract or my cover letter to the journal. Uhhh, no thanks?

Y'all. Let me tell you. I've had a wild week.

Over the last couple years, I've used ChatGPT to troubleshoot my code and come up with some snippets for data.

But for the last few days. Phew. Our university system has signed up for the fancy version of ChatGPT at an extremely low cost, available to all faculty and students.

So I fed it a dataset that's been giving me headaches. I just gave it my .csv files. I told it how I wanted to clean up the dataset and prepare it for analyses. Then I verified that the cleaning worked and that there weren't any (egregious) errors. Then I said I wanted to run a set of analyses, make particular comparisons. It asked a few follow-up questions about how I wanted to structure the analyses.

It ran them. Then it asked if I wanted to do a few additional ones that made sense (ones I was planning to do next, actually). Then it suggested a couple of tweaks to the first analysis to account for the variance structure of the data. Then it did that stuff. All my hypotheses were pretty much answered. In, like, 15 minutes.

Then, I decided to tell ChatGPT what my variables represented and how the field experiment was designed. It offered some interpretations about what was happening. And, let me tell you, these were the mechanisms that I had in mind behind the hypotheses that I was testing with those models. Then it suggested running a couple additional analyses to decide if I should prefer one parameter over another based on the nature of my question. I was wondering about this, but, frankly, I wasn't quite sure what approach to take for that issue.

I looked into it, and lo and behold, the approach that was recommended to me made a LOT of sense. And then it offered to take some bootstrapping and Bayesian approaches that I really would have wanted to use but haven't had the coding prowess for, even if I have the mindset to prefer these approaches.

Then, it offered to package this all up for me in both a Word doc and a PDF. And then it offered to provide some more interpretive text. LIKE THIS:

[Screenshot of ChatGPT's interpretive text. My understanding is that screen readers will be able to read the text in here. That's good because Ghost only allows 181 characters in alt text.]

Are you fucking kidding me?

And then it asked me if I wanted it packaged in a Jupyter notebook. It ran all the tests for me – and developed publication-quality graphics with very little prodding in the right direction – and then offered a scientific interpretation. Which made actual sense and was informed by the biology of the organisms and the context that they were in.

I looked at that paragraph. The writing was stilted and had none of my flair, but it was technically not incorrect and actually captured the reason that I designed the study and came up with the answer that I had arrived at independently.

These are not statistical approaches that you see every day, and these are R packages that not many folks use.

I have to say, my mind was blown. Still is blown.

You see, nine years ago, when I took a year-long sabbatical, one of my three goals was to become proficient at R. It was a slog. I wouldn't say I became proficient, but I became slightly capable. I could understand code and could meet some of my very basic needs. I learned how searching and Stack Overflow were necessary. I learned that good coders borrow from one another all the time. Why write a passage of code when something that does exactly what you want already exists? If you understand what it's doing along every step of the way, what's the problem? But even then, you had to know enough about coding and enough about statistics to put together useful code that gave you useful answers.

Anyhow, even at the end of the sabbatical, there were a few projects that just broke my brain. I couldn't wrangle them in R. It wasn't so much the statistical tests but the data management, getting the data frames and the variables and the cleaning and the this and the that.

But in a few sentences and a mere 15 seconds (plus what might amount to a gallon of clean water and enough electricity to power every home in Topeka for 20 minutes) ChatGPT did what I couldn't do after a year of sabbatical. And then some. And then some more. I went through every step of the process and every data frame. It all was kosher. I'm no dolt when it comes to statistics - at one point I was the most qualified person to teach graduate-level stats in my department and I did for several years! It all was legit.

Literally all the stats I wanted to do on this project were done for me. In this case, ChatGPT provided a service to me that I would have paid a data wizard a couple thousand bucks to do. And honestly I have been tempted to do so out of my pocket just to get this out of my laptop and into a journal article.

When this paper gets published – and I mean when it gets published – every single word of the manuscript will have flowed from my fingers, perhaps with bits chipped in from coauthors (some of whom were undergrads in my lab but now are on their second permanent job after their PhD. If they even remember the project! Thank goodness I kept such good notes when we were doing the fieldwork.) All the text will be mine but NONE of the code will be. Not a fricking line of it. But I'll publicly archive it, the reviewers can choose to scrutinize it as much as they choose (and I hope they do!), and you'll be able to download it and do the same once the preprint goes out.

In my opinion this is entirely legit. Coding is the means to the end of running statistical tests and creating visualizations. If I hired a technician to do the fieldwork instead of doing it myself, would you have an issue with that? If I hired a consultant to write the code when I told them what I needed, would that be a problem? Then, what's the problem in doing stats with an (AI) consultant?

The thing is, I think this isn't a simple question. There's a lot of nuance. There very well could be a lot of problems with doing stats with an AI consultant. Or there might not be. I'd like to think that based on my expertise in all of the domains in this project, there aren't problems. I would never submit something that I couldn't explain on the fly or don't understand. But I'm not sure if the way we do peer review is prepared to handle all of these kinds of circumstances.

It's hard to imagine that any of the ideas that I'm writing will be coming from ChatGPT, and none of the stats that I present will be ones that ChatGPT decided to run for me. I'm particular about how I explore data, how I test hypotheses, and how I choose to create data visualizations.

BUT

if I wanted ChatGPT to have a strong hand in deciding what to do with my data, how to present it, what it means, and even writing the paper, I could.

Which means at this moment, there are plenty of people who are doing exactly this. These AI-generated publications that are just filling up nonselective and/or predatory journals? I now have experienced how one would do such a thing. I could have written a whole paper in a few hours that way! Also, I could never ever EVER have brought myself to have done so. But then again, I'm tenured and I'm not under any kind of gun for productivity.

And here's the thing. If you have an interesting question, and run a creative and sound experiment to generate original data to test that question, and then go through AI to analyze and write it up? And you go through careful quality control and assurance? I think the most important thing there is the quality of the research question and the experimental design. AI bots will always "yes, and..." you to the point of obsequiousness and will readily get things wrong. But if the data are real, and the tests are real, and the code works, then the answer to the question is there.

These are interesting times.

I was talking to a friend the other day and he told me that in one week, he was able to use AI-supported coding to do the job that a couple years ago would have taken a postdoc several months to get done. Wow.

Which leads me to the question that I really wanted to write about. These are questions, and not answers. So I'll just ask these questions, and I'm sure we'll be talking more about them later.

When we are training our mentees – PhD students, undergrads, whomever – how much are we training them to learn how to code, and how much are we training them how to generate code by any reliable means necessary to get the job done? Not long ago, every ecology grad student needed to be proficient in R if they were to get anything done. Is that still true? Are programs changing how they are doing things? I have no idea.

We need to make sure that everybody understands probability, hypothesis testing, experimental design, and all that stuff. And understands how statistical tests work. Honestly, I think we're not even doing that now, because getting the coding down is hard enough that it's hard to actually get at the conceptual foundations underlying the statistics. So if generating code is so easy now, does that make teaching statistics easier, or does it make it harder?

And what are we doing in classrooms? What are we teaching undergrads about R (for the programs that have had their act together enough to have rolled this out)? Are we going to be teaching the bare bones of coding for the next decade when practitioners realize that we shouldn't even be bothering with that stuff anymore?

And what are we doing in our own research programs? What we need more than anything else is TIME and PEOPLE to get stuff done. I'm not the only one to have discovered that this stuff creates ginormous efficiencies. This ChatGPT thing will give me the time and the lack-of-need-for-coding-wizards to push manuscripts out the door that otherwise would not be seeing daylight until my retirement. I'm goddamn thrilled about this. How are you all handling this?