I got fooled by AI-for-science hype—here's what it taught me. Nick McGreivy 2025

Murph

https://www.understandingai.org/p/i-got-fooled-by-ai-for-science-hypeheres

I got fooled by AI-for-science hype—here's what it taught me
I used AI in my plasma physics research and it didn’t go the way I expected.
Nick McGreivy
May 19, 2025

In 2018, as a second-year PhD student at Princeton studying plasma physics, I decided to switch my research focus to machine learning. I didn’t yet have a specific research project in mind, but I thought I could make a bigger impact by using AI to accelerate physics research. (I was also, quite frankly, motivated by the high salaries in AI.)

I eventually chose to study what AI pioneer Yann LeCun later described as a “pretty hot topic, indeed”: using AI to solve partial differential equations (PDEs). But as I tried to build on what I thought were impressive results, I found that AI methods performed much worse than advertised.

At first, I tried applying a widely-cited AI method called PINN to some fairly simple PDEs, but found it to be unexpectedly brittle. Later, though dozens of papers had claimed that AI methods could solve PDEs faster than standard numerical methods—in some cases as much as a million times faster—I discovered that a large majority of these comparisons were unfair. When I compared these AI methods on equal footing to state-of-the-art numerical methods, whatever narrowly defined advantage AI had usually disappeared.

This experience has led me to question the idea that AI is poised to “accelerate” or even “revolutionize” science. Are we really about to enter what DeepMind calls “a new golden age of AI-enabled scientific discovery,” or has the overall potential of AI in science been exaggerated—much like it was in my subfield?

Many others have identified similar issues. For example, in 2023 DeepMind claimed to have discovered 2.2 million crystal structures, representing “an order-of-magnitude expansion in stable materials known to humanity.” But when materials scientists analyzed these compounds, they found it was “mostly junk” and “respectfully” suggested that the paper “does not report any new materials.”

story continues at link: https://www.understandingai.org/p/i-got-fooled-by-ai-for-science-hypeheres
 
I share this piece because I suspect some researchers are getting swept up in the excitement around AI. I don't think it's anywhere near being generally useful yet. Of course there might be tasks where it can be deployed really usefully; in my own line of work, for example, it is incredibly useful at transcribing audio.

But one analogy I've heard is that AI is like a microwave - it plays a role in the kitchen and is great at some things, but if you try to use it to cook the whole dinner you're going to have a bad time.
 
This has been my experience trying (and failing) to use it for research. It's good for tasks where the sheer amount of data to sort through is unreasonable for one person and you aren't relying on the accuracy of the results - which covers vanishingly few tasks in my work. It doesn't even save me time writing code unless I'm using a language where I'm an absolute beginner.
 
But one analogy I've heard is that AI is like a microwave - it plays a role in the kitchen and is great at some things, but if you try to use it to cook the whole dinner you're going to have a bad time.
I think that's a really good analogy. It's a tool - or rather a range of tools - that, when used for the right things by someone who knows what they're doing, can be great. But it is not a magic wand.
I know some people still working in tech who are doing lots of interesting things with AI/ML, but who are also having to deal with people's unrealistic expectations, both positive and negative.
 
This has been my experience trying (and failing) to use it for research. It's good for tasks where the sheer amount of data to sort through is unreasonable for one person and you aren't relying on the accuracy of the results - which covers vanishingly few tasks in my work. It doesn't even save me time writing code unless I'm using a language where I'm an absolute beginner.

AI can help, but you need to be careful about how it's used and how the overall systems are designed. AI has changed a great deal since this researcher started, and it is improving all the time.

I do find it useful for coding - it's largely replaced me looking things up on Stack Exchange - but that is often just a starting point for finding out how to integrate stuff into the OS.
 
This has been my experience trying (and failing) to use it for research. It's good for tasks where the sheer amount of data to sort through is unreasonable for one person and you aren't relying on the accuracy of the results - which covers vanishingly few tasks in my work. It doesn't even save me time writing code unless I'm using a language where I'm an absolute beginner.
So far the only real benefit I have seen from AI is the potential for greater efficiency in spotting patterns in data - which is a very real benefit, to be sure.

But it is not good at the critical function of interpreting those patterns and recognising which of them are meaningful. That is likely to remain a role only humans can fill, at least for a while yet.
 
Partial differential equations (PDEs) tend to model extremely specific phenomena (how heat diffuses through a room and how waves spread through space are very different things), and as a result, contrary to ordinary differential equations, essentially every PDE requires its own theory that accounts for these intricacies. Some people spend years studying the exact specifics of a single PDE, and I gather it's sometimes quite similar for numerical methods, so I would not be surprised if more general AI methods, even if customized, fail as soon as things get interesting, precisely because you have to do all the nitty-gritty work and get your hands dirty. There do appear to be some very interesting use cases where AI methods apparently outperform classical methods for PDEs, for instance in overcoming the "curse of dimensionality" in extremely high-dimensional PDEs, but I think that may still be a rather specific thing, rather than something extremely general.
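For anyone who hasn't seen what these methods look like in practice, the PINN approach mentioned in the article boils down to training a neural network to minimise the residual of the PDE (plus the boundary conditions) at sampled points. Here is a minimal toy sketch of that idea - my own illustration, not the author's code, assuming PyTorch and a deliberately simple 1D problem with a known exact solution; real plasma-physics PDEs are of course far harder, which is rather the article's point:

# Toy PINN sketch (illustration only): train a small network u(x) so that
# u''(x) + pi^2 * sin(pi * x) = 0 on [0, 1] with u(0) = u(1) = 0.
# The exact solution is u(x) = sin(pi * x), so the result can be checked.
import torch

torch.manual_seed(0)

net = torch.nn.Sequential(
    torch.nn.Linear(1, 32), torch.nn.Tanh(),
    torch.nn.Linear(32, 32), torch.nn.Tanh(),
    torch.nn.Linear(32, 1),
)
optimizer = torch.optim.Adam(net.parameters(), lr=1e-3)
x_boundary = torch.tensor([[0.0], [1.0]])  # points where u should equal zero

for step in range(5000):
    # fresh interior collocation points each step
    x = torch.rand(128, 1, requires_grad=True)
    u = net(x)
    # first and second derivatives of u with respect to x, via autograd
    du = torch.autograd.grad(u, x, torch.ones_like(u), create_graph=True)[0]
    d2u = torch.autograd.grad(du, x, torch.ones_like(du), create_graph=True)[0]
    residual = d2u + (torch.pi ** 2) * torch.sin(torch.pi * x)
    # loss = PDE residual at interior points + boundary condition penalty
    loss = (residual ** 2).mean() + (net(x_boundary) ** 2).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# compare against the exact solution at a test point
x_test = torch.tensor([[0.5]])
print(net(x_test).item(), "vs exact", torch.sin(torch.pi * x_test).item())

Even on a toy like this you can quickly run into the brittleness he describes: small changes to the loss weighting, the sampling or the network size can stop it converging, and there is no built-in error estimate telling you when that has happened.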

Much like you, @Murph, I see it as a tool that is sometimes useful and sometimes not (though of course it's a rather revolutionary tool that will have fundamental impacts throughout the world, negative and positive), just like all other tools. Perhaps most importantly, I would like that tool to do the things I don't enjoy doing so that I have more time for the things I do enjoy, rather than the opposite (I don't want an AI to make art or music for me, but please let it vacuum my floor!). Given that most research we see here on S4ME tends to be of a low standard, it's hardly surprising that the impact of this tool can often be net negative if it just allows people to churn out even more junk. And when it comes to things like LLMs, last I heard the above-mentioned Yann LeCun is extremely critical of those (I believe he called them trash).
 
@Murph Thank you for this article. There is a study which just came to my attention.

Title: Generalization bias in large language model summarization of scientific research, which can be found here:

https://royalsocietypublishing.org/doi/epdf/10.1098/rsos.241776

The paper suggests that LLMs overgeneralise information from published research. More specifically:

Our results indicate a strong bias in many widely used LLMs towards overgeneralizing scientific conclusions, posing a significant risk of large-scale misinterpretations of research findings. We highlight potential mitigation strategies, including lowering LLM temperature settings and benchmarking LLMs for generalization accuracy

Despite the above, I believe that their reasoning abilities are very impressive, to say the least. Time will tell.
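On the temperature point mentioned in the abstract: in most chat-completion APIs that is a single request parameter, so it is a cheap mitigation to try. A minimal sketch, assuming the OpenAI Python client; the model name and prompt wording are my own placeholders, not something taken from the paper:

# Illustration only: lowering the temperature is one of the mitigations the paper
# mentions. Temperature controls sampling randomness; it does not by itself
# guarantee a faithful summary.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

abstract = "..."  # paste the abstract or conclusions you want summarised

response = client.chat.completions.create(
    model="gpt-4o-mini",   # placeholder model name
    temperature=0,         # low temperature = less sampling randomness
    messages=[
        {"role": "system",
         "content": "Summarise the text. Preserve hedging, sample sizes and "
                    "limitations, and do not generalise beyond what the authors state."},
        {"role": "user", "content": abstract},
    ],
)
print(response.choices[0].message.content)

Whether reducing randomness alone is enough to remove the overgeneralisation bias the authors measure is, of course, exactly the sort of thing their benchmarking suggestion is meant to check.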
 
I don't want an AI to make art or music for me,
I would. I'd love to have more books - or computer games - of the style I enjoy. I wouldn't object to an AI monitoring me and playing background music that suits my mood. I'd rather do some mindless exercise (vacuuming, lawn mowing) and have a good book to read than have a computer mow my lawn while I'm bored because I don't have a good book to read.
 
I would. I'd love to have more books - or computer games - of the style I enjoy. I wouldn't object to an AI monitoring me and playing background music that suits my mood. I'd rather do some mindless exercise (vacuuming, lawn mowing) and have a good book to read than have a computer mow my lawn while I'm bored because I don't have a good book to read.
Personally, I don't like the sound of that - it feels like an abdication of choice.
 
I would. I'd love to have more books - or computer games - of the style I enjoy. I wouldn't object to an AI monitoring me and playing background music that suits my mood. I'd rather do some mindless exercise (vacuuming, lawn mowing) and have a good book to read than have a computer mow my lawn while I'm bored because I don't have a good book to read.

The point on books and computer games I can still understand (even though I think things tend not to go in that direction, similar to how targeted advertisements don't actually serve individuals' needs any better).

But I can't see your other point. Nothing is ever going to stop anybody from doing mindless chores themselves if they want to. Everybody can wash their dishes by hand, handwash their clothes, clean their floors with a toothbrush or cut their lawn with scissors if they want to. The whole point is to give people a larger choice of which mindless exercise to engage in, not the opposite.
 
Title: Generalization bias in large language model summarization of scientific research, which can be found here:

https://royalsocietypublishing.org/doi/epdf/10.1098/rsos.241776

The paper suggests that LLMs overgeneralise information from published research. More specifically:
I've been seeing this a lot, and frankly, looking at how research is covered in the media, it's probably simply picking up on the same human biases. All media suffer from the same clickbait model, even in research coverage. The number of times I have seen headlines suggesting groundbreaking results from a single pilot study of dubious methodology that wasn't replicated and whose results should be taken with a mountain of salt...

Technology always follows the same track.



The bigger problem with AI is that it was made into a consumer product far too early, and it is getting a bad reputation for it. Imagine if everyone had tried jumping on the Internet bandwagon in the late 90s and had to deal with, well, the state of the Web in the late 90s and the consumer-grade technology for it. Fortunately that couldn't happen, because it would have required building trillions of dollars' worth of equipment and generations of R&D first.

Normally, technology at the stage AI is at now is left to labs and then to professionals, so it doesn't get this early "wait, this isn't ready" moment, because there are too many barriers to deployment. Because the Internet is so mature, AI was instead made instantly available to the whole human population.

What's funny about this is that despite all the hype, the moment when AI reaches the stage of runaway growth will still catch most people by surprise, because they're expecting it to grow slowly and linearly. It will go from "haha this is useless" to "holy crap what happened?!" so fast, but the problem with exponentials is that they all seem to grow slowly until they don't, and most people (justifiably in most cases) extrapolate based on continuing a linear trend.

Information isn't limited the same way physical devices are.
 
AI can help, but you need to be careful about how it's used and how the overall systems are designed. AI has changed a great deal since this researcher started, and it is improving all the time.

I do find it useful for coding - it's largely replaced me looking things up on Stack Exchange - but that is often just a starting point for finding out how to integrate stuff into the OS.

This is a really good point - AI keeps changing. I see people saying things about it that were true a few months ago - e.g. "it can't even draw fingers!" - but are simply not true any more.

If you take a principled stand against AI and stop using it (which is probably a good idea!), you don't also get to have opinions on what it can and can't do. The language models now give their sources, the Waymo model now drives more safely than a person, the fingers are rendered quite nicely; things aren't where they were a short time ago.
 
Personally, I don't like the sound of that - it feels like an abdication of choice.
I think some people are taking my post the wrong way. If it's about having a computer adjust music for me, I mean that while I'm reading, or composing something on this forum, I like having non-invasive background music. At other times, I might prefer something a bit more stimulating or cheerful. I'm not asking for a computer to buy music for me. The few radio stations I can receive generally have really annoying selections, so those aren't good background music.

Nothing is ever going to stop anybody from doing mindless chores themselves if they want to.
My point is that I'd rather have an AI provide me with more entertainment options than take over daily chores. EndME had the opposite desire, so my comment was directed at that. My ME doesn't prevent me from doing mindless physical activities such as housework or yardwork, but finding entertainment is difficult.
 