Can Large Language Models (LLMs) like ChatGPT be used to produce useful information?

leokitten · Aug 20, 2025

Given the latest news about GPT-5 I would say LLMs have hit a wall and are way more hype than substance.

Commentary: Say farewell to the AI bubble, and get ready for the crash

The AI frenzy has been fueled by relentless hype, but the dud launched by OpenAI has its former enthusiasts wondering if they've been taken.

www.latimes.com

ChronicallyOverIt · Aug 20, 2025

hotblack said:
These tools can make great leaps, and do things unexpected and new, the example of AlphaGo is a great one for this. As is I think things like protein folding or the maths challenges. It requires very specific domains with clear rules and concepts of what is ‘correct’ as well as the ability to test that. But understanding why it is possible there and not transferable to ‘solve this disease for me’ is important I think.

AlphaGo is entirely different from LLMs in architecture. Not really a great comparison if we're talking about LLMs here.

For LLMs to be true AGI there is a hope that with enough data “reasoning” will become an emergent property, possibly even consciousness, similar to the old Chinese room scenario.

ChronicallyOverIt · Aug 20, 2025

leokitten said:
Given the latest news about GPT-5 I would say LLMs have hit a wall and are way more hype than substance.

Commentary: Say farewell to the AI bubble, and get ready for the crash

The AI frenzy has been fueled by relentless hype, but the dud launched by OpenAI has its former enthusiasts wondering if they've been taken.

www.latimes.com

This is so funny to me, a few weeks ago before GPT-5 everyone was still an AI futurist. One bad model and a stalled takeoff and now everyone’s a downer. I doubt this has changed the minds of true LLM AGI believers like Sam, Thiel, etc.

leokitten · Aug 20, 2025

ChronicallyOverIt said:
This is so funny to me, a few weeks ago before GPT-5 everyone was still an AI futurist. One bad model and a stalled takeoff and now everyone’s a downer. I doubt this has changed the minds of true LLM AGI believers like Sam, Thiel, etc.

They are all full of shit and mostly just grifters. LLMs do not scale, that much is known. So they are a dead end really, the way they are currently designed.

They will need another huge discovery like transformers were for LLMs before another major advance can be made, and guess what, no such discovery is on the immediate horizon. They are just hoping to “fake it until they make it” and praying that by throwing trillions of dollars at the problem there will be a new discovery.

ChronicallyOverIt · Aug 20, 2025

leokitten said:
They are all full of shit and mostly just grifters. LLMs do not scale, that much is known. So they are a dead end really, the way they are currently designed.

They will need another huge discovery like transformers were for LLMs before another major advance can be made, and guess what, no such discovery is on the immediate horizon. They are just hoping to “fake it until they make it” and praying that by throwing trillions of dollars at the problem there will be a new discovery.

I think with enough data they were hoping it would be emergent.

Genie 3 world consistency was an emergent property of more data in …. So there may be some value in the idea that “it can just happen”, but yes, it is hope at this point.

hotblack · Aug 21, 2025

ChronicallyOverIt said:
AlphaGo is entirely different from LLMs in architecture. Not really a great comparison if we're talking about LLMs here.

I thought I made the distinction in my posts and think it’s pretty clear? It also seems most likely that LLMs will stick around in some form but perhaps as the human interface to other forms of AI/ML, that’s what they’re most suited to after all.

The scaling stuff for LLMs has always been a pipedream pushed by a few. It’s an amazing grift and equally amazing they’ve managed to convince so many it may work. Like @leokitten says more discoveries and technologies are needed and the industry seemed to know that until ChatGPT blew up.

ChronicallyOverIt said:
Paper from Apple on why they aren’t pursuing LLM’s:

The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity

Recent generations of frontier language models have introduced Large Reasoning Models (LRMs) that generate detailed thinking processes…

machinelearning.apple.com

I’m familiar with the paper. But saying Apple are not pursuing LLMs when they very clearly are (researching, developing and building products using them) seems to misrepresent things somewhat?

hotblack · Aug 21, 2025

I was reading this the other day and it is entertaining in some ways but highlights a number of the ways LLMs struggle.

Project Vend: Can Claude run a small shop? (And why does that matter?)

We let Claude run a small shop in the Anthropic office. Here's what happened.

www.anthropic.com

A positive read could be that they feel they can fix these issues. But I think it also shows how much the earlier points made in this thread stand and how hoping for an LLM to be able to do something new which they haven’t encountered or been trained for (like finding a new disease solution by prompting alone) is somewhat wishful thinking.

Adrian · Aug 21, 2025

This is a potentially interesting paper (I've only read the abstract). They seem to be fine tuning LLMs with time series data from various organs and then have an agentic approach to simulating the whole body based on these organ models. Could be an interesting approach to modelling the body as a complex system.

Organ-Agents: Virtual Human Physiology Simulator via LLMs

https://arxiv.org/pdf/2508.14357

Recent advances in large language models (LLMs) have enabled new possibilities in simulating complex physiological systems through reasoning, generation, and agentic coordination. In this work, we present Organ-Agents, a novel multi-agent framework that simulates the dynamics of human physiology using LLM-driven agents. Each agent, referred to as a Simulator, is assigned to model a specific physiological system such as the cardiovascular, renal, immune, or respiratory system. The training of the Simulators consists of two stages: supervised fine-tuning on system-specific time-series data, followed by reinforcement-guided inter-agent coordination that incorporates dynamic reference selection and error correction with assistantive agents. To support training, we curated a cohort of 7,134 sepsis patients and 7,895 matched controls, constructing high-resolution, multi-domain trajectories covering 9 physiological systems and 125 clinical variables. Organ-Agents achieved high simulation accuracy on 4,509 held-out patients, with average per-system mean squared error (MSE) below 0.16 across all systems and robust performance across severity strata based on sequential organ failure assessment (SOFA) scores. Generalization capability was confirmed via external validation on 22,689 intensive care unit (ICU) patients from two tertiary hospitals, showing moderate performance degradation under distribution shifts while maintaining overall simulation stability. In terms of clinical plausibility, Organ-Agents reliably reproduces multi-system critical event chains (e.g., hypotension, hyperlactatemia, hypoxemia) with preserved event order, coherent phase progression, and minimal deviations in both trigger timing and physiological values. Subjective evaluation by 15 critical care physicians further confirmed the realism and physiological coherence of simulated trajectories, with mean Likert ratings of 3.9 and 3.7, respectively. The Simulator also supports counterfactual simulation under alternative fluid resuscitation strategies for sepsis, producing physiological trajectories and APACHE II scores that closely align with matched real-world patient groups. To further assess the preservation of clinically meaningful patterns, we evaluated Organ-Agents in downstream early warning tasks using seven representative classifiers. Most models showed only marginal AUROC degradation when transferring from real to generated and counterfactual trajectories, with performance drops generally within 0.04, indicating that the simulations preserved decision-relevant information for clinical risk simulation. Together, these results position Organ-Agents as a clinically credible, interpretable, and generalizable digital twin for in physiological modeling, enabling precision diagnosis, treatment simulation, and hypothesis testing across critical care settings.

boolybooly · Aug 21, 2025

My background in this informs my perspective as I was introduced to programming very young and was writing my own BASIC by about the age of 14 (1978) on a new school Research Machines 380z. I later did some programming for businesses. When the ME hit I was not able or interested to pursue it. I have always viewed computers as marvellous toys.

There is an old adage in computing that you get out what you put in, plus the mechanism in between of course and that is always the invention of a human mind, which can be comprehended by the user and will always be the product of that foundation even if AI start programming themselves.

The way I look at LLM AIs personally takes into account that they are trained on the internet and then heavily moderated, so it's a bit like having a Bowdlerised conversation with the average output of the internet, which if it were unmoderated would be intolerable mayhem. So a bubbling cauldron with a lid on it at the best of times.

Considering how well that goes usually (as I often visit gaming fora, some better managed than others), I don't have high expectations, so am constantly surprised and amused by the LLMs' successes and, for want of a better word, humanity. Programmed for sure but nevertheless an exemplar of an agreed standard of decency, imposed by others and which others might learn from, so an educational tool for our culture, and since I trained to be a teacher I find that an intriguing possibility.

I do think today's LLMs are useful as a way of collating information on a subject which you can then apply a discriminating and sceptical eye to. I feel it is important to understand the nature of the product and where LLMs can go wrong but one of my hobbies has been beta testing games software for the last 25 years, so spotting glitches and seeing through a facade to the underlying computing is second nature for me so I do not feel out of my depth there, yet.

They have a potential to be personalised and store a repository of knowledge pertinent to me (one) to act as a tool to know and serve me as an individual but currently they are revised every few months and the slate is wiped clean unless you explicitly include conversations from the past in the starting data for a new conversation. I think they could be very useful as personal assistants but that is some time off and like AI driving vehicles they need to be developed and made safe first.

The underlying neuronal learning models have potential with the interpretation of scientific and medical data and that is a different application, with less hazard of input from sweary, bad mannered, rebellious adolescents. Every step developers take with these models will create step changes in the value and nature of the AI tools they produce. Inherently though these depend on the developers' insights into their own thinking processes and so become a reflection of the human mind and the way evolution has developed cognitive processing to date.

In the future AIs might be able to improve on evolution, though I think mostly in terms of speed and accuracy, possibly insight too, though that depends a little on how well we understand ourselves, which is not as well developed as I would like it to be. After millions of years of evolution and thousands of years of human thought IMHO the principles of morality and humanity have arrived at an evolutionarily stable strategy and so common sense and decency will continue to be what they are today and hopefully we will get better at delivering on these principles with the assistance of these tools.

In that endeavour people like Musk and Altman are slightly dangerous double edged swords in that they can push the agenda forwards but they can also pervert it to serve their special interests. However I have little doubt the rest of the human race will provide them with "feedback" and hold them to account.

Adrian · Aug 21, 2025

hotblack said:
The scaling stuff for LLMs has always been a pipedream pushed by a few. It’s an amazing grift and equally amazing they’ve managed to convince so many it may work. Like @leokitten says more discoveries and technologies are needed and the industry seemed to know that until ChatGPT blew up.

There is lots of work on LLM architectures although it tends to be small adaptations on the basic Decoder based arch. For example, Microsoft have released a model using Mamba that uses recurrent layers to linearize the computation in the prefill stage. Google have done some interesting stuff with adaptive hidden unit layers, allowing the same model to be used with different computational requirements. There is also lots of interesting stuff around KV caches and how to reduce overheads, which again very much reduces the compute requirements but also provides ways to precompute on things coming from vector DBs, avoiding long contexts and hence compute times.
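As a rough illustration of the KV cache idea mentioned above: during token-by-token decoding, the keys and values of earlier tokens do not change, so they can be stored and reused rather than recomputed at every step. The sketch below is a toy single-head attention in plain Python, not any vendor's implementation; the function names, dimensions and random weights are all made up for the example.

```python
import math
import random

def matvec(W, x):
    # Multiply matrix W (a list of rows) by vector x.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def attend(q, keys, values):
    # Single-head scaled dot-product attention for one query vector.
    scale = math.sqrt(len(q))
    scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) / total for i in range(dim)]

def decode_without_cache(tokens, Wq, Wk, Wv):
    # Recompute keys and values for the whole prefix at every step.
    outputs, projections = [], 0
    for t in range(len(tokens)):
        prefix = tokens[: t + 1]
        keys = [matvec(Wk, x) for x in prefix]
        values = [matvec(Wv, x) for x in prefix]
        projections += 2 * len(prefix)
        outputs.append(attend(matvec(Wq, tokens[t]), keys, values))
    return outputs, projections

def decode_with_cache(tokens, Wq, Wk, Wv):
    # Project each token to a key and value once, then keep them in the cache.
    outputs, projections = [], 0
    key_cache, value_cache = [], []
    for x in tokens:
        key_cache.append(matvec(Wk, x))
        value_cache.append(matvec(Wv, x))
        projections += 2
        outputs.append(attend(matvec(Wq, x), key_cache, value_cache))
    return outputs, projections

random.seed(0)
dim, n_tokens = 4, 6

def rand_matrix():
    return [[random.gauss(0, 1) for _ in range(dim)] for _ in range(dim)]

Wq, Wk, Wv = rand_matrix(), rand_matrix(), rand_matrix()
tokens = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n_tokens)]

slow_out, slow_cost = decode_without_cache(tokens, Wq, Wk, Wv)
fast_out, fast_cost = decode_with_cache(tokens, Wq, Wk, Wv)
print(slow_cost, fast_cost)
```

Both functions return the same attention outputs, but the number of key/value projections grows quadratically with sequence length without the cache and only linearly with it. The cost is the memory needed to hold the cache, which is presumably where the overhead-reduction work comes in.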

hotblack · Aug 21, 2025

Adrian said:
There is lots of work on LLM architectures although it tends to be small adaptations on the basic Decoder based arch.

Absolutely. The efficiency improvements have been huge. That small local models can do what huge models used to be needed for, and that the big players are pushing on efficiency even on their larger models to reduce costs, seems far more significant than any imagined progress towards AGI. But the latter gets headlines and investors I guess.

Have you got any links to the Mamba work? I hadn’t heard about that. I only recently learnt about quantization aware training and had heard about Google’s and some of the KV cache work but a lot of the detail is beyond my understanding tbh.

Adrian · Aug 21, 2025

hotblack said:
Have you got any links to the Mamba work?

It's on my reading list. They had interesting results, and I think it's significant that Microsoft have picked it up in one of their small models (a Phi-4-mini one), although I've not tried that model yet.

Mamba: Linear-Time Sequence Modeling with Selective State Spaces

Foundation models, now powering most of the exciting applications in deep learning, are almost universally based on the Transformer architecture and its core attention module. Many subquadratic-time architectures such as linear attention, gated convolution and recurrent models, and structured...

arxiv.org

jnmaciuch · Aug 21, 2025

Adrian said:
This is a potentially interesting paper (I've only read the abstract). They seem to be fine tuning LLMs with time series data from various organs and then have an agentic approach to simulating the whole body based on these organ models. Could be an interesting approach to modelling the body as a complex system.

Organ-Agents: Virtual Human Physiology Simulator via LLMs

https://arxiv.org/pdf/2508.14357

This is what I am starting to do now as part of my PhD. Long story short is that so far I’m unimpressed and haven’t seen many examples of insights from an AI model being validated in vivo unless there was already strong reason to think a certain gene/pathway was relevant independently of AI

darrellpf · Sep 19, 2025

AIs can be very, very good at reading large amounts of data. That said, the old rule of garbage in, garbage out still applies.

A lot of medical research tends to report results in favor of whoever paid for the research. For instance, a lot of cannabis research comes from the "find why cannabis is bad for you" era of the 70s. Newer research tells a much different story.

Medical language also has a difficult relationship with causality.

Most areas of life have "common knowledge" or similar education/indoctrination, so a way of thinking is also established pretty much wherever they go.

It can be difficult to get an AI to go "against the grain" and synthesize a new result.

To get around some of those issues, I've used "research in the last 5 years" or "what is the relationship between x and y". For the relationship questions, using the "deep thinking" mode tends to give better answers and often means the AI will generate a detailed "show your work" report that can be read through carefully.

darrellpf · Sep 19, 2025

Big oops or Eureka?

I'm a Comp Sci guy. Most of my career has revolved around learning new technologies, particularly in their infancy. I tend to be very aware of their uses and limitations as they evolve. In the last six months I've been using an AI to surface interesting medical research regarding ME/CFS.

Over the many years I've had ME/CFS I've had one oddity that never fit the criteria. For most people, including me, PEM has always been a main debilitating factor. The advice is to do pacing and generally be very careful about physical activity. My version of pacing is one mixed with lots of exercise. Exercise, lie flat, exercise again. For instance I go for a brisk 45 minute walk each morning, lie flat for an hour or two, then go to the store 5 blocks away, walk the two grocery bags home and lie down for another hour or more until the energy returns.

Because I have the more sophisticated AI research plan, I ran my Ancestry DNA file through it and asked for the 5 most problematic SNPs. Note that the resulting genetic data is typically used to determine location rather than being at all useful for medical genetic assays.

In my genetic data I have

rs17602795    17    41645244    T    C

Which is benign rather than problematic. The first problematic entry the AI produced in the report it made was

RS1760279 AMPD1

Of course I started some research on AMPD1 and found these symptoms. The AI insisted I had the gene from both parents, so worst case.

Exercise Intolerance: The most common symptom is a reduced ability to sustain physical activity.
Fatigue: Experiencing tiredness more quickly and for longer periods after exertion.
Muscle Pain (Myalgia): Aching or tenderness in the muscles, particularly after exercise.
Muscle Cramps: Involuntary muscle contractions, often during or after physical activity.

Briefly, AMPD1 deficiency (from both parents) results in an inability to process CYP3A4, the most common enzyme in most humans. The deficiency often goes unnoticed because there are compensatory mechanisms that tend to hide it. If the compensatory mechanisms are overwhelmed then the symptoms tend to appear. Interestingly, athletes with this deficiency do poorly at most sports that involve sudden spurts of glucose but do much better at sports such as long distance running. The deficiency is also tied to metabolic syndrome and diabetes.

I think of the current AIs as untrustworthy research assistants or law clerks, so I quadruple check answers in various ways. I carefully read the research papers it is using and curate the important ones. I run the same facts through different AIs. After all my AMPD1 research I did exactly that, only to discover that the AI had given me answers based on what appeared to be a typo (my benign gene had an extra number at the end).

Now I start to think the AI is telling me what I want to hear, but this deserves a closer look. I'd like to understand why the error was made. In general, asking an extremely detailed question gives a reliable answer, but if you ask a general question (what is the relationship between ME/CFS and CYP3A4?) there is much more latitude for the AI to come up with more interesting answers, not necessarily what you were looking for.
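One way to make that kind of cross-check mechanical is to look up the exact rsID an AI reports in the raw file before reading anything into it. What follows is a minimal sketch, assuming the raw export is tab-separated text with `#` comment lines and the columns rsid, chromosome, position, allele1 and allele2. That layout, the function names and the sample rows (including the positions and the placeholder `rs0000001`) are assumptions for illustration, not real genotype data.

```python
import csv
import io

def load_genotypes(text):
    """Parse raw genotype text into {rsid: (chromosome, position, genotype)}."""
    rows = (
        line for line in io.StringIO(text)
        if line.strip() and not line.startswith("#")
    )
    reader = csv.DictReader(rows, delimiter="\t")
    return {
        r["rsid"].lower(): (r["chromosome"], int(r["position"]), r["allele1"] + r["allele2"])
        for r in reader
    }

def check_reported_rsid(genotypes, reported):
    """Return an exact match, or list near-miss IDs that share a prefix with the reported one."""
    key = reported.strip().lower()
    if key in genotypes:
        return {"status": "exact", "rsid": key, "call": genotypes[key]}
    near = sorted(r for r in genotypes if r.startswith(key) or key.startswith(r))
    return {"status": "not_found", "rsid": key, "near_misses": near}

# Hypothetical sample file contents; the positions are invented.
SAMPLE = (
    "#Hypothetical raw data export, values made up for illustration\n"
    "rsid\tchromosome\tposition\tallele1\tallele2\n"
    "rs17602795\t17\t1000\tT\tC\n"
    "rs0000001\t1\t2000\tA\tG\n"
)

genotypes = load_genotypes(SAMPLE)
result = check_reported_rsid(genotypes, "RS1760279")
print(result)
```

With this sample the reported ID comes back as not found, with `rs17602795` flagged as a near miss. That is the pattern of a dropped or extra digit, and a cue to treat everything the report says about that SNP as unverified.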

In the last few months, one of the enhancements for AI is their ability to remember previous conversations. Many of my previous conversations revolve around CYP3A4 because the drug I've been taking for 10+ years completely inhibits CYP3A4.

In essence, it appears that the AI response used not only my DNA data, but also my previous queries to synthesize an answer that has a lot of merit. In this case it looks like it used my CYP3A4 status to imply that I had an acquired AMPD1 issue.

More interestingly, the CYP3A4 gene deficiency has a strong geographic component that reflects my Eastern European heritage. I'm wondering if the AI also took that into consideration when it "misspoke".

Regardless, to me AMPD1 leading to a CYP3A4 deficiency looks like an interesting data point that deserves some research.

dilieil · Nov 1, 2025

Fields medalist Timothy Gowers has reported that GPT5 is able to help him with math:

I crossed an interesting threshold yesterday, which I think many other mathematicians have been crossing recently as well. In the middle of trying to prove a result, I identified a statement that looked true and that would, if true, be useful to me. 1/3
Instead of trying to prove it, I asked GPT5 about it, and in about 20 seconds received a proof. The proof relied on a lemma that I had not heard of (the statement was a bit outside my main areas), so although I am confident I'd have got there in the end, 2/3
the time it would have taken me would probably have been of order of magnitude an hour (an estimate that comes with quite wide error bars). So it looks as though we have entered the brief but enjoyable era where our research is greatly sped up by AI but AI still needs us. 3/3

Other big names such as Terence Tao and Scott Aaronson are reporting similar things. A common theme seems to be that it's very important to continue interrogating the model even if its first answer is wrong. LLMs are capable of reasoning and can productively adapt in response to criticism. For example, here's something mathematician Ernest Ryu said about using GPT5 to develop a proof:

ChatGPT did not produce the proof in a single prompt. The process was highly interactive. It generated many arguments, roughly 80% of which were incorrect. Yet some were genuinely novel to me. Whenever I recognized a novel idea, whether correct or only partially so, I distilled the key insight and prompted ChatGPT to develop it further.

We've all seen LLMs output slop related to ME/CFS, but I'm very curious what the result would be if someone knowledgeable such as Jonathan Edwards were to have a longer back-and-forth with GPT5 about some specific sub-problem he's been thinking about, explain to it what's wrong with its slop answers, and try to prompt it into giving something more useful. Perhaps the models aren't good enough yet -- they've only just recently gotten there in math, and bio may take some extra time -- but perhaps they're already of some use if you prompt them the right way.

hotblack · Nov 1, 2025

To me one important underlying factor in usefulness is whether the output is testable/verifiable. This is why LLMs can end up being helpful with grounded information retrieval or text summarisation, mathematics or writing code: there can be an iterative loop there. This allows models to be developed and allows us to work with them. It is also why things within biology, like advances in predicting protein folding, were possible in wider AI.
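The iterative loop described here can be sketched in a few lines. This is only a toy, assuming a stand-in `propose` function in place of a real model and integer factorisation as the checkable task (both invented for the example). Candidates are accepted only if an independent check passes, so wrong guesses cost time but cannot slip through.

```python
def propose(n, attempt):
    # Stand-in for a model: returns a guessed factor pair, often wrong.
    a = attempt + 2
    return a, n // a

def verify(n, candidate):
    # Independent check: cheap to run and unambiguous about correctness.
    a, b = candidate
    return a > 1 and b > 1 and a * b == n

def solve_with_verification(n, max_attempts=1000):
    # Keep asking for candidates until one passes the check or the budget runs out.
    rejected = []
    for attempt in range(max_attempts):
        candidate = propose(n, attempt)
        if verify(n, candidate):
            return candidate, rejected
        rejected.append(candidate)
    return None, rejected

answer, rejected = solve_with_verification(91)
print(answer, len(rejected))
```

The loop only works because `verify` exists. For a question like a disease mechanism there is no equivalent cheap check, so the same loop has nothing to filter the wrong answers with.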

There are some interesting ideas about building the layers up over time to be able to introduce an understanding of gene interactions and expression, protein interactions and various pathways to eventually be able to model a simple cell. But that is a way off. So expecting insight into disease pathways from an LLM now? I think that seems quite unlikely.

hotblack · Nov 1, 2025

I do though like the idea @dilieil has of someone like JE having a long conversation with an LLM (although I’m not sure he would…) I don’t think it would necessarily help the LLM come up with better ideas (due to the above) but I’d be more optimistic about it helping him or others bounce ideas around and come up with theories

boolybooly · Dec 1, 2025

LLMs do not learn from conversations.

There would need to be a new class of LLM developed to do this, which I have no doubt will happen eventually, one which records an information model regarding a user identity.

Other applications like research assistance could also happen but the generalised LLM is kind of Pooh Bear, wandering along answering the next question then forgetting all about it.

However, that said, I was recently very impressed by the way Google AI grasped the nature of ACAI (atypical chronic active infection) and described the mechanisms involved from a single sentence question. It outperformed every medic I have seen for 4 decades. So I think we are making progress with LLMs and they are now capable of focussing research available online more effectively with each iteration.

why did I get recurring virus after catching EBV followed by HSV2

I think LLMs can synthesise information in ways which are akin to reviews and almost as useful as new information, because potentially no one has ever put these facts together before, but for now they cannot actually produce information as they cannot run experiments. I expect that will change too.

mariovitali · Dec 1, 2025

I have no doubt that the reasoning process of LLMs can help put pieces of the puzzle of ME/CFS together. Unfortunately people having seen hallucinations believe that *every* output from LLMs is total crap. I recall reading someone saying "as soon as I read that something is from an LLM, I stop reading it". This is negative bias.

By using a Mixture-of-Agents (MoA) approach, we can have a committee of experts checking each other's output, including the reasoning, for validity.
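A bare-bones sketch of the committee idea follows. Note that it is a simple voting committee, a swapped-in simplification rather than any published Mixture-of-Agents design. Each "agent" here is just a Python function returning a verdict, where in practice each would be a call to a different model. The agent names, the claim format and the quorum threshold are all placeholders invented for illustration.

```python
from collections import Counter

def review_claim(claim, agents, quorum=2 / 3):
    # Ask every agent for a verdict ("support", "reject" or "unsure") on the same claim.
    verdicts = {name: agent(claim) for name, agent in agents.items()}
    tally = Counter(verdicts.values())
    top, votes = tally.most_common(1)[0]
    # Accept the majority verdict only if enough agents agree and it is a definite one.
    agreed = votes / len(agents) >= quorum and top != "unsure"
    return {
        "decision": top if agreed else "needs_human_review",
        "dissent": sorted(name for name, v in verdicts.items() if v != top),
    }

# Hypothetical reviewers; real ones would each query a different model.
def cites_source(claim):
    return "support" if claim.get("sources") else "reject"

def checks_numbers(claim):
    return "support" if claim.get("numbers_consistent") else "reject"

def sceptic(claim):
    return "unsure" if claim.get("novel") else "support"

agents = {"cites_source": cites_source, "checks_numbers": checks_numbers, "sceptic": sceptic}

claim = {"text": "example claim", "sources": ["paper A"], "numbers_consistent": True, "novel": True}
result = review_claim(claim, agents)
print(result)
```

Keeping the dissenting reviewers in the result matters as much as the decision itself, since disagreement between agents is the signal that a human should read the reasoning.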
