Can Large Language Models (LLMs) like ChatGPT be used to produce useful information?

These tools can make great leaps and do unexpected, new things; AlphaGo is a great example of this, as I think are things like protein folding or the maths challenges. But it requires very specific domains with clear rules, a clear concept of what is ‘correct’, and the ability to test that. Understanding why it is possible there and not transferable to ‘solve this disease for me’ is important, I think.
AlphaGo is entirely different from LLMs in architecture. Not really a great comparison if we're talking about LLMs here.

For LLMs to be true AGI there is a hope that with enough data “reasoning” will become an emergent property, possibly even consciousness, similar to the old Chinese room scenario.
 
Given the latest news about GPT-5 I would say LLMs have hit a wall and are way more hype than substance.

This is so funny to me: a few weeks ago, before GPT-5, everyone was still an AI futurist. One bad model and a stalled take-off later and now everyone’s a downer. I doubt this has changed the minds of true LLM AGI believers like Sam Altman, Thiel, etc.
 
This is so funny to me: a few weeks ago, before GPT-5, everyone was still an AI futurist. One bad model and a stalled take-off later and now everyone’s a downer. I doubt this has changed the minds of true LLM AGI believers like Sam Altman, Thiel, etc.
They are all full of shit and mostly just grifters. LLMs do not scale, that much is known. So they are really a dead end the way they are currently designed.

They will need another huge discovery, like transformers were for LLMs, before another major advance can be made, and guess what: no such discovery is on the immediate horizon. They are just hoping to “fake it until they make it” and praying that by throwing trillions of dollars at the problem there will be a new discovery.
 
They are all full of shit and mostly just grifters. LLMs do not scale, that much is known. So they are really a dead end the way they are currently designed.

They will need another huge discovery, like transformers were for LLMs, before another major advance can be made, and guess what: no such discovery is on the immediate horizon. They are just hoping to “fake it until they make it” and praying that by throwing trillions of dollars at the problem there will be a new discovery.
I think they were hoping that with enough data it would be emergent.

Genie 3's world consistency was an emergent property of more data in …. So there may be some value in the idea that “it can just happen”, but yes, it is hope at this point.
 
AlphaGo is entirely different from LLMs in architecture. Not really a great comparison if we're talking about LLMs here.
I thought I made the distinction in my posts, and think it’s pretty clear? It also seems most likely that LLMs will stick around in some form, perhaps as the human interface to other forms of AI/ML; that’s what they’re most suited to, after all.

The scaling stuff for LLMs has always been a pipe dream pushed by a few. It’s an amazing grift, and equally amazing that they’ve managed to convince so many it may work. Like @leokitten says, more discoveries and technologies are needed, and the industry seemed to know that until ChatGPT blew up.

Paper from Apple on why they aren’t pursuing LLMs:
I’m familiar with the paper. But saying Apple are not pursuing LLMs when they very clearly are (researching, developing and building products using them) seems to misrepresent things somewhat?
 
I was reading this the other day and it is entertaining in some ways but highlights a number of the ways LLMs struggle.
A positive read could be that they feel they can fix these issues. But I think it also shows how much the earlier points made in this thread stand, and how hoping for an LLM to be able to do something new that it hasn’t encountered or been trained for (like finding a new disease solution by prompting alone) is somewhat wishful thinking.
 
This is a potentially interesting paper (I've only read the abstract). They seem to be fine tuning LLMs with time series data from various organs and then have an agentic approach to simulating the whole body based on these organ models. Could be an interesting approach to modelling the body as a complex system.

Organ-Agents: Virtual Human Physiology Simulator via LLMs


Recent advances in large language models (LLMs) have enabled new possibilities in simulating complex physiological systems through reasoning, generation, and agentic coordination. In this work, we present Organ-Agents, a novel multi-agent framework that simulates the dynamics of human physiology using LLM-driven agents. Each agent, referred to as a Simulator, is assigned to model a specific physiological system such as the cardiovascular, renal, immune, or respiratory system. The training of the Simulators consists of two stages: supervised fine-tuning on system-specific time-series data, followed by reinforcement-guided inter-agent coordination that incorporates dynamic reference selection and error correction with assistive agents. To support training, we curated a cohort of 7,134 sepsis patients and 7,895 matched controls, constructing high-resolution, multi-domain trajectories covering 9 physiological systems and 125 clinical variables. Organ-Agents achieved high simulation accuracy on 4,509 held-out patients, with average per-system mean squared error (MSE) below 0.16 across all systems and robust performance across severity strata based on sequential organ failure assessment (SOFA) scores. Generalization capability was confirmed via external validation on 22,689 intensive care unit (ICU) patients from two tertiary hospitals, showing moderate performance degradation under distribution shifts while maintaining overall simulation stability. In terms of clinical plausibility, Organ-Agents reliably reproduces multi-system critical event chains (e.g., hypotension, hyperlactatemia, hypoxemia) with preserved event order, coherent phase progression, and minimal deviations in both trigger timing and physiological values. Subjective evaluation by 15 critical care physicians further confirmed the realism and physiological coherence of simulated trajectories, with mean Likert ratings of 3.9 and 3.7, respectively. The Simulator also supports counterfactual simulation under alternative fluid resuscitation strategies for sepsis, producing physiological trajectories and APACHE II scores that closely align with matched real-world patient groups. To further assess the preservation of clinically meaningful patterns, we evaluated Organ-Agents in downstream early warning tasks using seven representative classifiers. Most models showed only marginal AUROC degradation when transferring from real to generated and counterfactual trajectories, with performance drops generally within 0.04, indicating that the simulations preserved decision-relevant information for clinical risk simulation. Together, these results position Organ-Agents as a clinically credible, interpretable, and generalizable digital twin for physiological modeling, enabling precision diagnosis, treatment simulation, and hypothesis testing across critical care settings.
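
Out of interest, here is a very rough sketch of the coordination idea the abstract describes: one "Simulator" agent per physiological system, with a coordinator stepping them together and cross-feeding each agent the other systems' latest values. This is my own toy version, not the paper's code; the LLM call is stubbed out and every name here is illustrative.

```python
# Hypothetical sketch of the Organ-Agents coordination loop described in the abstract.
# One Simulator per physiological system; a coordinator steps them in lockstep and
# passes each agent the other systems' latest states as "reference" values.
# The LLM call is stubbed out so the sketch stays runnable.

from dataclasses import dataclass, field


@dataclass
class Simulator:
    """One agent responsible for a single physiological system."""
    system: str                          # e.g. "cardiovascular", "renal"
    history: list = field(default_factory=list)

    def step(self, own_state: dict, reference: dict) -> dict:
        # In the real framework this would query a fine-tuned LLM with the
        # system's own time series plus the reference values from other agents.
        # Here we just carry the state forward one time step.
        next_state = dict(own_state)
        next_state["t"] = own_state.get("t", 0) + 1
        self.history.append(next_state)
        return next_state


def simulate(initial: dict, hours: int) -> dict:
    """Coordinator: advance every system one step at a time, sharing outputs."""
    agents = {name: Simulator(name) for name in initial}
    states = dict(initial)
    for _ in range(hours):
        shared = dict(states)            # snapshot of all systems' latest values
        for name, agent in agents.items():
            reference = {k: v for k, v in shared.items() if k != name}
            states[name] = agent.step(states[name], reference)
    return states


# Example: two toy systems stepped for 6 "hours".
print(simulate({"cardiovascular": {"MAP": 65, "t": 0},
                "renal": {"creatinine": 1.1, "t": 0}}, hours=6))
```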
 
My background in this informs my perspective, as I was introduced to programming very young and was writing my own BASIC programs by about the age of 14 (1978) on the school's new Research Machines 380Z. I later did some programming for businesses. When the ME hit I was not able, or interested enough, to pursue it. I have always viewed computers as marvellous toys.

There is an old adage in computing that you get out what you put in, plus the mechanism in between, of course, and that mechanism is always the invention of a human mind. It can be comprehended by the user and will always be the product of that foundation, even if AIs start programming themselves.

The way I look at LLM AIs personally takes into account that they are trained on the internet and then heavily moderated, so it's a bit like having a Bowdlerised conversation with the average output of the internet, which, if it were unmoderated, would be intolerable mayhem. So, a bubbling cauldron with a lid on it at the best of times.

Considering how well that usually goes (I often visit gaming fora, some better managed than others), I don't have high expectations, so I am constantly surprised and amused by the LLMs' successes and, for want of a better word... humanity. Programmed, for sure, but nevertheless an exemplar of an agreed standard of decency, imposed by others and which others might learn from: an educational tool for our culture. Since I trained to be a teacher, I find that an intriguing possibility.

I do think today's LLMs are useful as a way of collating information on a subject which you can then apply a discriminating and sceptical eye to. I feel it is important to understand the nature of the product and where LLMs can go wrong but one of my hobbies has been beta testing games software for the last 25 years, so spotting glitches and seeing through a facade to the underlying computing is second nature for me so I do not feel out of my depth there, yet.

They have the potential to be personalised and to store a repository of knowledge pertinent to me (one), acting as a tool that knows and serves me as an individual, but currently they are revised every few months and the slate is wiped clean unless you explicitly include conversations from the past in the starting data for a new conversation. I think they could be very useful as personal assistants, but that is some time off and, like AI-driven vehicles, they need to be developed and made safe first.

The underlying neuronal learning models have potential for the interpretation of scientific and medical data, and that is a different application, with less hazard of input from sweary, bad-mannered, rebellious adolescents. Every step developers take with these models will create step changes in the value and nature of the AI tools they produce. Inherently, though, these depend on their own insights into their own thinking processes, and so become a reflection of the human mind and the way evolution has developed cognitive processing to date.

In the future AIs might be able to improve on evolution, though I think mostly in terms of speed and accuracy, possibly insight too, though that depends a little on how well we understand ourselves, which is not as well developed as I would like it to be. After millions of years of evolution and thousands of years of human thought, IMHO the principles of morality and humanity have arrived at an evolutionarily stable strategy, so common sense and decency will continue to be what they are today, and hopefully we will get better at delivering on these principles with the assistance of these tools.

In that endeavour, people like Musk and Altman are slightly dangerous double-edged swords, in that they can push the agenda forwards but they can also pervert it to serve their special interests. However, I have little doubt the rest of the human race will provide them with "feedback" and hold them to account.
 
The scaling stuff for LLMs has always been a pipe dream pushed by a few. It’s an amazing grift, and equally amazing that they’ve managed to convince so many it may work. Like @leokitten says, more discoveries and technologies are needed, and the industry seemed to know that until ChatGPT blew up.
There is lots of work on LLM architectures, although it tends to be small adaptations of the basic decoder-based architecture. For example, Microsoft have released a model using Mamba, which uses recurrent layers to linearize the computation in the prefill stage. Google have done some interesting stuff with adaptive hidden-unit layers, allowing the same model to be used with different computational requirements. There is also lots of interesting stuff around KV caches and how to reduce overheads, which again very much reduces the compute requirements, but also provides ways to precompute things coming from vector DBs, avoiding long contexts and hence long compute times.
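
For anyone curious what the KV cache bit means in practice, here is a toy sketch of the idea (entirely my own, not any particular library's implementation): the key/value projections for past tokens are stored once, so generating each new token only does the work for one new position instead of re-projecting the whole prefix.

```python
# Toy illustration of why KV caches cut decode-time compute: keys/values for past
# tokens are cached, so each decode step projects and attends over one new position.
# All sizes and weights are made up for the example.

import numpy as np

d = 8                                   # model/head dimension (toy size)
rng = np.random.default_rng(0)
W_q, W_k, W_v = (rng.standard_normal((d, d)) for _ in range(3))

k_cache, v_cache = [], []               # grows by one entry per generated token

def decode_step(x_new: np.ndarray) -> np.ndarray:
    """Attend the newest token over all cached positions plus itself."""
    q = x_new @ W_q
    k_cache.append(x_new @ W_k)         # only the new token gets projected
    v_cache.append(x_new @ W_v)
    K, V = np.stack(k_cache), np.stack(v_cache)
    scores = K @ q / np.sqrt(d)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ V                  # attention output for the new token

for t in range(5):                      # 5 decode steps; prefix work is never redone
    out = decode_step(rng.standard_normal(d))
print("last output:", np.round(out, 3))
```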
 
There is lots of work on LLM architectures, although it tends to be small adaptations of the basic decoder-based architecture.
Absolutely. The efficiency improvements have been huge. That small local models can do what huge models used to be needed for, and that the big players are pushing on efficiency even on their larger models to reduce costs, seems far more significant than any imagined progress towards AGI. But the latter gets headlines and investors, I guess.

Have you got any links to the Mamba work? I hadn’t heard about that. I only recently learnt about quantization-aware training, and had heard about Google’s work and some of the KV cache work, but a lot of the detail is beyond my understanding tbh.
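
For what it's worth, my rough (possibly wrong) understanding of quantization-aware training is that the core trick is "fake quantization": weights are rounded to int8 levels in the forward pass so the model learns to tolerate the rounding, while full-precision values are kept for the update. A tiny sketch of just that rounding step, with made-up numbers and names, assuming symmetric per-tensor int8:

```python
# Minimal sketch of the "fake quantization" step used in quantization-aware training.
# Purely illustrative; real frameworks do this per-channel, handle activations, etc.

import numpy as np

def fake_quantize(w: np.ndarray, bits: int = 8) -> np.ndarray:
    """Quantize to the int range and immediately dequantize back to float."""
    qmax = 2 ** (bits - 1) - 1                  # 127 for int8
    scale = np.abs(w).max() / qmax              # per-tensor scale (assumes w is not all zeros)
    return np.round(w / scale).clip(-qmax, qmax) * scale

w = np.random.default_rng(1).standard_normal((4, 4)).astype(np.float32)
w_q = fake_quantize(w)
print("max rounding error:", np.abs(w - w_q).max())
```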
 
Have you got any links to the Mamba work?
It's on my reading list - they had interesting results. I think it's significant that Microsoft have picked it up in one of their small models (Phi-4-mini), although I've not tried that model yet.
 
This is a potentially interesting paper (I've only read the abstract). They seem to be fine tuning LLMs with time series data from various organs and then have an agentic approach to simulating the whole body based on these organ models. Could be an interesting approach to modelling the body as a complex system.

Organ-Agents: Virtual Human Physiology Simulator via LLMs

This is what I am starting to do now as part of my PhD. Long story short: so far I’m unimpressed and haven’t seen many examples of insights from an AI model being validated in vivo unless there was already strong reason, independent of AI, to think a certain gene/pathway was relevant.
 
AIs can be very, very good at reading large amounts of data. That said, the old rule of garbage in, garbage out still applies.

A lot of medical research tends to report results in favor of whoever paid for the research. For instance, a lot of cannabis research comes from the "find out why cannabis is bad for you" era of the '70s. Newer research tells a much different story.

Medical language also has a difficult relationship with causality.

Most areas of life have "common knowledge" or similar education/indoctrination, so a particular way of thinking is also established pretty much wherever you look.

It can be difficult to get an AI to go "against the grain" and synthesize a new result.

To get around some of those issues, I've used prompts like "research in the last 5 years" or "what is the relationship between x and y?". For the relationship questions, using the "deep thinking" mode tends to give better answers and often means the AI will generate a detailed "show your work" report that can be read through carefully.
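
To make that concrete, here is roughly how I template those two kinds of question. The wording is just my own phrasing, not any official or recommended prompt format; paste the output into whatever chat model you use.

```python
# Rough sketch of the two prompt patterns described above, as plain string templates.
# Nothing model-specific here; the phrasing and parameter names are illustrative.

def recent_research_prompt(topic: str, years: int = 5) -> str:
    return (
        f"Summarise peer-reviewed research on {topic} published in the last {years} years. "
        "List each paper with year, journal, and a one-sentence finding, and say clearly "
        "where evidence is weak or conflicting."
    )

def relationship_prompt(x: str, y: str) -> str:
    return (
        f"What is the relationship between {x} and {y}? "
        "Show your work step by step, cite the sources you rely on, and flag anything "
        "that is speculation rather than established evidence."
    )

print(recent_research_prompt("ME/CFS and mitochondrial function"))
print(relationship_prompt("ME/CFS", "CYP3A4"))
```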
 
Big oops or Eureka?

I'm a Comp Sci guy. Most of my career has revolved around learning new technologies, particularly in their infancy. I tend to be very aware of their uses and limitations as they evolve. In the last six months I've been using an AI to surface interesting medical research regarding ME/CFS.

Over the many years I've had ME/CFS I've had one oddity that never fit the criteria. For most people, including me, PEM has always been a main debilitating factor. The advice is to do pacing and generally be very careful about physical activity. My version of pacing is one mixed with lots of exercise: exercise, lie flat, exercise again. For instance, I go for a brisk 45-minute walk each morning, lie flat for an hour or two, then go to the store 5 blocks away, walk the two grocery bags home, and lie down for another hour or more until the energy returns.

Because I have the more sophisticated AI research plan, I ran my Ancestry DNA file through it and asked for the 5 most problematic SNPs. Note that this kind of genetic data is typically used to determine geographic ancestry rather than being at all useful for medical genetic assays.

In my genetic data I have

rs17602795    17    41645244    T    C

Which is benign rather than problematic. The first problematic entry the AI produced in its report was

RS1760279 AMPD1

Of course I started some research on AMPD1 and found these symptoms. The AI insisted I had the variant from both parents, so worst case.

Exercise Intolerance: The most common symptom is a reduced ability to sustain physical activity.
Fatigue: Experiencing tiredness more quickly and for longer periods after exertion.
Muscle Pain (Myalgia): Aching or tenderness in the muscles, particularly after exercise.
Muscle Cramps: Involuntary muscle contractions, often during or after physical activity.

Briefly, AMPD1 deficiency (from both parents) results in an inability to process CYP3A4, the most common enzyme in most humans. The deficiency often goes unnoticed because there are compensatory mechanisms that tend to hide it. If the compensatory mechanisms are overwhelmed then the symptoms tend to appear. Interestingly, athletes with this deficiency do poorly at most sports that involve sudden spurts of glucose but do much better at sports such as long distance running. The deficiency is also tied to metabolic syndrome and diabetes.

I think of the current AIs as untrustworthy research assistants or law clerks, so I quadruple-check answers in various ways. I carefully read the research papers they are using and curate the important ones. I run the same facts through different AIs. After all my AMPD1 research I did exactly that, only to discover that the AI had given me answers based on what appeared to be a typo (my benign SNP had an extra digit at the end).
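
One extra check that would have caught this sooner: look the rsID up in the raw file yourself rather than trusting the AI's reading of it. Something like the sketch below, which assumes the usual AncestryDNA export layout (tab-separated lines of rsid, chromosome, position, allele1, allele2 after the '#' header comments); adjust the columns if your export differs, and the filename is just a placeholder.

```python
# Quick local lookup of an rsID in an AncestryDNA-style raw export.
# Assumes tab-separated columns: rsid, chromosome, position, allele1, allele2.

def lookup_rsid(path: str, rsid: str) -> dict | None:
    """Return the genotype row for an exact rsID match, or None if absent."""
    with open(path) as f:
        for line in f:
            if line.startswith("#") or line.lower().startswith("rsid"):
                continue                      # skip comments and the header row
            fields = line.rstrip("\n").split("\t")
            if fields[0] == rsid:
                return {"rsid": fields[0], "chromosome": fields[1],
                        "position": fields[2], "genotype": "".join(fields[3:])}
    return None

# An exact match guards against the off-by-one-digit confusion described above.
print(lookup_rsid("AncestryDNA.txt", "rs1760279"))
print(lookup_rsid("AncestryDNA.txt", "rs17602795"))
```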

Now I start to think the AI is telling me what I want to hear, but this deserves a closer look. I'd like to understand why the error was made. In general, asking an extremely detailed question gives a reliable answer, but if you ask a general question (what is the relationship between ME/CFS and CYP3A4?) there is much more latitude for the AI to come up with more interesting answers, not necessarily what you were looking for.

In the last few months, one of the enhancements to these AIs has been their ability to remember previous conversations. Many of my previous conversations revolve around CYP3A4, because the drug I've been taking for 10+ years completely inhibits CYP3A4.

In essence, it appears that the AI response used not only my DNA data, but also my previous queries, to synthesize an answer that has a lot of merit. In this case it looks like it used my CYP3A4 status to infer that I had an acquired AMPD1 issue.

More interestingly, the CYP3A4 gene deficiency has a strong geographic component that reflects my Eastern European heritage. I'm wondering if the AI also took that into consideration when it "misspoke".

Regardless, to me AMPD1 leading to a CYP3A4 deficiency looks like an interesting data point that deserves some research.
 