Can ChatGPT produce plausible sounding hypotheses from random input?

Hoopoe

Senior Member (Voting Rights)
After using ChatGPT for a while, I've been impressed with how useful it is for some things (like math questions, or summarising information). Of course, I also asked it about hypotheses based on the genes highlighted by DecodeME, and that's where I started to feel that it was probably capable of generating superficially credible hypotheses out of almost anything. I decided to test this.

I used https://molbiotools.com/randomgenesetgenerator.php to generate a short list of random genes, then told ChatGPT that these genes were from a study of ME/CFS, that I wanted to generate a hypothesis based on them, and that I was interested in lipid metabolism and PEM (the first two things that came to mind). This was the answer (I left out the last part about suggested experiments and sources):
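The random-gene control described above is easy to reproduce offline, which makes it simple to repeat many times. Below is a minimal sketch that assumes you already have a list of gene symbols to sample from. The small `GENE_POOL` and the helper names `random_gene_set` and `build_prompt` are hypothetical. The sketch does not use or reproduce the molbiotools generator itself, and a real control would draw from a much larger list, such as a full human gene annotation.

```python
import random

# Hypothetical pool of gene symbols, for illustration only.
# A real control would sample from a full human gene annotation.
GENE_POOL = [
    "PEX16", "IFNA1", "PKIA", "CCPG1", "CYTH4", "HLA-G", "PIWIL4",
    "CASC2", "LINC01132", "TTTY3", "ETV6", "THRB", "MTCL1", "TBC1D19",
    "STYXL1", "OR1F12",
]


def random_gene_set(pool, k, seed=None):
    """Draw k distinct gene symbols uniformly at random, without replacement."""
    rng = random.Random(seed)  # seeded so a given decoy list can be regenerated exactly
    return rng.sample(pool, k)


def build_prompt(genes, disease, interests):
    """Present the random genes as if they came from a real study of `disease`."""
    return (
        f"These genes came up in a study of {disease}: {', '.join(genes)}. "
        f"Generate a hypothesis based on them. I'm interested in "
        f"{' and '.join(interests)}."
    )


if __name__ == "__main__":
    genes = random_gene_set(GENE_POOL, k=10, seed=42)
    print(build_prompt(genes, "ME/CFS", ["lipid metabolism", "PEM"]))
```

Re-running with different seeds gives fresh decoy lists. If the model produces equally confident hypotheses for every one of them, that confidence says little about the genes themselves.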
Nice — good gene list. Below I’ll translate those genes into concrete, testable hypotheses that link lipid metabolism to post-exertional malaise (PEM) in ME/CFS, then give specific experiments/analyses you can run next. I’ll call out the handful of high-impact (load-bearing) facts with literature pointers.




Short summary of spotlighted gene functions (one line each)


  • PEX16: peroxisome biogenesis protein; required for peroxisome membrane assembly (directly relevant to peroxisomal β-oxidation of very-long-chain fatty acids). UniProt, genecards.org
  • IFNA1: type-I interferon (IFN-α); chronic IFN signaling reprograms cellular metabolism and can suppress de novo lipid synthesis while altering lipid uptake/handling. PMC+1
  • PKIA: potent inhibitor of PKA (cAMP-dependent protein kinase); PKA activity controls lipolysis, mitochondrial phosphorylation cascades and many metabolic responses. UniProt, Cenabiotech
  • CCPG1: ER-phagy cargo receptor linking ER stress to selective autophagy; ER/ER-phagy influences lipid handling, lipoprotein processing and organelle quality control. PMC, PubMed
  • CYTH4 (cytohesin-4): ARF-GEF family member involved in vesicle trafficking; localizes to lipid droplets and membranes; plausible role in lipid droplet dynamics and membrane trafficking of lipids. proteinatlas.org, genecards.org
  • HLA-G: immunomodulatory non-classical MHC I; can shape chronic immune tone and tolerance (could modulate inflammatory drivers of metabolic change). PMC, Frontiers
  • PIWIL4, CASC2, LINC01132, TTTY3: (PIWIL4 piwi family; CASC2 & LINC01132 lncRNAs; TTTY3 Y-linked lncRNA). These are regulators (epigenetic, post-transcriptional, microRNA sponge roles) that could affect expression of metabolic genes or immune regulators. UniProt, PMC, BioMed Central, Cenabiotech



Mechanistic hypotheses (ranked & testable)


Below each hypothesis I list why it’s plausible and how it links to lipid metabolism → PEM.


1)


  • Why plausible: PEX16 is required for peroxisome biogenesis; defective peroxisomes reduce peroxisomal β-oxidation of very-long-chain fatty acids and plasmalogen synthesis (membrane lipids). Loss of peroxisomal capacity can cause accumulation of unusual lipids and reduce substrate supply to mitochondria under high-demand (exercise) states. UniProtgenecards.org
  • PEM link: After exertion, tissues rely more on fatty acid oxidation. If peroxisomal upstream processing is impaired, fuel flux to mitochondria is disrupted, causing delayed recovery, metabolite accumulation (lipotoxic intermediates) and immune activation → PEM.

2)


  • Why plausible: Type-I IFNs downregulate de-novo lipid/cholesterol synthesis and alter lipid uptake profiles; chronic IFN exposure can change immune cell and tissue metabolism. PMC+1
  • PEM link: IFN-driven metabolic shifts may blunt flexible lipid utilization during exertion and provoke pro-inflammatory lipid mediators (eicosanoids), worsening post-exertional symptoms.

3)


  • Why plausible: PKIA inhibits PKA. PKA activation (via cAMP) promotes lipolysis (hormone-sensitive lipase), mitochondrial responses and adrenergic metabolic adaptation. If PKIA is increased, the normal cAMP/PKA response to exertion may be blunted. UniProt, Cenabiotech
  • PEM link: During/after exertion the inability to mobilize fatty acids quickly could cause early energy deficit, increased reliance on anaerobic glycolysis, and later malaise.

4)


  • Why plausible: CCPG1 mediates ER-selective autophagy and helps maintain ER proteostasis; ER dysfunction impacts lipid biosynthesis and VLDL assembly and can secondarily impair mitochondrial function. PMC, PubMed
  • PEM link: With exertional stress, increased protein/lipid folding demands or impaired ER turnover could tip cellular energetics toward failure and trigger systemic symptoms.

5)


  • Why plausible: CYTH4 is an ARF-GEF family member; proteomic/localization data show association with lipid droplets and vesicle compartments; it could regulate lipid droplet dynamics and trafficking of lipid transporters. proteinatlas.org, genecards.org
  • PEM link: Poor mobilization or aberrant sequestration of lipids could limit fuel availability during exercise and promote inflammatory lipid signaling afterwards.

6)


  • Why plausible: HLA-G can dampen or reshape immune responses; lncRNAs (CASC2, LINC01132) are known regulators of signaling pathways and may modulate metabolic gene expression or microRNA networks that control lipid enzymes. TTTY3 (Y-linked lncRNA) is less characterized but could have sex-specific regulatory effects. PMC+1, BioMed Central, Cenabiotech
  • PEM link: Combinations of immune tolerance/activation and regulatory lncRNA changes could determine whether metabolic hits translate into symptomatic PEM.



Integrated (multi-hit) hypothesis: short


A multi-hit model where peroxisomal insufficiency (PEX16) + altered signaling (PKIA, CYTH4, CCPG1) + chronic immune reprogramming (IFNA1, HLA-G, lncRNAs) converge to produce impaired lipid mobilization/oxidation and accumulation of lipid intermediates, which causes an energetic crisis and immune activation after exertion → PEM.

You decide whether this is superficially credible.

In the human body, everything is connected to a lot of other things, so it's easy to find connections between things.
 
Absolutely this. These tools can create plausible-sounding results from anything.

They’re great when used responsibly, for anything which is verifiable, as they can be trained and trained to ensure they behave. That’s why they often make good information retrieval, summarising, code completion/assistant and mathematics solution tools.

But for anything speculative? Absolute garbage. And often dangerously so.

One great writer, particularly if you have a technical leaning, which I think you do, is https://simonwillison.net/

Edit: this also reminds me of one great use: creating dummy data for testing systems. If you need to populate a database or something, give it the structure and get an LLM to do it!
 
They’re great when used responsibly, for anything which is verifiable, as they can be trained and trained to ensure they behave. That’s why they often make good information retrieval, summarising, code completion/assistant and mathematics solution tools.
Yes, they're very good at these things.

It's just like Wikipedia was always good when it came to uncontroversial topics that had definitive answers, while being unreliable and inevitably biased when it came to other things.
 
Its strength tends to be the sheer mass of knowledge it holds, and linking that knowledge together.

Not reasoning or critically assessing knowledge.

If you’re stuck on something and are good with prompts, it can help you see different ways of looking at things and make new connections. Potentially valuable for genetic data. But this needs:
(1) Strong verification
(2) The expectation that 29 outputs out of 30 will be useless or illogical.
 
Yes, they're very good at these things.

It's just like Wikipedia was always good when it came to uncontroversial topics that had definitive answers, while being unreliable and inevitably biased when it came to other things.
Yeah, and in turn LLMs are good at reading Wikipedia. Meta-pedia? I guess that’s another name for Llama!
 
I was always surprised at how bad it was at finding sources: even if you told it the exact name and address of the source, it still managed to hallucinate an entirely different source that was in some way "sensible". I'm sure that's the easiest thing to fix, and it should sensibly be fixed for public users quite soon. I'm guessing the top verification methods are already far ahead of what people see when they use ChatGPT.
 
The more I look at Wikipedia, the more I think anything that isn’t a “big” article reads as if some biased guy wrote his opinion and added a bunch of references to make it look smart.

That might just be a reflection of the type of Wikipedia articles I read, though.
 
I was always surprised at how bad it was at finding sources: even if you told it the exact name and address of the source, it still managed to hallucinate an entirely different source that was in some way "sensible". I'm sure that's the easiest thing to fix, and it should sensibly be fixed for public users quite soon. I'm guessing the top verification methods are already far ahead of what people see when they use ChatGPT.

I think it’s got a lot to do with copyright and which websites block bots trying to access them.
 
I think it’s got a lot to do with copyright and which websites block bots trying to access them.
I'm not too sure about that. The sources can come from the same public domain (say, a public preprint repository), and sometimes it finds them, while other times it'll just repetitively hallucinate "sensibly". I'd place larger bets on it being an LLM property when run with insufficient verification.
 
I'm not too sure about that. The sources can come from the same public domain (say, a public preprint repository), and sometimes it finds them, while other times it'll just repetitively hallucinate "sensibly". I'd place larger bets on it being an LLM property when run with insufficient verification.
It could also be rate limits. For example, the website might allow 50 requests per IP address every 5 minutes, and ChatGPT’s servers run out of requests until the limit resets. (I’m pretty sure this is a common defense against DDoS attacks.)
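The rate-limit idea above is often implemented as some variant of a fixed-window counter. Below is a minimal sketch that uses the illustrative numbers from the post (50 requests per IP address per 5 minutes). The class name `FixedWindowLimiter` is hypothetical, and real sites may use other algorithms and limits. Nothing here describes how any particular site or ChatGPT actually behaves.

```python
import time


class FixedWindowLimiter:
    """Allow at most `limit` requests per client IP within each `window`-second window."""

    def __init__(self, limit=50, window=300.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock  # injectable so the behaviour can be tested without waiting
        self._windows = {}  # ip -> (window_start, requests_in_window)

    def allow(self, ip):
        now = self.clock()
        start, count = self._windows.get(ip, (now, 0))
        if now - start >= self.window:
            start, count = now, 0  # window expired: reset this IP's counter
        if count >= self.limit:
            self._windows[ip] = (start, count)
            return False  # refused until the current window expires
        self._windows[ip] = (start, count + 1)
        return True


if __name__ == "__main__":
    fake_now = [0.0]
    limiter = FixedWindowLimiter(limit=50, window=300.0, clock=lambda: fake_now[0])
    results = [limiter.allow("203.0.113.7") for _ in range(51)]
    print(results.count(True), results[-1])  # 50 False
    fake_now[0] = 300.0  # five minutes later the window resets
    print(limiter.allow("203.0.113.7"))  # True
```

If many users' lookups leave from the same few server addresses, as the post speculates, they all draw on one shared budget per address. Requests would then fail for everyone once that budget is used up, regardless of how reliable the underlying source is.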
 
That's probably because 90% of texts and papers on ME/CFS talk about weak data and all these possible connections that have no sound basis.

I wonder if you would get better results if you call it illnessX and describe it yourself with what are considered the most robust findings. You could also ask it to be very skeptical, to check if a connection makes sense and that you want to avoid making an error at all costs.
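The suggestion above (rename the disease to something neutral, supply only the findings you consider robust, and ask for strict skepticism) can be written as a reusable prompt template. This is a minimal sketch only. The function name `skeptical_prompt`, its wording, and the placeholder genes and findings are hypothetical, and no particular wording is guaranteed to improve a model's answers.

```python
def skeptical_prompt(genes, robust_findings, label="illnessX"):
    """Build a prompt that hides the real disease name and asks for strict skepticism."""
    findings = "\n".join(f"- {f}" for f in robust_findings)
    return (
        f"A study of {label} highlighted these genes: {', '.join(genes)}.\n"
        f"The most robust findings about {label} are:\n{findings}\n"
        "Limit your answer to the most robust, replicated findings. Be highly "
        "skeptical, check whether each proposed connection actually makes sense, "
        "say plainly when there is no good evidence, and avoid overinterpretation: "
        "avoiding an error matters more than giving an answer."
    )


if __name__ == "__main__":
    print(skeptical_prompt(
        ["GENE_A", "GENE_B"],  # placeholder gene symbols
        ["placeholder finding 1", "placeholder finding 2"],  # supply your own robust findings
    ))
```

Keeping the disease name out of the prompt is meant to stop the model from pulling in the weak, speculative literature associated with that name. The findings list is where you control what counts as "robust".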
 
That's probably because 90% of texts and papers on ME/CFS talk about weak data and all these possible connections that have no sound basis.

I wonder if you would get better results if you call it illnessX and describe it yourself with what are considered the most robust findings. You could also ask it to be very skeptical, to check if a connection makes sense and that you want to avoid making an error at all costs.

Yes, I got much better results after insisting that it limit its answers to only the most robust findings, be highly skeptical, and avoid overinterpretation. This was in response to a demand for a hypothesis about type-1 diabetes and some other randomly selected genes.

It gave me this:

Got it — let’s strip this right back to what can actually be said with high confidence from the literature and your GWAS list, without speculative functional chains.


Here’s a skeptical, minimal interpretation:




1. Which genes have established, reproducible links to immune function or autoimmunity?


  • ETV6 – Strong evidence from hematology: encodes a transcriptional repressor essential for hematopoiesis; germline and somatic variants cause altered blood cell development and immune phenotypes. No direct, replicated T1D link yet, but immune relevance is well-established.
  • THRB – Encodes thyroid hormone receptor β. Thyroid hormones affect immune system function, and autoimmune thyroid disease often co-occurs with T1D, but there is no direct mechanistic evidence tying THRB variants to β-cell autoimmunity.



2. Which genes have plausible cellular roles but little to no direct autoimmune/T1D evidence?


  • MTCL1 – Microtubule crosslinking factor; required for microtubule organization in polarized cells. Known in neuronal and epithelial contexts; β-cell or immune-specific roles not well validated in humans.
  • TBC1D19 – Predicted Rab GTPase-activating protein; general role in vesicle trafficking is plausible, but no confirmed function in immune cells or β cells in the context of autoimmunity.
  • STYXL1 – Catalytically inactive phosphatase; some evidence for involvement in stress responses in cell models, but no robust immune or diabetes link.
  • ERICD, ZBTB40-IT1 – Long noncoding RNAs; regulatory potential is possible, but functional and disease-relevant roles are unconfirmed.
  • OR1F12 – Olfactory receptor pseudogene or low-expression GPCR; no validated function outside olfactory tissue in the context of immunity or diabetes.



3. Mechanistic inferences we can make with caution


From this set, the only well-supported connection to an immune-mediated disease pathway is via ETV6 (hematopoietic regulation) and, more indirectly, THRB (thyroid autoimmunity co-morbidity patterns). The rest are largely uncharacterized in T1D or immune biology, and functional connections would need to be demonstrated from scratch.




4. Minimal working hypothesis


Certain loci in the GWAS may influence T1D risk through effects on immune cell development or function (ETV6), while others may reflect novel, as-yet-uncharacterized mechanisms. Most genes here lack direct experimental evidence in β-cell biology or immunity, so their relevance remains speculative.


This seems a lot more useful, but maybe still overoptimistic.
 
I suppose you could say they’re a bit like seagulls. Good at regurgitation but you wouldn’t depend on them for reliable deduction.

Well-crafted prompts definitely help. Out of the box, their system prompts can give them a tendency to be sycophantic, although there are some recent examples of Gemini being more like Marvin the Paranoid Android.

I’ve found them incredibly useful tools for many things but, as @Yann04 says, not for reasoning, critically assessing information or anything speculative. None of the LLMs seem capable of knowing when they’re wrong. And I do worry when I see how some (individuals, companies and governments) use them.
 