Preprint Dissecting the genetic complexity of myalgic encephalomyelitis/chronic fatigue syndrome via deep learning-powered genome analysis, 2025, Zhang+

Discussion in 'ME/CFS research' started by SNT Gatchaman, Apr 17, 2025.

  1. Jonathan Edwards

    Jonathan Edwards Senior Member (Voting Rights)

    Messages:
    17,058
    Location:
    London, UK
    Depends on the buns and lemonade a bit.
     
  2. Hutan

    Hutan Moderator Staff Member

    Messages:
    32,215
    Location:
    Aotearoa New Zealand
    I'm a bit surprised that you are so positive.

    Their model, built on associations of rare gene variants from two sources, seemed to perform fairly well when tested with another very small source of ME/CFS genetic data (a testing cohort of 36 cases and 21 controls). But the 115 genetic variations that they identified as differentiating did not replicate when tested with the UK Biobank data. See my post above.

    Of course, there could be problems with the accuracy of ME/CFS diagnoses in the UK Biobank. But, equally, there may be issues with the selection of the fairly small number of samples used for the Zhang analysis. Maybe a slew of irrelevant gene variants swamped the signal from some relevant ones? The presentation of the data in the UK Biobank comparison didn't get down to the granularity of individual variants.

    It's definitely interesting, but I don't think it is definitive. Very happy to be convinced otherwise, of course.
     
  3. Jonathan Edwards

    Jonathan Edwards Senior Member (Voting Rights)

    Messages:
    17,058
    Location:
    London, UK
    OK, I am not in a position to judge. @jnmaciuch seems to think the associations must be real. More opinions welcome.
     
  4. jnmaciuch

    jnmaciuch Senior Member (Voting Rights)

    Messages:
    571
    Location:
    USA
    Like @Hutan I don’t think it’s definitive.

    But given the rarity of their variants and the small sample size, the independent cohort validation actually carries a lot of weight for me.

    If it was all (or even partly) a fluke, you’d expect that test cohort AUC to be barely above 0.5. The validation shows that despite looking at very rare variants in a small group of people, the same pattern of rare variants was replicable in another small group.
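    To make the "barely above 0.5" intuition concrete, here is a minimal simulation sketch (not from the paper; the function names are made up for illustration). It assumes a "model" whose scores carry no information at all, draws independent uniform random scores for a test cohort of 36 cases and 21 controls, and computes the AUC as the probability that a randomly chosen case outscores a randomly chosen control (ties count half). Repeating this many times shows what range of test AUCs chance alone produces at this sample size.

```python
import random


def auc(case_scores, control_scores):
    """AUC as P(random case scores higher than random control); ties count half."""
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


def null_auc_distribution(n_cases=36, n_controls=21, n_reps=2000, seed=0):
    """AUCs obtained when the score is pure noise (no link to case status)."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_reps):
        cases = [rng.random() for _ in range(n_cases)]
        controls = [rng.random() for _ in range(n_controls)]
        results.append(auc(cases, controls))
    return results


if __name__ == "__main__":
    aucs = sorted(null_auc_distribution())
    mean = sum(aucs) / len(aucs)
    p95 = aucs[int(0.95 * len(aucs))]  # empirical 95th percentile of chance AUCs
    print(f"mean null AUC: {mean:.3f}, 95th percentile: {p95:.3f}")
```

    With only 57 people the chance spread around 0.5 is not negligible, so an observed test AUC is more convincing the further it sits above the upper tail of this null distribution, not merely above 0.5.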

    But I’d wait to see the overlap with DecodeME before getting too excited. My concern is not that the results are unreliable, but rather that they’re only a small part of the story.

    [Edit: I honestly don’t know what to make of the lack of replicability in UK BioBank, but as [edit: others] have already alluded to, the strength of replicability in an association such as this relies on whether the endpoint you’re associating with is actually similar between the two studies. It’s more likely for things to get drowned out in BioBank without stringent diagnostic criteria].
     
    Last edited: Apr 18, 2025
  5. Jonathan Edwards

    Jonathan Edwards Senior Member (Voting Rights)

    Messages:
    17,058
    Location:
    London, UK
    But if I get Hutan's argument right, by the same token there should have been replication in a cohort of maybe 2000 in the UK Biobank?

    Reliability is the issue here. There are bound to be caveats. The UK Biobank might have been a bad sample, as might the others for various reasons. But if there is a discrepancy that doesn't seem to make statistical sense, that is a worry. Or is it that they didn't actually test for replication for ME/CFS? As Hutan implies, that would be a bit odd.
     
  6. Hutan

    Hutan Moderator Staff Member

    Messages:
    32,215
    Location:
    Aotearoa New Zealand
    I agree completely with this.

    I got excited when I read about independent cohort validation. But was disappointed with the UK Biobank failure to replicate - and I was surprised that the authors did not even mention the failure to replicate in the Results. It seemed to be swept under the carpet. The second strongest association with the Zhang 115 gene variants according to the chart was a set of people labelled 'Covid-19 controls'! I don't know if that set includes or excludes people reporting Long Covid.

    I haven't even finished reading the Results yet, let alone the Discussion, so perhaps they do talk about it later. But this illustrates my point that there is probably too much covered in this paper. We need more detail to properly evaluate the findings. The UK Biobank comparison is really a paper in itself, a replication; it should not just be a paragraph and a generalised chart.
     
    Last edited: Apr 18, 2025
  7. jnmaciuch

    jnmaciuch Senior Member (Voting Rights)

    Messages:
    571
    Location:
    USA
    I may be misinterpreting the methods (they're a bit vague on details), but I suspect the difference arises because they are actually comparing p-values between their own cohort and BioBank. Depending on how loosely BioBank defined ME/CFS, that would make a profound difference to how strong the association is with any given gene.

    What (I think) they're doing in the UK BioBank analysis is simply checking whether the highly associated genes in their data set are similarly highly associated (above background) [edit: in published studies for] other conditions. So if the UK BioBank GWAS are providing genes that are more broadly associated with general fatigue due to insufficiently stringent diagnostic criteria, the top genes in Zhang et al. may or may not come up among the top genes in BioBank.

    However, the validation with the Cornell cohort in this paper is not a comparison of p-values after-the-fact, it's actually looking at the participant-gene-level data for the new group and seeing if the same combination of genes is similarly predictive of outcome.

    [Edit: @Hutan, to your point, I think Zhang et al. would be substantially limited by whatever labels were already applied by UK BioBank studies, including that vague 'Covid-19 controls' label. If it was that vague, though, I would've just left it out of the paper.]
     
    Last edited: Apr 18, 2025
  8. Hutan

    Hutan Moderator Staff Member

    Messages:
    32,215
    Location:
    Aotearoa New Zealand
    I don't have a problem with labels.

    The Figure 5 caption gives this explanation of some of the covid related labels
    I think they could have explained better what the 'Covid19:_C2_v2_England_controls' means though. C2 just means the group of people who got Covid regardless of whether they got it severely, were hospitalised or had a mild infection; perhaps controls means people who had not got Covid, by some particular date? Or perhaps it means people who got Covid but who didn't get it severely? Either way, it's not clear if people with a genetic tendency to get Long Covid were included in the group or excluded.

    That's a very opaque and confusing sentence. I don't think they found that 'ME/CFS' was genetically correlated with anything there; they were testing whether their set of rare genetic variants was genetically correlated with anything. And they seem to be suggesting that their set of 115 variants was found to be correlated with the genetics of people with susceptibility to Covid-19. But... their chart seems to indicate that the significant correlation was with a control group.

    It's messy.
     
  9. forestglip

    forestglip Senior Member (Voting Rights)

    Messages:
    2,094
    I think this paper is using their fancy non-linear HEAL2 algorithm to predict disease risk in the first two cohorts, while they had to rely on the traditional statistical tests for the Biobank, so I don't think it should be too concerning that it didn't replicate.

    Apart from that, I'd consider depression being the top disease hit a semi-replication. There's a good chance many people in the ME/CFS cohort of this study and the depression cohort of the UK Biobank have a similar condition. For 15 years I was diagnosed with depression before getting an ME/CFS diagnosis, so that's how I would have signed up, and I'm guessing there are many similar cases of people who think they have depression but actually have ME/CFS. And even if the cohorts are perfectly diagnosed, depression is probably one of the conditions I'd rank in the top three if asked for conditions similar to ME/CFS, so seeing it be the top hit from the same genes as ME/CFS is interesting.
     
  10. jnmaciuch

    jnmaciuch Senior Member (Voting Rights)

    Messages:
    571
    Location:
    USA
    I think they're specifically saying that their highly associated genes have a greater-than-expected-by-chance overlap with the set of genes that another GWA study had already found to be associated with XYZ condition. As in, I don't think they actually did any direct analysis of the UK BioBank data, which is an important distinction.
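    If that is what they did, a common way to test for "greater-than-expected-by-chance" overlap between two gene lists is a one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test on the 2×2 overlap table). Whether Zhang et al. used this particular test is an assumption. The sketch below is illustrative only: the background of 20,000 genes, the 300-gene comparison list and the overlap of 8 are hypothetical numbers, and treating the 115 variants as 115 genes is a simplification.

```python
from math import comb


def expected_overlap(n_background: int, n_set_a: int, n_set_b: int) -> float:
    """Overlap expected if set B were drawn at random from the background."""
    return n_set_a * n_set_b / n_background


def overlap_pvalue(n_background: int, n_set_a: int, n_set_b: int, n_overlap: int) -> float:
    """One-sided hypergeometric P(overlap >= n_overlap).

    Treats set A as the 'marked' genes in the background and set B as a
    random draw of n_set_b genes, then sums the upper tail of the distribution.
    """
    total = comb(n_background, n_set_b)
    upper = min(n_set_a, n_set_b)
    tail = sum(
        comb(n_set_a, k) * comb(n_background - n_set_a, n_set_b - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Hypothetical numbers for illustration only (not taken from the paper).
    background, zhang_genes, other_study_genes, observed = 20_000, 115, 300, 8
    print("expected overlap:", expected_overlap(background, zhang_genes, other_study_genes))
    print("P(overlap >= observed):", overlap_pvalue(background, zhang_genes, other_study_genes, observed))
```

    A test like this only says the two lists share more genes than chance would predict; it uses no participant-level data, which is the distinction from the Cornell cohort validation.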

    I say "I think" here because I am truly guessing. I'm having a hard time figuring out exactly what they meant.

    I agree it's extremely messy, though I think part of that is because they could only use the same terms that were defined in other papers. They should have done a much better job clarifying whatever they referenced, though.

    [Edit: cross posted the same thought with @forestglip]
     
  11. Hutan

    Hutan Moderator Staff Member

    Messages:
    32,215
    Location:
    Aotearoa New Zealand
    As far as I can tell, there were separate studies. One compared the prevalence of their identified rare variants against identified rare variant data recorded for UK Biobank groups, as per my post #44 above and Figure 5.
    I guess it's possible that there was no rare variant association data for ME/CFS in the UK Biobank, although I would be surprised, when there appears to be rare variant association data for having had one body part x-rayed. I'm assuming each UK Biobank participant has had their genetics investigated with rare variants noted as well as being given disease and trait labels. And so the UK Biobank database can pull out the significant rare variants for all of the disease and trait labels there are. If there was no UK Biobank ME/CFS rare variant data, it would have been helpful if they had noted that.

    There's a reference there that might help us work out what they did, and I still haven't got to the Methods section.


    Another separate study looked at GWAS studies of diseases and traits. They only found something about sleep duration. They looked at GWAS of covid phenotypes, and that is when one of the Long covid phenotypes was found to be associated. See my post #45 above and Figure 5.
     
    Last edited: Apr 18, 2025
  12. jnmaciuch

    jnmaciuch Senior Member (Voting Rights)

    Messages:
    571
    Location:
    USA
    I think we're on the same page, some signals are just getting lost in transmission!
     
  13. Hutan

    Hutan Moderator Staff Member

    Messages:
    32,215
    Location:
    Aotearoa New Zealand
    (I've now skimmed the Methods section of the Zhang paper but it is almost as if they haven't got around to finishing that section. There is very little there about the later studies in the paper including the UK Biobank comparisons.)

    Ref #37 Systematic single-variant and gene-based association testing of thousands of phenotypes in 394,841 UK Biobank exomes, 2022

    There's a great-looking interface to the UK Biobank database that presumably makes assessing genetic relationships easier. From what I can see, genetic variants have been identified for all of the disease and trait labels that had more than 200 cases in the biobank, nearly 5000 labels. AI tells me that more than 1800 people have a CFS label in the biobank. So I think Zhang et al. should have been able to assess the relationship between their identified 115 variants and those of the people with the CFS label.

    From the Supplementary material of Ref 37:
    [Screenshot from the Supplementary material of Ref 37]

    I think a good question to ask Zhang et al. is: did they explore the relationship between their set of variants and the genetic information of people labelled with CFS in the UK Biobank? And, if not, why not?
     
  14. Jonathan Edwards

    Jonathan Edwards Senior Member (Voting Rights)

    Messages:
    17,058
    Location:
    London, UK
    I am struggling to get my head around this while trying to write something else, but I wonder if the apparent failure to replicate with the Leeds UK Biobank reflects different methodologies. Do Leeds have a cohort that has had whole genome sequencing and documentation of ME/CFS status? There was the preliminary GWAS study of people who reported having ME/CFS, but is that usable? I don't know, but I wonder if there is a reason why they do not say there is a negative result per se.
     
  15. Simon M

    Simon M Senior Member (Voting Rights)

    Messages:
    1,085
    Location:
    UK
    I think there is a question mark over UK Biobank diagnostic reliability. People were asked either if they had ever had a diagnosis of chronic fatigue syndrome, or of Myalgic Encephalomyelitis/Chronic fatigue syndrome. It’s too easy for people with a diagnosis of chronic fatigue, which is pretty common, to answer yes to these questions. And Luis Nacul did work in Canada showing that this is what happened in a large BC cohort identified in the general population with a broad question: a more detailed follow-up questionnaire established that many of those who answered positively didn’t have ME.

    DecodeME is different because not only was there a detailed follow-up questionnaire, but most people were recruited from the ME community rather than from the general public. I think it’s unlikely that substantial numbers of people with chronic fatigue are in the ME community. But again, we should soon be able to compare results from UK Biobank with DecodeME.

    But the lack of replicability versus UK Biobank doesn’t concern me at this stage.
     
  16. Jonathan Edwards

    Jonathan Edwards Senior Member (Voting Rights)

    Messages:
    17,058
    Location:
    London, UK
    I agree, but I am confused, because this was a cohort that has been trawled for SNPs as per usual GWAS rather than completely sequenced, was it not? And if the Zhang findings depend on rare variants from whole genome sequencing, should we expect a replication to even be possible?

    I would like to think that it is almost certain that Zhang et al. have picked up some genetic signals, even if their approach giving 115 genes may be much more difficult to interpret than what is likely to come from DecodeME by tracking commoner SNP variants across big numbers.

    Hutan was concerned that Zhang failed to find what they should have found if their data were reliable. I am still unclear whether this is so. Even if the UK Biobank 'ME/CFS' cohort was dilute, I would expect, with 2000 cases, there to be some degree of agreement.

    I must say that I find the way the Zhang paper is written much less transparent than the way Edinburgh do things.
     
  17. hotblack

    hotblack Senior Member (Voting Rights)

    Messages:
    636
    Location:
    UK
    Will the underlying dataset from DecodeME have rare variants and other data used here? Or is that not coming until SequenceME?

    Presumably if it is present replication would be fairly straightforward with access to the model (a shame it hasn’t been made available)?

    Or even a wider attempt to take the DecodeME data and rerun the recipe from this paper to train or just finetune the model and validate it on a larger and more consistent dataset?
     
  18. Andy

    Andy Retired committee member

    Messages:
    23,739
    Location:
    Hampshire, UK
    No, DecodeME won't have data on rare variants. Genome-wide association studies (GWAS) such as DecodeME only look at common variants at specific locations on the genome. Whole genome sequencing studies, such as this one and the proposed SequenceME, 'read' the whole genome of each sample, and this is then used to look for rare variants.
     
  19. hotblack

    hotblack Senior Member (Voting Rights)

    Messages:
    636
    Location:
    UK
    That’s my understanding. It’s not a replication but looking for genetic correlations between ME/CFS and other diseases. They’re saying “okay here are our identified 115 genes, do these pop up for other diseases too”.

    However, if the genes are relevant and the people in the (Leeds) UK Biobank actually have ME/CFS, you’d assume the genes would also show up in some way, no?
    Good questions. Anyone feel comfortable asking them?
     
  20. hotblack

    hotblack Senior Member (Voting Rights)

    Messages:
    636
    Location:
    UK
    Thanks Andy.
     
