Analysis of data from 500k individuals in UK Biobank shows an inherited component to ME/CFS (Ponting blog)

I've come back to this thread after Chris Ponting posted, on the metabolic trap thread, a link for analysing the ME/CFS patient data on the UK Biobank site. The P4HA1 result mentioned in this thread seems inconclusive to me, as the Biobank data gives a high -log10(P value) for a low-MAF variant. In this case MAF=0.0001, or 1 in 10,000, so I would think you wouldn't need many people with the variant to get a good P value. Unfortunately I don't know how to see or calculate how many people have the variant described in the OP.

I have a question for @Chris Ponting: have you investigated lower -log10(P value) values to see if anything pops up? For example, chromosome 1, region 210993kb - 213098kb, looks interesting:
[Attached image: upload_2019-7-26_16-30-53.png]
This has a grouping of -log10(pv) of about 6 using the UK Biobank tool with an ME/CFS dataset.
Source: http://geneatlas.roslin.ed.ac.uk/re...&minregion=210993&chrom=1&representation=plot

If I look at what genes are nearby Ch1:212000 I see gene LPGAT1.
Source : http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg19&lastVirtModeType=default&lastVirtModeExtraState=&virtModeType=default&virtMode=0&nonVirtPosition=&position=chr1:211659046-212169045&hgsid=742308547_JJepNQceuWKSNtihJOcVakcaYYcr

From : https://www.ncbi.nlm.nih.gov/gene/9926
This gene encodes a member of the lysophospholipid acyltransferase family. The encoded protein catalyzes the reacylation of lysophosphatidylglycerol to phosphatidylglycerol, a membrane phospholipid that is an important precursor for the synthesis of cardiolipin.

Cardiolipin has been mentioned before in ME/CFS. This study found elevated antibodies: "Anticardiolipin antibodies in the sera of patients with diagnosed chronic fatigue syndrome."
https://www.ncbi.nlm.nih.gov/pubmed/19623655/

Is this kind of analysis of the Biobank data valid?

EDIT : Just realised @Hutan had the same observation
Is it possible that a clustered group of not-quite significant mutations might tell us something? E.g. that cluster on Chromosome 1 where each dot falls below the significance line but is well above the frequency of most mutations.
 
What was the prevalence of the P4HA1 SNP in the 1829 people who self-identified with ME/CFS?
Can anyone answer @Lucibee's question?
[Note: Supplementary table 1 mentioned in Simon's blog now lists 2017 people, also confirmed on the Biobank study reporting website]

With the following data from the Biobank study for this variant, is it possible to calculate backwards how many of the 2017 CFS subjects, compared to the 450247 controls, had this variant? I couldn't find info on the Biobank site on how the p values were calculated. Having this number would allow us to better interpret the P4HA1 variant.

From : http://geneatlas.roslin.ed.ac.uk/search/?traits=615&variants=rs150954845
Variant     Chr   Position   beta       pv         MAF      
rs150954845 10    74828696   -0.04743   2.59E-12   0.0001062

[source for number of cases/controls = 2017/450247 : http://geneatlas.roslin.ed.ac.uk/trait/?traits=615 ]

Perhaps @Simon M or @Chris Ponting ?
 
Very few. MAF = minor allele frequency = 0.0001062. This number times 450247 is ~47 people who would have *one* of their two copies of this gene with this DNA variant, and no people are predicted to have the variant in both copies.
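
The estimate above multiplies MAF by the number of people. As a rough cross-check, here is a minimal sketch that assumes Hardy-Weinberg equilibrium (an assumption made here for illustration, not something stated on the GeneATLAS site) and uses the MAF and case/control numbers quoted in this thread. The function name is hypothetical. Because MAF counts alleles and each person carries two copies, this textbook calculation gives about 2 x MAF x N one-copy carriers, roughly twice the MAF x N figure; either way the expected count is in the tens among controls and below one among the 2017 cases if the variant had no association with ME/CFS.

```python
# Hardy-Weinberg sketch: expected genotype counts for a rare variant.
# Inputs are the GeneATLAS figures quoted in this thread; the function name
# and the equilibrium assumption are illustrative, not the study's method.

def expected_genotype_counts(maf, n_people):
    """Return (one-copy carriers, two-copy carriers) expected under HWE."""
    p = maf          # minor allele frequency
    q = 1.0 - maf    # major allele frequency
    het = 2.0 * p * q * n_people   # heterozygotes: one copy of the variant
    hom = p * p * n_people         # homozygotes: two copies of the variant
    return het, hom

MAF = 0.0001062
N_CONTROLS = 450247
N_CASES = 2017

het_controls, hom_controls = expected_genotype_counts(MAF, N_CONTROLS)
het_cases, hom_cases = expected_genotype_counts(MAF, N_CASES)

print(f"controls: ~{het_controls:.1f} one-copy carriers, ~{hom_controls:.3f} two-copy carriers")
print(f"cases, if there were no association: ~{het_cases:.2f} one-copy carriers")
```

This does not recover the actual number of cases carrying the variant, which would need the genotype counts themselves; it only shows the scale of the numbers involved.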
 
@Chris Ponting Would it be possible for you to enquire whether the 100,000 Genomes project has ME/CFS data available? According to this article today they are now making the data available to researchers. I believe there are three other ME/CFS research teams looking at obtaining WGS data (Camille Birch, UAB, Nancy Klimas NOVA, Fereshteh Jahaniani at Stanford) who would likely be interested in the answer too.
https://www.genomicsengland.co.uk/100000-genomes-for-approved-researchers/
Over 100,000 whole genome sequences now available for approved researchers
Data Release 7 has now gone live in Genomics England’s Research Environment. While every data release is significant in its own right, v.7 is symbolic. It means we have now passed the milestone of 100,000 whole genomes available to researchers.
 
Unfortunately not likely. Participants were chosen according to different diagnoses, but I think it's highly unlikely that pwME were included. Personally, I think whole genome sequence (WGS) data is currently too expensive to do at scale. So the "SNP-chip" genotyping approach (that Nancy Klimas is taking, asking for 23andMe data) provides a better way forward because it provides statistically more robust results.
 
What approach is going to be taken by the upcoming GWAS for ME?

23andme data is not guaranteed to be accurate... and it doesn't cover a lot of genes.
 
Whole genome genotyping, probably using the Affymetrix UK Biobank Axiom® array. This will survey ~850,000 DNA variants in every person. Yes, it will not survey every one of our 3 billion DNA letters but the ~850,000 variants actually allow many other variants to be guessed accurately ("imputed") - probably about 96 million variants. This is because DNA variants that are located close together on chromosomes are likely to be inherited together as they get passed down the generations, and often this 'linkage' is not broken. The key thing is that this covers the whole genome and allows any variant over about 0.5% minor allele frequency to be tested. So reasonably rare variants will be tested, but not DNA variants that are extremely rare, and perhaps private to individuals and/or their families.
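
The tendency of nearby variants to be inherited together is commonly measured as linkage disequilibrium, often summarised by r^2. The sketch below is a toy illustration only: the haplotypes are made-up data and the function name is hypothetical. An r^2 near 1 means that knowing one variant tells you the other almost exactly, which is the property that lets ungenotyped variants be guessed from genotyped ones.

```python
# Toy linkage-disequilibrium (r^2) calculation between two variants.
# The haplotypes below are made-up illustration data, not real genotypes.

def ld_r_squared(haplotypes):
    """haplotypes: list of (allele_a, allele_b) pairs, each allele coded 0 or 1."""
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n                            # frequency of allele 1 at variant A
    p_b = sum(b for _, b in haplotypes) / n                            # frequency of allele 1 at variant B
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n      # both alleles on the same haplotype
    d = p_ab - p_a * p_b                                               # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when either variant does not vary")
    return d * d / denom

linked = [(0, 0)] * 6 + [(1, 1)] * 4                        # always inherited together
mostly_linked = [(0, 0)] * 6 + [(1, 1)] * 3 + [(1, 0)] * 1  # linkage broken once
unlinked = [(0, 0), (0, 1), (1, 0), (1, 1)] * 3             # no relationship

for name, haps in [("linked", linked), ("mostly linked", mostly_linked), ("unlinked", unlinked)]:
    print(f"{name}: r^2 = {ld_r_squared(haps):.2f}")
```

Real imputation uses reference panels and statistical models rather than a single pairwise r^2; this only shows the underlying idea.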
 
Hi Chris, thanks for corresponding.

I was wondering, could linkage be a possible factor in the statistical anomaly? i.e. could this SNP be acting as a marker for a section of chromosome carrying another unidentified gene which is the real factor in predisposing the owner to ME?
 
Yes, the statistically significant association is to the DNA variant (SNP) and not to the gene. But having said that, the DNA variant does also predict the activity of SLC25A15 (ORNT1) - in other words the amount of the protein made by cells - so it is a good candidate for the gene that is influenced by the SNP. The alternative hypothesis is that another gene is influenced by the DNA variant. If so, then this gene would need to be within about one million DNA letters from the SNP. There are now computational approaches - getting better every year - that narrow down which genes are likely to be influenced by the associated SNP. If this association is replicated, then my *guess* is that the influenced gene is SLC25A15 because of the above, but also because the associated SNPs (that are 'linked') encompass SLC25A15, and because mitochondrial dysfunction might be expected as a cause of altered ME/CFS risk.
 
Sorry, hadn't seen this until now. I agree with you, re: P4HA1.
About chromosome 1 and LPGAT1: There is a risk of reading too much into 'peaks' that are not significant overall (over the entire genome). Yes, these *could* be true, but the statistics imply that most of them are false. In GWAS of other diseases/traits, when attempts are made to replicate such "sub-genome-wide significant" results, most of them fail. This is why I try to adhere as much as possible to objective statistical measures and only when these appear to be robust do I look, subjectively, for supportive evidence.
But a general answer to your question is: This is why we need large sample sizes and to have a replication study (by others) in order to be convincing. This field is littered with hypotheses that have not (yet) been strongly substantiated and I do not wish to add more.
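
To make "significant over the entire genome" concrete, here is a small sketch comparing the two results discussed in this thread with the conventional GWAS threshold of p < 5e-8. That threshold is an assumption here: it is the usual convention, and the significance line drawn by the GeneATLAS tool may differ. The helper names are hypothetical.

```python
import math

# Conventional genome-wide significance threshold (an assumption; the
# GeneATLAS plot may draw its line at a different value).
GENOME_WIDE_P = 5e-8

def neg_log10(p_value):
    """Convert a p value to the -log10 scale used on Manhattan plots."""
    return -math.log10(p_value)

def p_from_neg_log10(height):
    """Convert a plot height (-log10 p) back to a p value."""
    return 10.0 ** (-height)

def is_genome_wide_significant(p_value):
    return p_value < GENOME_WIDE_P

threshold_line = neg_log10(GENOME_WIDE_P)   # height of the threshold on the plot
cluster_p = p_from_neg_log10(6)             # chromosome 1 cluster, -log10(pv) of about 6
p4ha1_p = 2.59e-12                          # rs150954845, from the table quoted earlier

print(f"threshold on the plot: -log10(p) = {threshold_line:.2f}")
print(f"chr1 cluster: p = {cluster_p:.0e}, significant: {is_genome_wide_significant(cluster_p)}")
print(f"rs150954845: -log10(p) = {neg_log10(p4ha1_p):.2f}, significant: {is_genome_wide_significant(p4ha1_p)}")
```

On this convention the chromosome 1 cluster falls short of the threshold by more than an order of magnitude in p value, while the P4HA1 variant passes it; passing the threshold alone does not address the low-MAF concern raised earlier.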
 
So the "SNP-chip" genotyping approach (that Nancy Klimas is taking, asking for 23andMe data) provides a better way forward because it provides statistically more robust results.
Unfortunately the paper written by the Klimas team using this approach is anything but robust. 23andMe is low quality and the data is full of errors.

Thread on issues with this study on PR with contributions from several different people raising concerns
https://forums.phoenixrising.me/thr...ot-study-perez-et-al-2019.76400/#post-2206507

Quick summary of issues here on s4me
https://www.s4me.info/threads/genet...athanson-klimas-et-al.9415/page-2#post-172724

In simple terms, you don't need to be a scientist to see the issues; they are that blatant. Sorry for stating the blunt truth.
 
This is not a Genome-Wide Association Study (GWAS). It is not genome-wide, it has no appropriate control group, it is not well-powered for its intended aims, and its statistics are inappropriate. So, yes, I agree with your assessment.
 