Analysis of data from 500k individuals in UK Biobank shows an inherited component to ME/CFS (Ponting blog)

Jonathan Edwards · Jun 12, 2018

Hutan said:
I'm with you on this @Andy.
It just seems a very definitive statement to make on the basis of less than definitive evidence.

I think the statement is based on other research - probably studies of monozygotic and dizygotic twins, which can be used to give an approximate estimate of heritability. Twins are all assumed to share environmental factors, but monozygotic twins share more genetic material.

Jonathan Edwards · Jun 12, 2018

Hutan said:
Is it possible that a clustered group of not-quite significant mutations might tell us something? E.g. that cluster on Chromosome 1 where each dot falls below the significance line but is well above the frequency of most mutations.
View attachment 3285

Does the UK ME/CFS Biobank have genetic information for its donors?
Does 23&Me cover the collagen mutation?

Absolutely possible. Rather than doing a huge study next I would pick the ten most correlated mutations, regardless of whether or not they reach significance in this study, and go to the DNA we have from the ME Biobank and see if they are confirmed. If you only test 10 mutations rather than thousands the chances of getting a significant finding are vastly greater.
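To put rough numbers on why testing 10 mutations is so much more forgiving than testing thousands, here is a minimal sketch. It assumes a simple Bonferroni correction and independent tests, neither of which is stated in the study or the blog, and the figure of one million SNPs is only an illustrative guess at the scale of a genome-wide scan.

```python
def bonferroni_threshold(num_tests, alpha=0.05):
    """Per-test p-value threshold that keeps the overall error rate near alpha.

    Assumption: plain Bonferroni correction (alpha divided by number of tests).
    """
    return alpha / num_tests


def chance_of_any_false_positive(num_tests, per_test_threshold):
    """Chance that at least one of num_tests null results passes the threshold.

    Assumption: the tests are independent of one another.
    """
    return 1.0 - (1.0 - per_test_threshold) ** num_tests


# 10 hand-picked mutations versus a hypothetical genome-wide scan of 1,000,000 SNPs
for num_tests in (10, 1_000_000):
    threshold = bonferroni_threshold(num_tests)
    print(num_tests, "tests -> per-test threshold", threshold)
```

With 10 tests a mutation only has to reach p < 0.005, whereas with a million tests it would have to reach p < 0.00000005, which needs a far stronger signal or a far bigger sample.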

Trish · Jun 12, 2018

There seem to be 4 possible routes to approaching genetic studies in ME/CFS.

1. Piggyback on an existing population study not focusing on ME like this one, using data already collected, but taking the huge risk that the data is so contaminated with misdiagnosed cases as to be meaningless.

2. Set up a huge (MEGA style) UK study with 10,000 participants properly diagnosed and, once the data has been collected, do GWAS in the hope of teasing out patterns and subgroups, which might, but is not guaranteed to, find something that is useful in a next stage of seeking treatments that use that knowledge.
This would be enormously expensive and likely to take years just to collect the samples. And has the added problem that there are few doctors in the UK reliably differentiating genuine ME/CFS from chronic tiredness, and PEM from post exertional fatigue, so misdiagnosis would still potentially be a problem.

3. A world wide collaboration of whole genome studies between research groups and biobanks that already have well characterised samples, and for which some genetic testing has already been carried out. This would have the advantage of spreading the cost between countries and speeding up the data collection by using existing samples and data.

4. Focus on the genetic findings from smaller studies already happening and see if they replicate in other available samples. This would presumably be the cheapest, because a limited number of mutations would be tested for, and the quickest because the samples are already available in biobanks.

For patients' sake, surely options 3 and 4 are the most productive and time efficient. I'm not against large scale GWAS as a long term aim, but in the meantime, I'd hate to see every other ME/CFS study in the UK not able to get funding because whatever funding is achieved is poured into such an expensive long term project.

Inara · Jun 12, 2018

Hutan said:
Does 23&Me cover the collagen mutation?

Dante Labs should because they promised WGS. But I have checked and the P4HA1 gene is missing (too). I have requested it. If I get the result, I could post it here.

Andy · Jun 12, 2018

Me again, with more, probably, dumb questions.

Chris Ponting said:
Second, this part of the protein is not conserved across evolution. There is even a nematode worm known that has a valine at exactly the position (124) that would be predicted to alter risk for ME in humans. This isn’t conclusive, but an amino acid change at a position that is shared across different species would have given us greater confidence in the prediction.

So, I understand "conserved across evolution" to mean the same, or similar, sections of genetic information that are to be found in different species, implying that natural selection replicates them, essentially, because they are really useful in some way, and therefore confer an advantage to the various species that have it over the ones that don't.

What I'm struggling with here is why the fact that we don't see this change in other species, other than the nematode, is seen as casting doubt on the result. IF this change increases the chance of the species carrying it to develop ME, then that is a disadvantage to the species, and is therefore likely to be lost through natural selection, surely? To my simplistic logic, I would have said it's to be expected that it's not found more, if it does indeed increase the chance of ME? The exception might be if it also, at the same time, gives some sort of advantage - in the same way as the mutation that can cause sickle cell anemia also confers immunity, or partial immunity, to malaria.

Simon M · Jun 12, 2018

chrisb said:
Has there ever been a meaningful study of the male to female ratio under the different criteria?

What makes me ask is that I was recently reading a paper by Pelosi and Lawrie on epidemiological findings around Glasgow. I think it was 1994. It suggested equal numbers of males and females affected (which seems out of line with the modern view). Does a lot depend on the criteria, or are there other reasons for the different findings?

I always take a look at the male/female split in a study and I can't remember one with decent diagnosis that wasn't c75% female. Having said that, many with flaky diagnosis aren't so different (eg PACE using Oxford criteria).

It might, in part, be due to doctors looking for it more in female patients, as @Jonathan Edwards suggested. But the same 75% emerges in unbiased prevalence studies (that screen widely), though the numbers in those are small. Same too for patients in the US private clinics, who self-refer, I think - so not dependent on doctor bias.

Andy said:
IF this change increases the chance of the species carrying it to develop ME, then that is a disadvantage to the species, and is therefore likely to be lost through natural selection, surely?

As the blog says, it's a very rare SNP in the human population (just less rare in mecfs patients). The idea is that this might just be random variation.

Andy said:
So the assumption that it is heritable is because it's a DNA sequence, and therefore passed down from our parents.

Not sure that's an assumption, as such! But the key point was that for most data, such as metabolomics, it's always hard to distinguish causes of the disease from effects of it (hence the need for sick as well as healthy controls). However, with the exception of cancer, diseases can't cause changes in DNA.

That doesn't mean the DNA changes "cause" the disease, as such, more that they can contribute to pathology. E.g. if several DNA changes cluster on a particular metabolic pathway that would indicate a causal role for the pathway - though disease might only result if someone also had a particular infection and possibly other environmental factors too.

Jonathan Edwards said:
Absolutely possible. Rather than doing a huge study next I would pick the ten most correlated mutations, regardless of whether or not they reach significance in this study, and go to the DNA we have from the ME Biobank and see if they are confirmed. If you only test 10 mutations rather than thousands the chances of getting a significant finding are vastly greater.

That's the basic principle of GWAS: a discovery phase followed by a validation phase that focuses on the most promising findings from the discovery phase. But only of significant findings. The problem with GWAS is that they include vast numbers of SNPs and so there is a huge problem with false positives. Or put another way, you would expect a lot of "nearly significant" findings by chance alone. So the current individual findings are likely to be less reliable (I think the overall finding of heritability is more reliable) - so attempts at replication might be futile.

What's really needed is a much bigger and better GWAS to identify the right candidates to take forward to the validation phase.

Trish said:
There seem to be 4 possible routes to approaching genetic studies in ME/CFS.

[my summary]
1. Piggyback on non-mecfs biobank
2. Large GWAS
3. Worldwide collab of existing mecfs gene studies
4. See if findings from existing genetic studies replicate

For patients' sake, surely options 3 and 4 are the most productive and time efficient. I'm not against large scale GWAS as a long term aim, but in the meantime, I'd hate to see every other ME/CFS study in the UK not able to get funding because whatever funding is achieved is poured into such an expensive long term project.

The problem here is that 3 is nowhere near big enough to deliver robust results (though might contribute to a bigger GWAS). 4 is largely pointless because of the previous point: it's likely to be chasing noise and a negative result wouldn't mean much.

Also, it doesn't have to be a zero sum game. It's not as if there has been any biomedical funding at scale in the UK. And even if a GWAS were to happen, it doesn't mean other studies would lose out. Though I don't think that this thread is the place to have a MEGA-style debate: nothing has been proposed yet.

Lucibee said:
Am I correct in thinking that a SNP is not necessarily a disease-causing mutation and could equally be silent, but could also indicate the possibility of further variation in that section of the genome (linkage disequilibrium)?

In general, yes, the SNP is not precise but highlights a stretch of DNA in linkage disequilibrium. However, I think Chris said (to me) that the collagen hit wasn't near to other coding regions in linkage disequilibrium.

And now I need to take a break!

ukxmrv · Jun 12, 2018

Robert 1973 said:
1) I’ve just had a quick look at the UK Biobank website but I couldn’t see how patients were recruited. As many severe ME patients are bedridden without access to medical services, is it likely that people with severe ME are underrepresented in the biobank samples?

That's why it would be interesting to see this repeated in the UK CFS ME biobank as this contains severe patients as they have been going to people's homes to take bloods.

Andy · Jun 12, 2018

Thanks for the responses.

Simon M said:
As the blog says, it's a very rare SNP in the human population (just less rare in mecfs patients). The idea is that this might just be random variation.

To this layman, the blog needs to say that second part more clearly. The title of the blog says "..demonstrates an inherited component to ME/CFS". Whereas the blog states "The analysis identifies one, and only one, DNA position whose genetic variation associates with (in part) ME/CFS susceptibility." which I take to be the indication that only some of the self reported ME/CFS patients have it (bolding and underline mine).

So going back to this quote

Second, this part of the protein is not conserved across evolution. There is even a nematode worm known that has a valine at exactly the position (124) that would be predicted to alter risk for ME in humans. This isn’t conclusive, but an amino acid change at a position that is shared across different species would have given us greater confidence in the prediction.

would the layman interpretation be "We don't see this genetic change replicated in any other species, with the exception of a nematode worm, which could therefore mean that this result is just random variation in the population"?

Simon M said:
Not sure that's an assumption, as such! But the key point was that for most data, such as metabolomics, it's always hard to distinguish causes of the disease from effects of it (hence the need for sick as well as healthy controls). However, with the exception of cancer, diseases can't cause changes in DNA.

That doesn't mean the DNA changes "cause" the disease, as such, more that they can contribute to pathology. E.g. if several DNA changes cluster on a particular metabolic pathway that would indicate a causal role for the pathway - though disease might only result if someone also had a particular infection and possibly other environmental factors too.

The assumption that I'm referring to is that the results "..demonstrates an inherited component to ME/CFS" - given that you've just said it could be just random variation that does seem to make it an assumption.

Jonathan Edwards · Jun 12, 2018

Simon M said:
What's really needed is a much bigger and better GWAS to identify the right candidates to take forward to the validation phase.

I don't really buy this, Simon. Obviously the bigger any study the better but I see no problem with making use of the current study as a primer to look at replication in relatively small cohorts. The 'significance' of findings on the GWAS is not really relevant since it is an artificial construct based on the number of questions asked. If we believe that there are some genetic links then whether or not they cross this arbitrary threshold it makes sense to take the ten most suggestive results and repeat them on another cohort - which could be the ME Biobank 200 cases. If not replicated there then we can be pretty sure that these loci are not really interesting in terms of robust genetic links relevant to a good proportion of cases - like DR4 for RA. DR4 for RA comes up in every population ever studied but I suspect it might not have come up as significant on an initial GWAS trawl. Any linked genes of interest will be sitting there in this initial trawl showing higher rates in ME because 500 is enough to guarantee a result.

If your first ten mutations/allotypes draw a blank on a known cohort then it would be reasonable to try another ten on another cohort. I suspect there is banked DNA from half a dozen cohorts worldwide.

At the same time I think it would be worth collecting more ME cases, but properly documented ones. That is what the ME Biobank have been trying to do, and seeking funding for, for some years. What they have done so far seems exemplary. The obvious thing is to give them the go ahead to expand their project tenfold. But that is still only 2000 cases. Going beyond that seems to me to be unrealistic if we cannot even get that far with proper methodology. Decent cohort collection is time consuming and expensive.

Trish · Jun 12, 2018

Simon M said:
Though I don't think that this thread is the place to have a MEGA-style debate: nothing has been proposed yet.

I agree, but the subject is inevitably raised when we are discussing an article by the vice chair of the CMRC who has just said on Twitter today in the discussion of this study in response to a question about doing a GWAS:

We are planning 10k-20k cases. It is interesting how the UK Biobank resource might/should trigger new studies on difficult, unexplained disorders.

This sounds to me very much like MEGA is still on the cards.

Jonathan Edwards said:
At the same time I think it would be worth collecting more ME cases, but properly documented ones. That is what the ME Biobank have been trying to do, and seeking funding for, for some years. What they have done so far seems exemplary. The obvious thing is to give them the go ahead to expand their project tenfold. But that is still only 2000 cases. Going beyond that seems to me to be unrealistic if we cannot even get that far with proper methodology. Decent cohort collection is time consuming and expensive.

This is my preferred route. I see no point in reinventing the wheel when there is an exemplary biobank already up and running.

Andy · Jun 12, 2018

Trish said:
This sounds to me very much like MEGA is still on the cards.

Maybe but MEGA without any BPSer influence is a lot more palatable to me.

The biggest issue anyway will be how to get to that kind of number of samples - a question that still hasn't been solved since MEGA was first proposed. 10k self-reported ME patients I would have thought is do-able with the help of the MEA and AfME, the problem would come if/when you actually start to want to properly screen them for any specific criteria. I can't see that there is the required infrastructure and sufficiently knowledgeable, and motivated, clinicians in the UK to get us to that sort of figure.

Jonathan Edwards · Jun 12, 2018

Andy said:
10k self-reported ME patients I would have thought is do-able with the help of the MEA and AfME, the problem would come if/when you actually start to want to properly screen them for any specific criteria.

I actually think the problem is more serious than this, @Andy.
It may sound crazy but recruiting people who join support groups may just end up identifying a genetic risk factor for joining support groups. You then spend five million pounds studying something that has nothing to do with ME. Spurious associations of sorts one might not dream of play a huge part in cohorts in medical clinics. For a genetic risk study to work it is crucial that no confounding associations are 'bred in' to the cohort from the start.

Simon M · Jun 12, 2018

Andy said:
The assumption that I'm referring to is that the results "..demonstrates an inherited component to ME/CFS" - given that you've just said it could be just random variation that does seem to make it an assumption

Yes, that one hit might be, but
"(4) ME/CFS has a biological component because the heritability of ME/CFS is not zero. Canela-Xandri et al. estimate that the genetic heritability (liability scale) is 0.080. "

I'm pretty sure that heritability isn't just calculated from the one significant hit (where random variation might explain the finding).

As the blog is at pains to stress, we need a better study. As Chris put it:

Trish said:
This sounds to me very much like MEGA is still on the cards.

Could we talk about GWAS, not MEGA? MEGA was a particular project led by Esther Crawley and much of the politics around it was linked to that. Plus MEGA was a firm project going for funding. This is currently just an ambition. Let's see what happens. Methodology has yet to be determined.

And this thread is about the findings of the biobank study. If people want to discuss the possible new study, can someone start a new thread for that?

Jonathan Edwards said:
The 'significance' of findings on the GWAS is not really relevant since it is an artificial construct based on the number of questions asked. If we believe that there are some genetic links then whether or not they cross this arbitrary threshold it makes sense to take the ten most suggestive results and repeat them on another cohort - which could be the ME Biobank 200 cases. If not replicated there then we can be pretty sure that these loci are not really interesting

Not according to my understanding. That would apply if there were a few relatively large-effect genes, which seems to apply to your example.

If, however, there were many small-effect genes, where multiple ones highlight a significant pathway, they may well not show up, and data from a small study for non-significant ones may be highly misleading. Just like non-significant findings in any study.

Don't forget that a random dataset generates a uniform set of p values - with plenty of very small ones by chance in a very large dataset. Which creates a real danger of chasing noise from "interesting" but non-significant findings. At least that's what I learned from my biostats courses looking at very large biological datasets (which were as much fun as they sound).
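As a rough illustration of that point, the sketch below generates p-values for a dataset with no real effects at all and counts how many still look "interesting". It assumes each test statistic is an independent standard normal draw under the null, which is a simplification chosen for the example and not how the Biobank analysis was actually run; the 100,000 tests are likewise just an illustrative number.

```python
import math
import random


def null_p_values(num_tests, seed=0):
    """Two-sided p-values for num_tests tests where nothing is really going on.

    Assumption: each test statistic is an independent standard normal z-score.
    """
    rng = random.Random(seed)
    p_values = []
    for _ in range(num_tests):
        z = rng.gauss(0.0, 1.0)
        # Two-sided tail probability of a standard normal at |z|
        p_values.append(math.erfc(abs(z) / math.sqrt(2.0)))
    return p_values


def count_below(p_values, threshold):
    """Number of tests that would look 'significant' at this threshold."""
    return sum(1 for p in p_values if p < threshold)


num_tests = 100_000  # illustrative only; a real GWAS tests far more SNPs
p_values = null_p_values(num_tests)
for threshold in (0.05, 0.001, 0.0001):
    print(threshold, count_below(p_values, threshold),
          "found, about", num_tests * threshold, "expected by chance")
```

Because null p-values are spread evenly between 0 and 1, roughly a fraction equal to the threshold falls below it, so the more tests you run, the more very small p-values turn up from noise alone.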

Jonathan Edwards said:
I actually think the problem is more serious than this, @Andy.
It may sound crazy but recruiting people who join support groups may just end up identifying a genetic risk factor for joining support groups.

I agree that there are huge issues around how to recruit for any GWAS. But as far as I know, it would be very hard to get the bulk of samples from support groups, so any study would need to find a way to reach more widely into the community. Though, equally, I think that backing from support groups is critical, as @Andy says.

Worth noting that most other biomedical studies have similarly skewed samples for various reasons (all the biobank severe patients, for a start).

Trish · Jun 12, 2018

Simon M said:
Could we talk about GWAS, not MEGA? MEGA was a particular project led by Esther Crawley and much of the politics around it was linked to that. Plus MEGA was a firm project going for funding. This is currently just an ambition. Let's see what happens. Methodology has yet to be determined.

Fair point. I apologise.

wastwater · Jun 12, 2018

Anyone know the locations on chromosome 1 and names of genes?

Simon M · Jun 12, 2018

Trish said:
Fair point. I apologise.

Thanks, and also I see that I went on to talk about the potential new study that I didn’t want discussed here!

However, Chris just made a comment to me that seems relevant here. I think he was talking more generally, but it seems especially appropriate here:

Chris Ponting said:
it would be terrible if there were to be competition among the so very few scientists interested in pursuing ME/CFS mechanisms.

Chris has a very collaborative approach and I know that he is involved in many collaborations in his own field (that’s important in a lot of big genetic studies). His new PhD-student project on TCRs only came about because he was part of a large consortium based at the Wellcome Sanger trust. Two key people from that consortium have joined him in the TCR project.

So I think that if the GWAS does become a firm proposal then it will be handled in a very different way to what we saw last time.

Jonathan Edwards · Jun 12, 2018

Simon M said:
If, however, there were many small-effect genes, where multiple ones highlight a significant pathway, they may well not show up, and data from a small study for non-significant ones may be highly misleading. Just like non-significant findings in any study.

I agree that for multiple small effect genes these might not make it to the ten best bets. However, I am a bit sceptical about lots of small effect genes pointing up a pathway. The genes that have helped us in rheumatic disease have been rather large effect ones. Maybe there are examples for small effect genes in other areas.

Where we see several mutations pointing to a pathway in rheumatic disease in some cases these are very rare mutations. They may have huge effects - like ~100% risk of lupus for complement defects - but only account for a small proportion of overall cases. These may of course be missed simply because they are so rare, but that is obviously a different issue.

I don't really see why confirmatory studies on smallish groups should be misleading. By definition they will only be applied to mutations that did show up, at least as best bets, on the first-look sample. The problem with first-look samples is that there will be some rogue results that come out 'significant' by fluke. The chances of a replication test on a second smallish sample being rogues of that type are much smaller if you scale down to a few targets. The chances of a spurious ratio coming out closely the same a second time will be tiny.

Robert 1973 · Jun 12, 2018

Chris Ponting said:
The prevalence of ME/CFS among UK Biobank individuals was 0.448%. In other words, picking any person randomly in the UK then there is an even chance that they know someone with ME/CFS if they know about 200 people.

Would it not be an approximately even chance if you know about 100 people?

Simon M · Jun 16, 2018

Robert 1973 said:
Some questions:

1) I’ve just had a quick look at the UK Biobank website but I couldn’t see how patients were recruited. As many severe ME patients are bedridden without access to medical services, is it likely that people with severe ME are underrepresented in the biobank samples?

2) I don’t understand the second of five reasons to be cautious (b). It’s not conserved across species but there is a nematode with it. Can someone try to explain this to me? [Apologies if I’m being stupid.]

3) Is the estimate that a GWAS with 10-20 thousand cases would be necessary to obtain robust indications based on the assumption that ME/CFS is a single disease? If ME/CFS included x number of different diseases, would it be necessary to include 10-20x cases in order to obtain robust indications?

Answers from Chris Ponting:

1. People recruited: 500,000 of whom over 2,000 people were self-reporting as having been diagnosed with ME/CFS. Yes, those who were housebound are clearly underrepresented.

2. Yes, [as per Jonathan Edwards explanation], it's less likely to be a critical amino acid because [so many different species get by with different versions] of this.

3. Genetic contributions to ME/CFS will be many, and the Genome-wide association study will find each of these that it has the power to find. ME/CFS is defined clinically, whereas what we are discussing here is different: separate genetic contributions to disease. [Please don't ask me to explain this]

This answer is from me:

Robert 1973 said:
Would it not be an approximately even chance if you know about 100 people?

You would think so, given the prevalence of 0.445%, but if that was the case then anyone who knew 300 people would have a greater than 100% chance of knowing someone with ME.

The way to do it is to first calculate the chance of not knowing ANYONE with mecfs out of 200 people:
= 99.55%^200 (where ^ means "to the power of", i.e. multiplied by itself that many times; e.g. 4^2 is 16)
= 41%
So the chance of knowing at least one person with mecfs out of 200 is:
100% - 41% = 59%, or roughly an even chance.
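For anyone who wants to check the sums, the same calculation can be sketched in a few lines of Python. It assumes the rounded prevalence of 0.45% (the 99.55% above) and that the people you know are independent random draws from the population, which real social circles are not; the function names are just made up for the example.

```python
import math


def chance_of_knowing_at_least_one(prevalence, acquaintances):
    """Chance that at least one of `acquaintances` people has the condition.

    Assumption: acquaintances are independent random members of the population.
    """
    chance_of_knowing_nobody = (1.0 - prevalence) ** acquaintances
    return 1.0 - chance_of_knowing_nobody


def acquaintances_for_even_chance(prevalence):
    """Smallest number of acquaintances giving at least a 50% chance."""
    return math.ceil(math.log(0.5) / math.log(1.0 - prevalence))


prevalence = 0.0045  # 0.45%, matching the 99.55% used in the working above
for acquaintances in (100, 200, 300):
    chance = chance_of_knowing_at_least_one(prevalence, acquaintances)
    print(acquaintances, "people:", round(100 * chance), "%")

print("even chance at about", acquaintances_for_even_chance(prevalence), "people")
```

On these assumptions the chance never goes above 100% however many people you know, and the break-even point comes out at about 154 people.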

Robert 1973 · Jun 16, 2018

Simon M said:
Answers from Chris Ponting:

3. Genetic contributions to ME/CFS will be many, and the Genome-wide association study will find each of these that it has the power to find. ME/CFS is defined clinically, whereas what we are discussing here is different: separate genetic contributions to disease. [Please don't ask me to explain this]

That sounds very like a tautology to me. The UK Biobank study found what it had the power to find, as would any study with any number of participants.

I’m not sure I understand the second bit. Are we not discussing separate genetic contributions to a disease? Let’s assume 5% of the 0.445% (for example, the very severe) have a different disease with a completely different genetic contribution. Would a 10-20k participant study be likely to have the power to identify the genetic contributions to this separate disease? Or would we need 200-400k participants for that?
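The arithmetic behind those figures can be sketched as below. This is only the scaling in the question, not a power calculation: it assumes that what matters is the number of cases belonging to the subgroup, and the function name is made up for the example.

```python
import math


def total_cases_needed(target_subgroup_cases, subgroup_percent):
    """Total cases to recruit so that the subgroup alone reaches the target.

    Assumption: the subgroup makes up a fixed percentage of all recruited cases.
    """
    return math.ceil(target_subgroup_cases * 100 / subgroup_percent)


# A subgroup that is 5% of all cases, aiming for 10k to 20k cases of that subgroup
for target in (10_000, 20_000):
    print(target, "subgroup cases need", total_cases_needed(target, 5), "cases overall")
```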

Apologies if I’m asking what Chris asked me not to ask. Maybe someone else can explain.

Simon M said:
You would think so, given the prevalence of 0.445%, but if that was the case then anyone who knew 300 people would have a greater than 100% chance of knowing someone with ME.

The way to do it is to first calculate the chance of not knowing ANYONE with mecfs out of 200 people:
= 99.55%^200 (where ^ means "to the power of", i.e. multiplied by itself that many times; e.g. 4^2 is 16)
= 41%
So the chance of knowing at least one person with mecfs out of 200 is 100% - 41% = 59%, or roughly an even chance.

I thought I was wrong but I couldn’t remember why. Thanks for the clear explanation.

And thanks to you both for answering.
