MECFS data analysis thread

Discussion in 'Other research methodology topics' started by Murph, Mar 22, 2024.

  1. SNT Gatchaman

    SNT Gatchaman Senior Member (Voting Rights)

    Messages:
    6,109
    Location:
    Aotearoa New Zealand
    forestglip, Murph, tmrw and 8 others like this.
  2. Midnattsol

    Midnattsol Moderator Staff Member

    Messages:
    3,818
    Just like with other omics, it's possible to do targeted metabolomics, looking for specific metabolites. This can be somewhat problematic as some metabolites are then often included in studies because they have previously been found to be significantly different - when this finding may have been spurious. Still the evidence base for the specific metabolites grows, while other metabolites that may have been more important are not studied further (made worse by how some metabolites are easier to isolate and measure than others).

    There are at least two small-n studies on metabolomic changes during the menstrual cycle, but they have all the same issues as have already been mentioned, although at least with a cycle you get several measurements from the same person.

    Dietary differences don't have to be as dramatic as going on a keto vs vegan diet; a simple difference in preferred source of protein or cooking fat could be enough to get changes in amino acids and/or lipids. With lipids there is also dietary history, as lipids consumed previously are released from adipose tissue into circulation to be used for energy.

    I'd love a longitudinal design! Preferably together with some way of indicating if in PEM or not. Or other symptoms. I'm not sure my metabolome would necessarily be different from a healthy person's in periods when I feel fine, but when I feel poorly... and what about those periods when one feels fine except when trying to do something (I at least have them, I can feel perfectly fine when for example sitting still, but trying to move or do something cognitively demanding will make me feel awful).

    I'd want a team to be able to come home to participants to take samples, to reduce differences in strain to get to a test site. And participants to be able to say how their level of activity had been in the last couple of days (for example it could be helpful to know if someone had not been able to pace for whatever reason, or felt they were going to get PEM, were in PEM etc). Ideally I'd also want each participant to get a similar type of diet sent to their home to standardise diet, and maybe an activity tracker of some sort... if one only had the resources :p

    But what is possible/optimal for one study set up may not be so for another. And now there was the recent event of a biobank freezer that stopped working, destroying so many collected samples I could cry :(
     
    Murph, wigglethemouse, Sean and 5 others like this.
  3. Murph

    Murph Senior Member (Voting Rights)

    Messages:
    161
    Simon M, Sean, chillier and 1 other person like this.
  4. Murph

    Murph Senior Member (Voting Rights)

    Messages:
    161
    1. This is so wonderful to see, thanks heaps for your hard work! My dream with this project was that the many impressive people in this community would pitch in, and it is starting already!

    2. Is your topline takeaway from this that Hanson 2020 and 2022 have no standout areas of agreement? Is that a fair summary? Of course there are metabolites that they agree on and that are significant, but just as many significant ones that they disagree on. It would be excellent to see if any *other* studies have looked at those that they do agree on.

    3. Would you be willing to share your code? Even just ctrl+c, ctrl+v dumping it into a comment here would be useful to me!

    4. Did you use the D1-PRE data from Hanson 2022? And does it change if you look at the other timepoints?
     
    Kitty, Sean and Peter Trewhitt like this.
  5. Murph

    Murph Senior Member (Voting Rights)

    Messages:
    161
    This next chart compares Hanson 2020 to Naviaux 2017. There aren't loads of metabolites in common, but the ones I found in common do not show strong agreement. Note this chart has lipids and other molecules all mixed in.

    naviaux vs hanson 2020.jpeg

    here's the code for this one: https://github.com/jasemurphy/mecfs/blob/main/Hanson_2020_vs_Naviaux_2017.R

    I imagine some people may view this analysis as basic but there's a method to the approach. Before we go to complex interrogation of the data it's worth asking it some very simple questions. So far it doesn't seem to have many answers. Showing that fact is a bit like reporting your null results.
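
    For anyone who wants to ask the same "very simple question" without opening the R script, here is a minimal Python sketch. It is not the linked script. It assumes each study has been boiled down to a mapping from metabolite name to log2 fold change (patients vs controls); the function name, metabolite names and numbers are all made up for illustration.

```python
def direction_agreement(study_a, study_b):
    """Return the shared metabolites and the fraction whose fold changes share a sign.

    Each argument maps metabolite name -> log2 fold change (hypothetical layout).
    """
    shared = sorted(set(study_a) & set(study_b))
    if not shared:
        return shared, float("nan")
    # A positive product means both studies moved in the same direction.
    same = sum(1 for name in shared if study_a[name] * study_b[name] > 0)
    return shared, same / len(shared)


# Made-up values, not taken from any paper.
study_a = {
    "metabolite_1": 0.40,
    "metabolite_2": -0.25,
    "metabolite_3": 0.10,
    "metabolite_4": -0.30,
}
study_b = {
    "metabolite_1": -0.15,
    "metabolite_2": -0.10,
    "metabolite_3": -0.20,
    "metabolite_4": -0.05,
    "metabolite_5": 0.60,  # only measured in study_b, so it is ignored
}

shared, agreement = direction_agreement(study_a, study_b)
print(len(shared), "shared metabolites, agreement =", agreement)
```

    If the two studies had nothing in common beyond noise, you would expect the agreement fraction to hover around one half, so that is the number to beat.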
     
    Simon M, Kitty, Sean and 2 others like this.
  6. Murph

    Murph Senior Member (Voting Rights)

    Messages:
    161
    Here's another one while I'm on a roll. Naviaux vs Fluge 2021. This is the Fluge paper where they run some unsupervised machine learning to create subsets. This chart has all the subsets bundled together but the next step might be to see how it looks if you plot each separately.

    NaviauxvsFluge2021.jpeg



    code for this one: https://github.com/jasemurphy/mecfs/blob/main/Naviaux2017_vs_Fluge2021.R
     
    forestglip, Simon M, Kitty and 4 others like this.
  7. FMMM1

    FMMM1 Senior Member (Voting Rights)

    Messages:
    2,812
    Jonathan Edwards's post here*, re TPPP1 gene, illustrates(?) a potential scenario i.e. a target gene/pathway is discovered and you run a metabolomics study on that group:
    • untargeted i.e. measure all the metabolites you can (hypothesis free);
    • targeted at e.g. pathway you consider should be relevant (hypothesis driven).
    As per your comment, a hypothesis driven (i.e. targeted) metabolic study might e.g. be able to look at adjusting the procedure to increase sensitivity (detectability) of certain metabolites.

    Still haven't seen a comment re limitation of Metabolon data (1000 metabolites) and how that could be addressed e.g. did Hanson do a study looking for a wider range of metabolites - is that the sort of thing we need?


    *Jonathan - "Who knows? - we would need to find out. Maybe that the critical abnormal process in ME is so hard to observe because it involves supramolecular solid phase changes within brain cells at a level that at present we have no means to observe directly?"
    https://www.s4me.info/threads/genet...022-hajdarevic-et-al.25070/page-3#post-411554
     
    Kitty, Sean and Peter Trewhitt like this.
  8. Murph

    Murph Senior Member (Voting Rights)

    Messages:
    161
    There are some great points here about the limits of knowledge. I want to describe how I see it.

    We know most studies cover only a fraction of the metabolites in the body. Even when thousands are measured it's possible there's a smoking gun that we haven't measured or that science can't yet measure.

    It's possible that the body fluids measured don't contain any evidence of illness at all. Or it's possible the evidence is present only under certain conditions.

    Nevertheless it's also possible there is a metabolite or two that will show up as elevated across several studies, hidden only by statistical noise. That's what I'm trying to sort out by including everything, whether deemed statistically significant in an individual study or not. A perfectly useful answer will be: nope, you can't find anything that stands out at a metabolite level. The result of that might be to change the way scientists approach metabolomics in ME/CFS.

    After metabolite-level analysis, the second-best approach is pathway-based analysis.
     
    Kitty, Amw66, Jaybee00 and 1 other person like this.
  9. chillier

    chillier Senior Member (Voting Rights)

    Messages:
    240
    1) Thank you! I'm glad you think so :)

    2) I think that's basically right, just because the ones they agree on are not significant after multiple test correction. I don't think it's necessarily problematic that they disagree on lots of things though - the ones they disagree on that are most significant are all drug metabolites and it's totally fine that those disagree.
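
    To make "multiple test correction" concrete, here is a small sketch of the Benjamini-Hochberg procedure in plain Python. Treat the choice of method as an assumption: this post doesn't say which correction was applied, and the p-values below are invented.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    n = len(p_values)
    # Indices of the p-values from smallest to largest.
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


# Invented p-values for four metabolites.
raw_p = [0.01, 0.04, 0.03, 0.005]
adjusted_p = benjamini_hochberg(raw_p)
for p, q in zip(raw_p, adjusted_p):
    print(f"raw p = {p:.3f}  adjusted = {q:.3f}")
```

    With a thousand or so metabolites per study, a raw p of 0.01 is nowhere near enough to survive this kind of adjustment, which is why agreement between two studies can vanish after correction.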

    3) Yep, no problem:

    hanson2020.csv is the link you gave https://www.mdpi.com/2218-1989/10/1/34/s1 supplementary 1, saved as a csv. hanson2022_raw2.csv is https://insight.jci.org/articles/view/157621/sd/2 with the 'original' sheet saved as a csv. Then just set the working directory at the top and it should all run without issue hopefully. The plots it generates look slightly different to the ones I posted before, because previously I used the scaled data whereas now I am normalising myself with the raw data - I'm normalising slightly differently (by sample rather than by metabolite).
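
    As a rough illustration of the difference between the two normalisation directions (not the actual code, which is in R), here is a Python sketch. It assumes rows are metabolites and columns are samples, and it uses the median as the scaling factor purely as an example; the numbers are made up.

```python
from statistics import median


def normalise_by_sample(matrix):
    """Divide each value by the median of its sample (column)."""
    n_cols = len(matrix[0])
    col_medians = [median(row[j] for row in matrix) for j in range(n_cols)]
    return [[row[j] / col_medians[j] for j in range(n_cols)] for row in matrix]


def normalise_by_metabolite(matrix):
    """Divide each value by the median of its metabolite (row)."""
    row_medians = [median(row) for row in matrix]
    return [[value / m for value in row] for row, m in zip(matrix, row_medians)]


# Made-up raw intensities: 3 metabolites (rows) x 3 samples (columns).
# Every metabolite doubles from one sample to the next, as if the samples
# differed only in overall concentration.
raw = [
    [10.0, 20.0, 40.0],
    [2.0, 4.0, 8.0],
    [5.0, 10.0, 20.0],
]

by_sample = normalise_by_sample(raw)
by_metabolite = normalise_by_metabolite(raw)
print("by sample:    ", by_sample)
print("by metabolite:", by_metabolite)
```

    In this toy case, normalising by sample removes the overall concentration difference between samples, while normalising by metabolite keeps it and instead puts all metabolites on a common scale. That is one reason plots can shift a little when switching between the two.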

    4) Yes I'm using D1-PRE, you can change to other time points from one line in the code (in the data preprocessing section).
     
    Last edited: Mar 30, 2024
  10. Murph

    Murph Senior Member (Voting Rights)

    Messages:
    161
    I just want to point out that these kinds of -omics studies are still coming! Bergquist and Armstrong are going to drop a really detailed study soon, with LOADS of datapoints taken at 20 minute intervals and a couple of thousand metabolites and lipids. I'm keen to wrestle the existing data into a format where we can see more easily whether the upcoming findings confirm prior signals, upset the apple cart or simply reaffirm that the whole field is too noisy.


    Screenshot 2024-04-05 at 2.02.33 pm.png

    Screenshot 2024-04-05 at 2.01.53 pm.png



    Some of the findings in the video look quite good I will say.
     
  11. wigglethemouse

    wigglethemouse Senior Member (Voting Rights)

    Messages:
    1,041
    It seemed from the Bergquist presentation where those slides come from (Lisbon Conference, Apr 4th 2024) that they really wanted to dive into the lipids to get a better understanding of what exactly is happening. Low ceramides showed up yet again! I think he said that he was putting together an even more detailed metabolomic analysis that will have even more lipid identification.

    What was interesting is that there are two arms to the study. Bergquist used exercise, social activity, and a mental task to track how metabolites change over 8 hours, and Armstrong will have a set of 3 controlled meals to see how diet affects the metabolites. These are in-home studies to remove the stress of traveling and being at a research lab.
     
    Simon M, Murph, Sean and 4 others like this.
  12. wigglethemouse

    wigglethemouse Senior Member (Voting Rights)

    Messages:
    1,041
    mariovitali, Simon M, Murph and 4 others like this.
  13. forestglip

    forestglip Senior Member (Voting Rights)

    Messages:
    1,246
    Cool idea @Murph!

    I just yesterday started something in the same vein. Complement keeps popping up, so I thought it could be useful to look at every study that has tested any complement proteins, compile them into one dataset, and visualize which ones are consistently higher or lower across studies. Here are some of the studies I've looked at so far:
    upload_2024-12-29_7-57-57.png

    Then I'll make each protein have its own column and each study have its own row. The cell where they intersect will be dark red if it was increased, dark blue if it was decreased, light red if it increased but wasn't significant, same with light blue, and grey if not significant and no details on which direction. That way you could look down a column and see if it's consistently blue or red.
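
    A rough sketch of that grid in plain Python, just to pin the idea down. Everything here is illustrative: the function names, the tuple layout (study, protein, direction, significant) and the example findings are made up, and an empty string stands for a protein a study didn't test.

```python
def cell_colour(direction, significant):
    """Map one finding to a colour name, following the scheme described above."""
    if direction == "up":
        return "dark red" if significant else "light red"
    if direction == "down":
        return "dark blue" if significant else "light blue"
    return "grey"  # not significant and no direction reported


def build_grid(findings):
    """Turn (study, protein, direction, significant) tuples into a study x protein grid."""
    studies = sorted({study for study, _, _, _ in findings})
    proteins = sorted({protein for _, protein, _, _ in findings})
    lookup = {(s, p): cell_colour(d, sig) for s, p, d, sig in findings}
    # One row per study, one column per protein; "" means not tested.
    grid = [[lookup.get((s, p), "") for p in proteins] for s in studies]
    return studies, proteins, grid


def column(proteins, grid, protein):
    """Read down one protein's column to see whether it is consistently red or blue."""
    j = proteins.index(protein)
    return [row[j] for row in grid]


# Made-up findings, not taken from any of the studies.
findings = [
    ("Study A", "C3", "up", True),
    ("Study A", "C5", "down", False),
    ("Study B", "C3", "up", False),
    ("Study B", "C5", None, False),
    ("Study C", "C3", "down", True),
]

studies, proteins, grid = build_grid(findings)
print(proteins)
for study, row in zip(studies, grid):
    print(study, row)
print("C3 column:", column(proteins, grid, "C3"))
```

    The same grid could be handed to a spreadsheet or plotting tool for the actual colouring; the point is only that one record per study and protein is enough to build it.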

    It'd probably be more interesting to compare actual values, but that's probably a lot more work, so I'll start here.

    And maybe after this I'll try to do the same with all proteins from proteomics studies.

    Though some people here have made good points. One thing that is particularly worrying for a hope of finding anything with this approach is whether the thing "to be found" is affected by exertion, since ME/CFS is so closely related to exertion. If it's higher than controls when rested and lower than controls after exertion, it'll be completely missed here if different study populations had recently done different levels of exertion.
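
    A toy worked example of that worry, in Python, with numbers invented purely for illustration. Each value is a patient-minus-control difference for one hypothetical cohort; half the cohorts were sampled rested and half after exertion.

```python
from statistics import mean


def vote_count(differences):
    """Count how many cohorts show an increase and how many show a decrease."""
    up = sum(1 for d in differences if d > 0)
    down = sum(1 for d in differences if d < 0)
    return up, down


# Invented patient-minus-control differences, one per cohort.
rested_cohorts = [1.0, 0.5, 1.5]            # higher than controls at rest
post_exertion_cohorts = [-1.0, -0.5, -1.5]  # lower than controls after exertion

pooled = rested_cohorts + post_exertion_cohorts
pooled_mean = mean(pooled)
pooled_votes = vote_count(pooled)

print("pooled mean difference:", pooled_mean)
print("pooled up/down votes:", pooled_votes)
print("rested only:", mean(rested_cohorts), vote_count(rested_cohorts))
print("post-exertion only:", mean(post_exertion_cohorts), vote_count(post_exertion_cohorts))
```

    Pooled together, the cohorts cancel out and the votes split evenly, even though every single cohort shows a clear difference. Recording the exertion state of each cohort where a paper reports it would let the two groups be looked at separately.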

    But I think there is still a chance there's some consistent factor that is always higher or always lower in ME/CFS.
     
    Murph, Simon M, Kitty and 2 others like this.
  14. forestglip

    forestglip Senior Member (Voting Rights)

    Messages:
    1,246
    So I think it makes more sense to just go all in on all proteins instead of doing just complement and then coming back to everything else later. Otherwise I'd have to go over pretty much every paper twice.

    So here's what I've got of my half-thought out plan so far.

    I got all the findings I could from this study so far: Preprint Role of the complement system in Long COVID, 2024, Farztdinov, Scheibenbogen et al.

    The supplementary files don't have the data I need, so I had to just read it off Figure 1. I just copied the volcano plots for the four cohorts they studied into an image editor and marked the proteins as I logged the data.
    upload_2024-12-29_12-8-22.png

    If they were significantly decreased they got -2, non-significantly decreased -1, non-significantly increased 1, significantly increased 2. 0 if not significant and they don't say which direction. (And 3 if not significant and I can't tell which direction: in Figure 1D, I can't tell whether some of the circles are red or blue.)
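
    Written out as a small Python sketch, the coding scheme looks something like this. The function names, the "unclear" label and the example codes are made up for illustration; only the numeric codes themselves come from the scheme above.

```python
def encode(direction, significant):
    """Encode one finding using the -2..2 scheme, plus 0 and 3 for the unclear cases."""
    if direction == "unclear":
        return 3  # not significant, direction could not be read off the figure
    if direction is None:
        return 0  # not significant, direction not reported
    magnitude = 2 if significant else 1
    return magnitude if direction == "up" else -magnitude


def summarise(codes):
    """Tally one protein across cohorts, ignoring the 0 and 3 codes."""
    usable = [c for c in codes if c in (-2, -1, 1, 2)]
    return {
        "up": sum(1 for c in usable if c > 0),
        "down": sum(1 for c in usable if c < 0),
        "net": sum(usable),
    }


# Made-up codes for one hypothetical protein across four cohorts.
protein_codes = [
    encode("down", True),       # -2
    encode("down", False),      # -1
    encode("up", False),        # 1
    encode("unclear", False),   # 3
]
summary = summarise(protein_codes)
print(protein_codes, summary)
```

    One caveat when tallying: if one cohort is a subset of another, the two are not independent, so they shouldn't be counted as two separate votes.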

    And so this is kind of what I hope to create with many studies. This is all from the above study, but they looked at four cohorts. (The second is just a subset of the first.)


    I'm sure it'll be quite annoying trying to deal with files of ~1000 proteins for some studies. Will try to automate where I can.
     
    Last edited: Dec 30, 2024
    mariovitali, Murph, Amw66 and 4 others like this.
  15. mariovitali

    mariovitali Senior Member (Voting Rights)

    Messages:
    534
    @forestglip Would it be possible to share this work via Google sheets or something similar? Let me know if I can help on extracting this information
     
    Last edited: Dec 30, 2024
    Yann04 and Peter Trewhitt like this.
  16. forestglip

    forestglip Senior Member (Voting Rights)

    Messages:
    1,246
    Sure yeah that'd be awesome. I can DM later to share access.

    I'm still not completely sure what I want to do. It'd be nice to create a huge dataset with fold change and significance values for every protein from lots of studies, to do more in depth analysis like what Murph and chillier were posting. But lots of studies don't have that data, they just discuss a few proteins in the text and don't give actual numbers.

    So maybe, for every study, I'll try to write down every reported protein as a simple code like above (significant or not, up or down) for the simpler analysis. And if a study has them, I'll add FC and p values to maybe do something else with eventually.
     
  17. mariovitali

    mariovitali Senior Member (Voting Rights)

    Messages:
    534
    @forestglip Thank you! What I am looking for is to end up with the data you described and feed it to my software framework. What I am hoping to do is to generate a pathway analysis that will help us identify the bigger picture. An example is this thread, which was made possible by analysing information at various conceptual levels:

    https://twitter.com/user/status/1863564685510861030
     
    Sean, Simon M, Yann04 and 1 other person like this.
  18. forestglip

    forestglip Senior Member (Voting Rights)

    Messages:
    1,246
    Sure, I mainly just want to make a large dataset with as many studies as possible so it can be shared and analyzed by anyone in whatever way they want, including the method you said.
     
