Estimating Prevalence, Demographics and Costs of ME/CFS Using Large Scale Medical Claims Data and Machine Learning, 2018, Valdez, Proskauer et al

Discussion in 'ME/CFS research' started by Trish, Dec 18, 2018.

  1. JaimeS

    JaimeS Senior Member (Voting Rights)

    Messages:
    1,248
    Location:
    Stanford, CA
    We're not looking at age of onset here, we're looking at prevalence. So to me, it absolutely makes sense that as more people are diagnosed and do not die, the prevalence should increase with each decade of life.

    We have some evidence of earlier mortality, but it is quite preliminary.
     
    Amw66, rvallee, Snow Leopard and 3 others like this.
  2. Webdog

    Webdog Senior Member (Voting Rights)

    Messages:
    2,265
    Location:
    Holodeck #2
    Just as an aside, the CDC states that ME/CFS is most common in ages 40-60 (Wikipedia also quotes the CDC on this). A source is not given.
    Edit: UpToDate says ME/CFS is primarily a condition of young to middle-aged adults (and cites a couple of old Straus papers). Many physicians read this.
     
    Last edited: Jan 13, 2019
    andypants likes this.
  3. Adrian

    Adrian Administrator Staff Member

    Messages:
    6,563
    Location:
    UK
    ML can be easily fooled by poor quality data. So if you are using a large database like this for ML you really need to do a very careful scrub of the training (and test) data. The ML will simply pick up on trends in the dataset and if they are unreliable then the results will reflect this.

    The other problem is you can end up developing a 'something's wrong' detector, especially where most of the records are normal. Errors on people with other chronic illnesses can be downplayed by the large normal population when using simple ML measures (accuracy, precision, recall and F1). So care needs to be taken.
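
    A minimal sketch of that effect, using counts invented purely for illustration (they are not from the paper). It assumes a detector that flags every ME/CFS record but also flags most records from people with some other chronic illness:

```python
# Hypothetical counts, invented for illustration only (not from the paper).
# group name: (number of records, number the detector flags as ME/CFS)
groups = {
    "healthy": (9700, 0),
    "other_chronic_illness": (200, 150),
    "me_cfs": (100, 100),
}


def confusion(groups, positive="me_cfs"):
    """Collapse the per-group counts into TP, FP, FN, TN."""
    tp = fp = fn = tn = 0
    for name, (n, flagged) in groups.items():
        if name == positive:
            tp += flagged
            fn += n - flagged
        else:
            fp += flagged
            tn += n - flagged
    return tp, fp, fn, tn


tp, fp, fn, tn = confusion(groups)
accuracy = (tp + tn) / (tp + fp + fn + tn)
recall = tp / (tp + fn)
specificity = tn / (tn + fp)
precision = tp / (tp + fp)

# Error rate inside the subgroup the detector actually struggles with.
n_other, flagged_other = groups["other_chronic_illness"]
fpr_other = flagged_other / n_other

print(f"accuracy={accuracy:.3f} recall={recall:.3f} specificity={specificity:.3f}")
print(f"precision={precision:.3f} false positives among other chronic illness={fpr_other:.2f}")
```

    With these made-up numbers, accuracy and specificity both come out above 0.98 even though 75% of the people with another chronic illness are wrongly flagged. Precision does drop here, so which summary measures hide the problem depends on the group sizes. Reporting the error rate per subgroup is one way to make it visible.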


    Wouldn't we expect a peak around older people, since this is not about first diagnosis but about the numbers who are ill? At 60 it would include all those who became ill prior to 60. I'm assuming this because they don't seem to be doing any temporal analysis, just taking data over a few years, so they wouldn't know when a first diagnosis was made. But I could have missed something as I read it quickly.
     
    rvallee, ukxmrv and JaimeS like this.
  4. Adrian

    Adrian Administrator Staff Member

    Messages:
    6,563
    Location:
    UK

    From this study's perspective, could there also be effects from who has insurance at different ages, and from the way chronic illness may affect coverage and hence inclusion in the dataset? It's not a purely random selection of patients.
     
    JaimeS likes this.
  5. JaimeS

    JaimeS Senior Member (Voting Rights)

    Messages:
    1,248
    Location:
    Stanford, CA
    I too have heard the weird onset divide, where in the UK (and sometimes in older US studies) they'll say 45 is the median age, and then more recent studies say onset tends to be far younger (late 20s thru late 30s), with a secondary spike in the teens.

    These all appear to refer to onset, however, not prevalence -- [Edit: Though @Webdog I also feel unsure about that 40s-60s... I always read it as onset but I could be wrong.]
     
    Last edited: Jan 13, 2019
  6. JaimeS

    JaimeS Senior Member (Voting Rights)

    Messages:
    1,248
    Location:
    Stanford, CA
    Absolutely. These are the ppl rich enough to have insurance and -- probably -- determined enough to keep going back, thru multiple misdiagnoses.
     
    Amw66 and Inara like this.
  7. Adrian

    Adrian Administrator Staff Member

    Messages:
    6,563
    Location:
    UK
    The assessment criteria and how they are operationalized is the thing that really matters rather than the label and I don't see how that is reflected in the dataset. It would have been nice to have seen some analysis around the quality of the data (i.e. more detailed checks) or looking at how different labels came about (different doctors, institutions, guidelines etc).
     
  8. BruceInOz

    BruceInOz Senior Member (Voting Rights)

    Messages:
    414
    Location:
    Tasmania
    But this would only happen if people don't recover. So it could be interpreted as evidence of very low rate of recovery.
     
    Amw66, rvallee, Liessa and 3 others like this.
  9. JaimeS

    JaimeS Senior Member (Voting Rights)

    Messages:
    1,248
    Location:
    Stanford, CA
    True enough!
     
    Trish, BruceInOz and NelliePledge like this.
  10. JaimeS

    JaimeS Senior Member (Voting Rights)

    Messages:
    1,248
    Location:
    Stanford, CA
    I hope there's a follow-up study in that.

    When you think about it, in order to do that, they will have to query thousands of people (and their clinicians). Meanwhile they have interesting data and should publish it in order to get the funding for follow-up with those people.

    If it were me, I'd go for a certain percentage of those people, randomly chosen for follow-up. Getting even 1000 from the US, for example, would really go a long way towards giving us an idea of who was diagnosed by what criteria in which decade, for example, and whether that original diagnosis is in any way related to their symptoms. You could also find out who'd since been diagnosed with something else that could account for chronic fatigue.

    But you couldn't do that kind of thing with a dataset this huge in its entirety, and if it were me, I'd make that a separate grant and a separate paper. For one thing, it will take quite some time to gather that information, because gathering it requires human input on the other end.
     
    andypants likes this.
  11. Adrian

    Adrian Administrator Staff Member

    Messages:
    6,563
    Location:
    UK
    It also depends on the algorithms they use and how robust they are to outliers. I think some of the boosted trees that they use can be sensitive to outliers, but it depends on the cost function: a least-squares one is sensitive, but I believe a cost function based on Huber loss or absolute loss can be used instead. So there are things that could help if data is mislabelled (not sure what they used).

    But yes one of the issues with big data is always the data quality particularly when data has been added by people or relies on judgement.
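
    A rough sketch of why the choice of cost function matters, assuming a regression-style gradient boosting setup in which each new tree is fitted to the negative gradient of the loss (the "pseudo-residuals"). The residual values and the Huber threshold `delta` are made up for illustration, and nothing here reflects what the paper's authors actually used:

```python
def sign(r):
    """Return -1, 0 or 1 according to the sign of r."""
    return (r > 0) - (r < 0)


# Pseudo-residuals (negative gradient w.r.t. the prediction) for a
# residual r = target - prediction under three common losses.
def squared_pseudo_residual(r):
    return r  # loss 0.5 * r**2: grows without bound as r grows


def absolute_pseudo_residual(r):
    return sign(r)  # loss |r|: every point pulls equally hard


def huber_pseudo_residual(r, delta=1.0):
    # Quadratic near zero, linear beyond delta, so the pull is capped at delta.
    return r if abs(r) <= delta else delta * sign(r)


# Three ordinary residuals plus one gross outlier (e.g. a mislabelled record).
residuals = [0.2, -0.5, 1.0, 25.0]


def outlier_share(pseudo_residual):
    """Fraction of the total gradient magnitude contributed by the outlier."""
    magnitudes = [abs(pseudo_residual(r)) for r in residuals]
    return magnitudes[-1] / sum(magnitudes)


for name, fn in [("squared", squared_pseudo_residual),
                 ("huber", huber_pseudo_residual),
                 ("absolute", absolute_pseudo_residual)]:
    print(f"{name:>8}: outlier share of gradient = {outlier_share(fn):.2f}")
```

    In this toy case the single outlier supplies roughly 94% of the gradient signal under squared loss, about 37% under Huber loss with `delta=1.0`, and 25% under absolute loss, so the next tree is dominated by the outlier only in the least-squares case.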
     
    JaimeS, andypants, Andy and 1 other person like this.
  12. Snow Leopard

    Snow Leopard Senior Member (Voting Rights)

    Messages:
    3,860
    Location:
    Australia
    That is an excellent point - the graph reflects cumulative incidence combined with lack of recovery. This can be modelled with Dismod etc too.
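
    A toy cohort model of that idea (not DisMod, and far simpler than it). It assumes a constant yearly incidence and invented rates for recovery and mortality, and follows one birth cohort year by year to see how prevalence by age behaves with and without recovery:

```python
def prevalence_by_age(incidence, recovery=0.0, excess_mortality=0.0,
                      base_mortality=0.01, max_age=90):
    """Follow one birth cohort and return prevalence at each year of age.

    All rates are yearly probabilities and purely hypothetical.
    `incidence` is a function of age returning the yearly risk of onset.
    """
    healthy, ill = 1.0, 0.0
    prevalence = []
    for age in range(max_age):
        new_cases = incidence(age) * healthy
        recovered = recovery * ill
        healthy = (healthy - new_cases + recovered) * (1 - base_mortality)
        ill = (ill + new_cases - recovered) * (1 - base_mortality - excess_mortality)
        prevalence.append(ill / (healthy + ill))
    return prevalence


def constant_incidence(age):
    return 0.0005  # invented value: 5 new cases per 10,000 healthy people per year


no_recovery = prevalence_by_age(constant_incidence)
with_recovery = prevalence_by_age(constant_incidence, recovery=0.05)

# Print prevalence at the end of each decade of life.
for age in range(9, 90, 10):
    print(f"age {age:2d}: no recovery {no_recovery[age]:.4f}, "
          f"5%/yr recovery {with_recovery[age]:.4f}")
```

    With no recovery, prevalence keeps climbing with every decade, as in the paper's graph. With even a modest recovery rate it flattens out well below that. Passing a non-zero `excess_mortality`, or an `incidence` function that falls to zero at older ages, lets you explore the other explanations raised in the thread.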
     
    JaimeS, Webdog, ukxmrv and 2 others like this.
  13. Medfeb

    Medfeb Senior Member (Voting Rights)

    Messages:
    585
    True, it's cumulative incidence and lack of recovery. But still, would we really expect prevalence of ME to rise in every 10-year interval and be the highest in those who are 80-89 years old? That would seem to suggest that new cases are arising in the 70-90 year old cohorts and/or people with ME are outliving other causes of death, which is increasing the percent of those remaining that have ME. While preliminary, the evidence of early mortality suggests the answer is not longer life of ME patients.

    fped-06-00412-g004.jpg
     
    Last edited: Jan 16, 2019
    Hutan, JaimeS, andypants and 2 others like this.
  14. Andy

    Andy Committee Member

    Messages:
    23,034
    Location:
    Hampshire, UK
