Systematic single-variant and gene-based association testing of thousands of phenotypes in 394,841 UK Biobank exomes, 2022, Karczewski et al

forestglip

Konrad J. Karczewski, Matthew Solomonson, Katherine R. Chao, Julia K. Goodrich, Grace Tiao, Wenhan Lu, Bridget M. Riley-Gillis, Ellen A. Tsai, Hye In Kim, Xiuwen Zheng, Fedik Rahimov, Sahar Esmaeeli, A. Jason Grundstad, Mark Reppell, Jeff Waring, Howard Jacob, David Sexton, Paola G. Bronson, Xing Chen, Xinli Hu, Jacqueline I. Goldstein, Daniel King, Christopher Vittal, Timothy Poterba, Duncan S. Palmer, Claire Churchhouse, Daniel P. Howrigan, Wei Zhou, Nicholas A. Watts, Kevin Nguyen, Huy Nguyen, Cara Mason, Christopher Farnham, Charlotte Tolonen, Laura D. Gauthier, Namrata Gupta, Daniel G. MacArthur, Heidi L. Rehm, Cotton Seed, Anthony A. Philippakis, Mark J. Daly, J. Wade Davis, Heiko Runz, Melissa R. Miller, and Benjamin M. Neale

Published: September 14, 2022

[Line breaks added]


Highlights
• Public release of gene-based association statistics for 4,529 diseases and traits
• Genebass, a browser framework to display rare-variant associations
• Tight coupling between frequency, natural selection, and power for genetic discovery
• Biological signal between SCRIB and white-matter integrity (from MRI)

Summary
Genome-wide association studies have successfully discovered thousands of common variants associated with human diseases and traits, but the landscape of rare variations in human disease has not been explored at scale. Exome-sequencing studies of population biobanks provide an opportunity to systematically evaluate the impact of rare coding variations across a wide range of phenotypes to discover genes and allelic series relevant to human health and disease.

Here, we present results from systematic association analyses of 4,529 phenotypes using single-variant and gene tests of 394,841 individuals in the UK Biobank with exome-sequence data. We find that the discovery of genetic associations is tightly linked to frequency and is correlated with metrics of deleteriousness and natural selection.

We highlight biological findings elucidated by these data and release the dataset as a public resource alongside the Genebass browser for rapidly exploring rare-variant association results.

Web | PDF | Cell Genomics | Open Access
 
Not a new paper!

This was a large-scale effort to look for rare-variant associations in all the phenotypes in the UK Biobank, which include the "chronic fatigue syndrome" trait.

ME/CFS is never mentioned in the paper, but the results for all phenotypes are freely accessible on a website they made for this called Genebass. For example, here is a link to the CFS page: chronic fatigue syndrome

(Searching for "chronic fatigue syndrome" doesn't really work for getting to that page. You have to search for the code they use for CFS, which is not very straightforward to figure out. There's a phenotype metadata spreadsheet that I uploaded on this post with all the codes. The code for CFS is categorical-20002-both_sexes-1482)
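For anyone trying to work out codes for other traits, here is a rough sketch of how the CFS code seems to break down. The part names (`trait_type`, `phenocode`, `sex`, `coding`) are my own labels and are an assumption, not something taken from the Genebass documentation. Codes for other phenotypes may not have exactly four parts, in which case this raises an error rather than guessing.

```python
from typing import NamedTuple


class PhenotypeKey(NamedTuple):
    # Part names are assumed labels for illustration only.
    trait_type: str  # e.g. "categorical"
    phenocode: str   # e.g. "20002" (non-cancer illness code, self-reported)
    sex: str         # e.g. "both_sexes"
    coding: str      # e.g. "1482" (the value used for CFS)


def parse_phenotype_key(key: str) -> PhenotypeKey:
    """Split a four-part, hyphen-separated phenotype code into named parts."""
    parts = key.split("-")
    if len(parts) != 4:
        raise ValueError(f"expected 4 hyphen-separated parts, got {len(parts)}: {key!r}")
    return PhenotypeKey(*parts)


cfs = parse_phenotype_key("categorical-20002-both_sexes-1482")
print(cfs)
```

So the same self-reported illness field should carry many other conditions, each with a different final number.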

The paper says that these are the p-value thresholds that they used for a 5% false positive rate (line breaks added):
Based on this analysis, for each phenotype, in addition to QC criteria defined below, we consider genome-wide p value thresholds of

2.5 × 10⁻⁷ for SKAT-O tests,
6.7 × 10⁻⁷ for burden tests, and
8 × 10⁻⁹ for single-variant tests

(see Supplemental information and Figure S10), corresponding to approximately 0.05 expected false positives per phenotype.

Looking at the CFS page, none of these thresholds are met. The lowest p-values are:
  • SKAT-O for missense variants in AFG3L2
    • p = 6.3e-7
  • Burden for missense variants in AFG3L2
    • p = 1.14e-6
  • Single variant 14-58646014-C-T
    • p = 1.29e-6
But this could still be a good reference for checking genes as a follow-up when there's already reason to believe certain genes might be interesting.
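To make the comparison concrete, here is a small sketch that checks each of the lowest CFS p-values against the paper's threshold for that test. The numbers are the ones quoted above; the variable and function names are just mine.

```python
# Genome-wide thresholds quoted from the paper.
THRESHOLDS = {
    "SKAT-O": 2.5e-7,
    "burden": 6.7e-7,
    "single-variant": 8e-9,
}

# Lowest p-values on the Genebass CFS page, as listed above.
LOWEST_P = {
    "SKAT-O": ("AFG3L2 (missense)", 6.3e-7),
    "burden": ("AFG3L2 (missense)", 1.14e-6),
    "single-variant": ("14-58646014-C-T", 1.29e-6),
}


def compare_to_thresholds(lowest, thresholds):
    """Return (test, label, p, threshold, passes, ratio) for each test."""
    rows = []
    for test, (label, p) in lowest.items():
        threshold = thresholds[test]
        rows.append((test, label, p, threshold, p < threshold, p / threshold))
    return rows


results = compare_to_thresholds(LOWEST_P, THRESHOLDS)
for test, label, p, threshold, passes, ratio in results:
    print(f"{test:15s} {label:18s} p={p:.2e} threshold={threshold:.1e} "
          f"passes={passes} (p is {ratio:.1f}x the threshold)")
```

The SKAT-O and burden results for AFG3L2 miss by a factor of roughly 2.5 and 1.7, while the best single variant is about 160 times above its threshold.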
 
Copying the most significant genes from the Genebass page for CFS here so they show up in searches just in case.

These are split up by category of synonymous, missense, or pLoF. There are three types of statistical tests, so three p-values per gene per category. I took the 20 most significant genes based on each type of test and combined them within each variant category.
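In case the selection is unclear, this is roughly the logic, sketched with made-up placeholder genes and p-values rather than the real Genebass data. The key names (`p_skato`, `p_burden`, `p_skat`) are assumptions for illustration, as is the idea that the rows come as a list of dicts. The toy example uses the top 2 instead of the top 20.

```python
def top_genes_union(rows, tests, n=20):
    """Take the n smallest p-values per test and return the union of genes."""
    selected = set()
    for test in tests:
        # Skip genes with no result for this test, then rank by p-value.
        scored = [r for r in rows if r.get(test) is not None]
        scored.sort(key=lambda r: r[test])
        selected.update(r["gene"] for r in scored[:n])
    return sorted(selected)


# Placeholder rows for one variant category (values are invented).
toy_rows = [
    {"gene": "GENE_A", "p_skato": 0.001, "p_burden": 0.20, "p_skat": 0.03},
    {"gene": "GENE_B", "p_skato": 0.02, "p_burden": 0.004, "p_skat": 0.50},
    {"gene": "GENE_C", "p_skato": 0.30, "p_burden": 0.01, "p_skat": 0.002},
    {"gene": "GENE_D", "p_skato": 0.40, "p_burden": 0.60, "p_skat": 0.70},
]

combined = top_genes_union(toy_rows, tests=("p_skato", "p_burden", "p_skat"), n=2)
print(combined)
```

Because a gene can rank in the top 20 for more than one test, each combined list ends up with somewhere between 20 and 60 genes, sorted alphabetically rather than by significance.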

Gene tests based on synonymous variants:
ARL2-SNX15
ATP5MG
BCL2L2-PABPN1
BORCS6
BRPF1
CAMK2G
CC2D2B
CHTOP
COPZ2
CYP3A7
CYP3A7-CYP3A51P
DUSP16
EIF5A2
GLUL
HAMP
HMG20B
LEMD3
LZTFL1
ME2
OLAH
OR51I1
PLCXD2
PLCZ1
RAB32
RIMS1
SEPTIN3
SHLD2
SKIL
SNX15
SPATA31D4
STAC
VCX
VSIG10
ZNF75A

Gene tests based on missense variants:
AFG3L2
AHCTF1
ATP6V1FNB
C18orf54
C2CD2
C2orf15
CADM3
CBR3
CFHR5
COLGALT2
COMMD8
EMC2
FAM47A
GAP43
HNRNPU
ITIH2
KRTAP5-10
LAMTOR2
MED11
NPIPB2
NRAP
PPP1R11
SCO2
SEMA4D
SERPINA4
SHKBP1
SNX11
SPRR2B
TMEM164
TMEM30B
TNFAIP8L1
TRIM6
TRIM6-TRIM34
TUBA1C
UBQLNL
ZNF275
ZNF415

Gene tests based on predicted loss of function (pLoF) variants:
APAF1
ATP5MF-PTCD1
DCDC2
DMAP1
FAM160B2
FBXL2
FERMT2
GALNT17
HERC1
HPDL
JAK2
MGLL
MKI67
NDUFB10
NT5C3B
PDK1
PLEKHG4B
PRAP1
PRPSAP2
PTAR1
RADIL
SKA2
SPTLC3
TAF9B
TEAD1
TPR
TRHR
TSG101
TTLL4
ZBTB38
ZNF511-PRAP1
ZNF540
 
Interesting. I haven't looked to see if any of the genes highlighted here match with DecodeME findings.

I can't access the CFS page as it doesn't support mobile browsers. How many CFS cases are reported from UKB? There are several different UKB definitions available, I think amounting to over 5000 cases. But most of those probably don't have CFS.
 
I haven't looked to see if any of the genes highlighted here match with DecodeME findings.
I did a bit of a comparison between DecodeME genes and Genebass genes on the DecodeME thread: https://www.s4me.info/threads/initi...2025-decodeme-collaboration.45490/post-651566

I can't access the CFS page as it doesn't support mobile browsers.
If you turn on desktop mode on mobile it works (at least on Android) and works fairly well.

How many CFS cases are reported from UKB? There are several different UKB definitions available, I think amounting to over 5000 cases. But most of those probably don't have CFS.
Good question. The phenotype info page for CFS says the following. Why are N cases and N both sexes not the same?
N cases: 1764
N controls: 393019
N cases males: 561
N cases females: 1549
N both sexes: 2168

The phenotype file linked in a post above says this for CFS:
n_cases_defined: 2110
n_cases_both_sexes: 2168
n_cases_females: 1549
n_cases_males: 561

So I'm not really sure how many cases exactly, but somewhere around 2000.
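Here is the arithmetic on those numbers laid out as a quick sketch. The values are copied from the two sources above; the dict and variable names are mine. It only shows where the counts agree and by how much they differ, it doesn't explain why.

```python
# Counts from the Genebass phenotype info page for CFS.
browser = {
    "n_cases": 1764,
    "n_controls": 393019,
    "n_cases_males": 561,
    "n_cases_females": 1549,
    "n_both_sexes": 2168,
}

# Counts from the phenotype metadata file.
metadata = {
    "n_cases_defined": 2110,
    "n_cases_both_sexes": 2168,
    "n_cases_females": 1549,
    "n_cases_males": 561,
}

# Males plus females, compared against the other totals.
sex_sum = metadata["n_cases_males"] + metadata["n_cases_females"]
gap_both_sexes = metadata["n_cases_both_sexes"] - sex_sum
gap_browser = metadata["n_cases_defined"] - browser["n_cases"]

print(f"males + females         = {sex_sum}")
print(f"n_cases_both_sexes gap  = {gap_both_sexes}")
print(f"n_cases_defined - N cases = {gap_browser}")
```

So males plus females matches n_cases_defined exactly (2110), the "both sexes" figure is 58 higher, and the "N cases" shown in the browser is 346 lower.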

Here is the information about the phenotype from the metadata file:
Description: Non-cancer illness code, self-reported

Description more: Code for non-cancer illness. If the participant was uncertain of the type of illness they had had, then they described it to the interviewer (a trained nurse) who attempted to place it within the coding tree. If the illness could not be located in the coding tree then the interviewer entered a free-text description of it. These free-text descriptions were subsequently examined by a doctor and, where possible, matched to entries in the coding tree. Free-text descriptions which could not be matched with very high probability have been marked as "unclassifiable". Note that myasthenia gravis appears twice (under codes 1260 and 1437). Please ensure you use both codes to capture all relevant diagnoses.

Category: UK Biobank Assessment Centre > Verbal interview > Medical conditions
 