Nearest gene to GWAS hit

When discussing GWAS hits, the nearest protein-coding gene is used as a kind of shorthand functional annotation for any significantly associated locus (if I understand correctly).

I’m wondering how many significant loci have been conclusively functionally linked to their nearest protein coding gene vs further away genes or intergenic regions?

Do we use this framework since it’s most feasible to understand functional implications of variants in a protein coding gene for wet lab biologists, as Benjamin Neale mentioned yesterday? Or does it capture some biological reality?

Eric Fauman tweets quite a bit about this. He says 70%, i.e. the nearest gene turns out to be the causal gene at roughly 70% of loci. Here is a tweet that goes into detail about it with lots of paper links

Indeed, the nearest gene is often used as a signpost in the genome, although conclusive explanations of the biological mechanism behind an association are still quite sparse. A few papers have tried to take on the SNP-to-gene problem, including this one from Open Targets: "An open approach to systematically prioritize causal variants and genes at all published human GWAS trait-associated loci" (PMC). There is also the PoPS method, which attempts to leverage genome-wide enrichments to prioritize genes at genome-wide significant loci.

In terms of working out what a locus is doing, the FTO/IRX3/IRX5 locus for obesity and type 2 diabetes is instructive for how hard it can be. Papers such as "Extensive pleiotropism and allelic heterogeneity mediate metabolic effects of IRX3 and IRX5" (PubMed) are all efforts to characterize how that locus influences the outcome trait.

In our recent paper, we compared different approaches to nominating genes for GWAS hits and found that the nearest-gene approach had comparable precision and recall to other methods.
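
For anyone who wants to see what the nearest-gene heuristic and a precision/recall comparison amount to in practice, here is a minimal Python sketch. It is not the pipeline from any of the papers above. The gene names, coordinates, lead-variant positions and "gold standard" assignments are all invented toy values, everything is assumed to sit on a single chromosome, and distance is measured to the gene body (0 if the variant falls inside the gene).

```python
# Toy gene annotation: (name, start, end) on one hypothetical chromosome.
GENES = [
    ("GENE_A", 100, 200),
    ("GENE_B", 500, 900),
    ("GENE_C", 1500, 1600),
]


def nearest_gene(pos, genes):
    """Return the name of the gene whose body lies closest to `pos`."""
    def distance(gene):
        _, start, end = gene
        if start <= pos <= end:
            return 0  # variant is inside the gene body
        return min(abs(pos - start), abs(pos - end))
    return min(genes, key=distance)[0]


def precision_recall(predicted, truth):
    """Compare locus -> gene calls against a gold-standard locus -> gene map."""
    correct = sum(1 for locus, gene in predicted.items()
                  if truth.get(locus) == gene)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(truth) if truth else 0.0
    return precision, recall


# Invented lead-variant positions for three loci.
lead_variants = {"locus1": 150, "locus2": 300, "locus3": 1300}

# Invented gold standard; locus2 is a case where the causal gene is NOT the nearest.
truth = {"locus1": "GENE_A", "locus2": "GENE_B", "locus3": "GENE_C"}

predicted = {locus: nearest_gene(pos, GENES)
             for locus, pos in lead_variants.items()}
precision, recall = precision_recall(predicted, truth)
print(predicted)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Because this toy version makes a call at every locus, precision and recall come out identical. They only diverge once a method is allowed to abstain, for example by refusing to call a gene beyond some distance cutoff.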