GCTA for big datasets and sequencing data

carmen.ramoser · June 2, 2026, 7:35am

Thank you for the very interesting practical despite the problems.

Loic mentioned that GCTA should not be used for big datasets. I wanted to ask whether it is possible to use it for a dataset of 70,000 individuals. As I understood the theory, GCTA gives the pedigree heritability, as opposed to methods based on summary statistics, which give the SNP heritability. I am interested in this difference. For this reason I also wanted to ask if anybody has experience with running GCTA on sequencing data on the UKB RAP. What computational resources would this take?

Thank you in advance!

maria.koromina · June 2, 2026, 5:59pm

I believe you can also use GCTA (GREML) to estimate SNP heritability, but I'm happy to hear what people here think too.

l.yengo · June 3, 2026, 2:00am

Thanks for the question, Carmen. I’d clarify that the issue with sample size applies to the GREML method. GCTA also implements Haseman-Elston regression, which scales quite nicely and can therefore be used with N=70k. GCTA can give you either pedigree-based or SNP-based heritability, depending on the range of genetic relationships included in your analysis. If you only give it “unrelated” individuals, you’d get SNP-based heritability; if you include closer relatives (e.g., siblings, parent-offspring pairs), your estimate will be closer to that of a pedigree-based method. Methods using GWAS summary statistics would generally only give you an estimate of SNP-based heritability. We used GCTA (or PLINK) to calculate GRMs in one of our recent papers (https://www.nature.com/articles/s41586-025-09720-6). The cost depends on how many SNPs and people you include in your analysis. You may need to budget at least 1000 GBP for data processing.
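
To make the Haseman-Elston idea concrete, here is a minimal sketch in Python (NumPy) on simulated data. It is an illustration only, not GCTA's implementation: the function names (`simulate`, `make_grm`, `he_regression`), the simulation settings, and the simple allele-frequency standardisation are all assumptions made for the example. The key point is that the estimate is just the slope of a regression of phenotype cross-products y_i*y_j on GRM entries A_ij over pairs of individuals, so no N x N matrix has to be inverted.

```python
import numpy as np


def simulate(n, m, h2, seed):
    """Simulate unrelated individuals: (n x m) allele counts and a phenotype
    whose additive genetic variance is roughly h2 (hypothetical settings)."""
    rng = np.random.default_rng(seed)
    freqs = rng.uniform(0.1, 0.5, size=m)           # allele frequencies
    geno = rng.binomial(2, freqs, size=(n, m)).astype(float)
    z = standardise(geno)
    beta = rng.normal(0.0, np.sqrt(h2 / m), size=m)  # per-SNP effects
    pheno = z @ beta + rng.normal(0.0, np.sqrt(1.0 - h2), size=n)
    return geno, pheno


def standardise(geno):
    """Centre and scale each SNP using its sample allele frequency."""
    p = geno.mean(axis=0) / 2.0
    return (geno - 2.0 * p) / np.sqrt(2.0 * p * (1.0 - p))


def make_grm(geno):
    """Genetic relationship matrix A = Z Z^T / M."""
    z = standardise(geno)
    return z @ z.T / geno.shape[1]


def he_regression(grm, pheno):
    """Slope of y_i * y_j on A_ij over all pairs i < j (cross-product HE)."""
    y = (pheno - pheno.mean()) / pheno.std()
    i, j = np.triu_indices(len(y), k=1)              # off-diagonal pairs only
    a = grm[i, j] - grm[i, j].mean()
    prod = y[i] * y[j]
    return float(np.sum(a * (prod - prod.mean())) / np.sum(a ** 2))


if __name__ == "__main__":
    geno, pheno = simulate(n=1500, m=1000, h2=0.5, seed=0)
    print("HE estimate of h2:", he_regression(make_grm(geno), pheno))
```

Because only “unrelated” individuals are simulated here, the slope corresponds to a SNP-based heritability; including close relatives would add pairs with large A_ij and pull the estimate toward a pedigree-type value, as described above.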

carmen.ramoser · June 3, 2026, 12:21pm

Dear Loic,

Thank you for your answer, this is very valuable to me! Indeed, I would like to do something similar to what you did in your paper.