Thank you for the very interesting practical despite the problems.
Loic mentioned that GCTA should not be used for big datasets. Is it possible to use it for a dataset of 70,000 individuals? As I understood the theory, GCTA gives the pedigree heritability, as opposed to summary statistics, which give the SNP heritability. I am interested in this difference. For this reason, I also wanted to ask whether anybody has experience running GCTA on sequencing data on the UKB RAP. How much computational resource would this take?
Thanks for the question, Carmen. To clarify, the sample-size limitation applies to the GREML method. GCTA also implements Haseman-Elston regression, which scales well and can therefore be used with N = 70k. GCTA can give you either pedigree-based or SNP-based heritability, depending on the range of genetic relationships included in your analysis. If you only give it “unrelated” individuals, you get SNP-based heritability; if you include closer relatives (e.g., siblings, parent-child pairs), your estimate will be closer to a pedigree-based one. Methods using GWAS summary statistics generally only give you an estimate of SNP-based heritability. We used GCTA (or PLINK) to calculate GRMs in one of our recent papers (https://www.nature.com/articles/s41586-025-09720-6). The cost depends on how many SNPs and people you include in your analysis. You may need to budget at least 1000 GBP for data processing.
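To make the Haseman-Elston idea concrete, here is a minimal Python sketch on simulated data. It is not GCTA's implementation, just the textbook cross-product form: regress the product of standardized phenotypes for each pair of individuals on their GRM entry, and read the slope as the heritability estimate. The function names (he_regression, simulate_grm_and_phenotype) and all simulation settings are made up for illustration.

```python
import numpy as np


def he_regression(grm, y):
    """Cross-product Haseman-Elston estimate of heritability.

    Regresses y_i * y_j on the GRM entry A_ij over all pairs i < j
    and returns the slope. Illustrative only, not GCTA's code.
    """
    y = (y - y.mean()) / y.std()            # standardize the phenotype
    upper = np.triu_indices_from(grm, k=1)  # each pair once, no diagonal
    a = grm[upper]
    z = np.outer(y, y)[upper]
    a_centered = a - a.mean()
    return float(np.dot(a_centered, z - z.mean()) / np.dot(a_centered, a_centered))


def simulate_grm_and_phenotype(n=800, m=400, h2=0.5, seed=0):
    """Simulate unrelated individuals, a GRM, and a phenotype with known h2.

    All parameter values are assumptions chosen to keep the example small.
    """
    rng = np.random.default_rng(seed)
    freqs = rng.uniform(0.1, 0.9, size=m)
    geno = rng.binomial(2, freqs, size=(n, m)).astype(float)
    x = (geno - geno.mean(axis=0)) / geno.std(axis=0)  # standardized genotypes
    grm = x @ x.T / m
    beta = rng.normal(0.0, np.sqrt(h2 / m), size=m)    # per-SNP effects
    y = x @ beta + rng.normal(0.0, np.sqrt(1.0 - h2), size=n)
    return grm, y


if __name__ == "__main__":
    grm, y = simulate_grm_and_phenotype()
    print(f"HE estimate of h2 (simulated truth 0.5): {he_regression(grm, y):.2f}")
```

The work here is one pass over the pairs of individuals, with no iterative likelihood maximization as in GREML. The same regression applied to a GRM that keeps close relatives would pull the estimate toward a pedigree-based value, as described above.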