Degrees of freedom for the t-test, and interpreting the heritability estimate when including PRS as a fixed effect

To test the significance of the betas, the df for the t-test is set at 1887. How is this df calculated?
When we include the PRS as a fixed effect, is the heritability estimate (V(G)/Vp) biased, given that part of the phenotypic variance is now explained by the PRS? Thanks!

If you add the PRS of phenotype X as a fixed covariate in the twin model for phenotype X, part of the additive genetic (A) variance is explained by the PRS, and the rest of the A variance is estimated as the variance of the latent A variable. So suppose var(A) = 10, var(C) = 3 and var(E) = 5, so that var(pheno) = 10 + 3 + 5 = 18. If the PRS explains 1/18 of the phenotypic variance, then (I think) the twin model results will be var(A) = 9, var(C) = 3 and var(E) = 5, with the phenotypic variance explained by the PRS equal to 1. The heritability goes from 10/18 (no PRS) to 9/18 (conditional on the PRS). Neither of these is biased. Of course, the 9/18 is not the “total narrow-sense heritability”, because it excludes the A variance attributable to the PRS; the 9/18 is the “narrow-sense heritability conditional on the PRS”.
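A minimal R sketch of this arithmetic (simulated unrelated individuals rather than an actual twin model; the variances 1, 9, 3 and 5 are taken from the example above):

set.seed(1)
N <- 1e5
prs <- rnorm(N, 0, sqrt(1))     # the part of A tagged by the PRS
a   <- rnorm(N, 0, sqrt(9))     # remaining additive genetic variance
cc  <- rnorm(N, 0, sqrt(3))     # shared environment
e   <- rnorm(N, 0, sqrt(5))     # unique environment
y   <- prs + a + cc + e         # var(y) = 1 + 9 + 3 + 5 = 18
summary(lm(y ~ prs))$r.squared  # ~ 1/18: variance explained by the PRS
var(resid(lm(y ~ prs)))         # ~ 17: what is left for A (9), C (3) and E (5)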

The dfs in regression are calculated as follows. Suppose we regress Y (the dependent variable) on P = 4 predictors X1, X2, X3, X4 in a sample of N (unrelated) participants. The test of each b parameter (regression coefficient) is based on a t-test with N - P - 1 degrees of freedom. The test of the total explained variance (the R^2 statistic) is based on an F test with numerator df = P and denominator df = N - P - 1. Check it out using simulated data in R:

library(MASS)
# simulate N observations on 5 weakly correlated variables (y, x1..x4)
s <- matrix(.05, 5, 5)
diag(s) <- 1
N <- 1000
P <- 4
x <- mvrnorm(N, rep(0, 5), Sigma = s)

colnames(x) <- c('y', 'x1', 'x2', 'x3', 'x4')
x <- as.data.frame(x)
summary(lm(y ~ x1 + x2 + x3 + x4, data = x))
# two-sided t-test of one coefficient (row 3 of the table, i.e. x2)
dfs <- N - P - 1
tval <- summary(lm(y ~ x1 + x2 + x3 + x4, data = x))$coefficients[3, 3]
2 * pt(abs(tval), df = dfs, lower.tail = FALSE)  # matches the p-value in summary()
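To also check the F test of the R^2 (numerator df = P, denominator df = N - P - 1), one could add:

fit <- lm(y ~ x1 + x2 + x3 + x4, data = x)
fstat <- summary(fit)$fstatistic   # named vector: value, numdf, dendf
fstat[2:3]                         # should equal P and N - P - 1
pf(fstat[1], df1 = fstat[2], df2 = fstat[3], lower.tail = FALSE)  # overall p-value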


Thank you for the detailed clarification! I have a follow-up question on the dfs:

For the GREML analyses, the log reported the following when the PRS was included:

Performing REML analysis … (Note: may take hours depending on sample size).
1894 observations, 8 fixed effect(s), and 2 variance component(s)(including residual variance).

So there are 7 covariates and 1 intercept. Does this mean that the t-test on a beta should be based on 1894 - 8 df, and the F test for R^2 on 1894 - 8 + 1, or will it be different given that our sample includes related and unrelated individuals? Thank you again!

Conor is right: the degrees of freedom for the t-statistic are df = N - 1 - P (with N the number of observations, and P the number of parameters in the model).

From the GCTA output: 1894 - 9 - 1 = 1884, with P = 9 being the 8 fixed effects plus the genetic variance component. Note that the variance component takes up a degree of freedom too (you can think of it as one extra estimated parameter).

So the 1887 used in the script was a mistake, well spotted! This would have earned you a free cake or drink at an in-person workshop. :cupcake:

That said, when the df is large (say, greater than a few hundred), a small mistake like this barely changes the p-value.
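For example, with a (hypothetical) t-value of 2:

tval <- 2                                    # hypothetical t-value
2 * pt(tval, df = 1884, lower.tail = FALSE)  # correct df
2 * pt(tval, df = 1887, lower.tail = FALSE)  # df used in the script
2 * pt(tval, df = Inf, lower.tail = FALSE)   # normal approximation
# the first two agree to many decimal places; even df = Inf is very close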

One last point: the likelihood ratio testing we use in OpenMx (e.g. to compare nested models) relies on a Chi2 test. Here, we can also use such a test (thanks to the large number of observations and the asymptotic equivalence of the tests). Take (beta/SE)^2 as the test statistic; it follows a Chi2 distribution with 1 degree of freedom. This way you always get the df of the test correct!
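In R, with beta and se as placeholders for an estimated coefficient and its standard error:

beta <- 0.12                              # hypothetical estimate
se <- 0.05                                # hypothetical standard error
stat <- (beta / se)^2                     # Wald statistic, ~ Chi2(1)
pchisq(stat, df = 1, lower.tail = FALSE)  # p-value; no df bookkeeping needed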
