Statistics Primer (in more detail)

Thank you for the statistical introduction this morning! My background is in bioinformatics and clinical medicine. One of my goals for the workshop is to find more resources to help “shore up” my understanding of the underlying statistical methods and their pitfalls in genetic analysis. This morning's session alluded to the problems with ratio methods. Is there a good resource for diving into this more fully?

thanks,

-Lakshman

Here’s a nice paper from David Allison on the challenges with ratios: “Statistical considerations regarding the use of ratios to adjust data” (PubMed). Relatedly, there is this work from Pete Kraft and colleagues on collider bias, another way that adjustment can bias estimates: “Adjusting for heritable covariates can bias effect estimates in genome-wide association studies” (PubMed).
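To make both pitfalls concrete, here is a minimal simulation sketch in Python/NumPy. The setups, effect sizes, and variable names are all hypothetical and chosen only for illustration; this is not the method of either paper. Part 1 assumes the outcome depends on the covariate with a nonzero intercept (y = a + b·x + noise). Dividing by x then leaves y/x = a/x + b + noise/x, which is still correlated with x, whereas residuals from a regression of y on x are uncorrelated with x by construction. Part 2 assumes a SNP affects only a covariate c, while c and the outcome share an unmeasured factor u. The SNP has no effect on the outcome, yet "adjusting" for c makes it look like it does.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# --- Part 1: ratio adjustment (hypothetical linear model with intercept) ---
x = rng.uniform(1.0, 3.0, n)
y = 2.0 + 0.5 * x + rng.normal(0.0, 0.2, n)   # nonzero intercept a = 2.0

ratio = y / x                                  # "adjust" by dividing
slope, intercept = np.polyfit(x, y, 1)         # OLS fit of y on x
resid = y - (intercept + slope * x)            # regression adjustment

r_ratio = np.corrcoef(ratio, x)[0, 1]          # strongly negative here
r_resid = np.corrcoef(resid, x)[0, 1]          # ~0 up to rounding error
print(f"corr(y/x, x) = {r_ratio:.3f}, corr(residual, x) = {r_resid:.1e}")

# --- Part 2: adjusting for a heritable covariate (hypothetical DAG) ---
g = rng.binomial(2, 0.3, n).astype(float)      # SNP genotype coded 0/1/2
u = rng.normal(0.0, 1.0, n)                    # unmeasured shared factor
c = 0.5 * g + u + rng.normal(0.0, 1.0, n)      # covariate: affected by g and u
y2 = u + rng.normal(0.0, 1.0, n)               # outcome: affected by u only

def ols_coefs(outcome, *predictors):
    """OLS coefficients (intercept first) via least squares."""
    X = np.column_stack([np.ones(len(outcome)), *predictors])
    coefs, *_ = np.linalg.lstsq(X, outcome, rcond=None)
    return coefs

beta_unadj = ols_coefs(y2, g)[1]      # close to the true value, 0
beta_adj = ols_coefs(y2, g, c)[1]     # pulled away from 0 by adjusting for c
print(f"SNP effect unadjusted: {beta_unadj:.3f}, adjusted for c: {beta_adj:.3f}")
```

Under this particular setup the adjusted SNP coefficient converges to about -0.25 rather than 0, even though the SNP has no path to the outcome except through the shared structure with c.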


Ben, any chance you could post a PDF of the David Allison article? It’s difficult to find.

allison1995.pdf (3.5 MB)


As a follow-up, I was poking around to gain some understanding of the underlying mathematics. A general introduction to PCA that may be useful is:

https://www.datacamp.com/tutorial/principal-component-analysis-in-python

This one is OK for me, but there are probably more mathematically oriented introductions that might be helpful?

I am sure there are others, so PLEASE let me know if there are better ones.
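For the underlying mathematics, here is a minimal from-scratch sketch of PCA in Python/NumPy (the function name `pca` and the simulated data are just for illustration). It centers each column, takes the singular value decomposition of the centered data, and uses the right singular vectors as the principal component directions. The squared singular values divided by (n - 1) equal the eigenvalues of the sample covariance matrix, i.e. the variance captured by each component. The last lines cross-check that equivalence.

```python
import numpy as np

def pca(X, k):
    """Top-k principal components of X (rows = samples, columns = features).

    Returns (scores, components, explained_variance).
    """
    Xc = X - X.mean(axis=0)                       # center each feature
    # Xc = U @ diag(S) @ Vt; rows of Vt are orthonormal PC directions
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:k]                           # shape (k, n_features)
    scores = Xc @ components.T                    # projection of each sample
    explained_variance = S[:k] ** 2 / (X.shape[0] - 1)
    return scores, components, explained_variance

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))  # correlated features
scores, comps, var = pca(X, 2)

# Cross-check against the eigenvalues of the sample covariance matrix
eigvals = np.linalg.eigvalsh(np.cov(X, rowvar=False))[::-1]  # descending order
print(np.allclose(var, eigvals[:2]))  # True
```

The SVD route is a common choice because it avoids forming the covariance matrix explicitly, which is more numerically stable. Mathematically, it gives the same components as the eigen-decomposition.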

Question: for two- (or three-) dimensional data, I assume that the first principal component will be the linear best-fit line?
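One way to probe this question numerically is a small sketch with simulated data (the slopes and noise levels here are arbitrary). The first principal component of centered 2-D data is the line that minimizes the sum of squared *perpendicular* distances to the points (an orthogonal or "total least squares" fit). The ordinary least-squares regression of y on x instead minimizes *vertical* distances, so the two lines generally differ. In 3-D, the same reasoning makes PC1 the orthogonal best-fit line and PC1 plus PC2 the orthogonal best-fit plane.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1000
x = rng.normal(0.0, 1.0, n)
y = 0.8 * x + rng.normal(0.0, 0.6, n)   # var(x) ~ var(y) ~ 1, cov ~ 0.8

Xc = np.column_stack([x, y])
Xc = Xc - Xc.mean(axis=0)               # center both variables

# PC1 direction: first right singular vector of the centered data
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
pc1 = Vt[0]
slope_pca = pc1[1] / pc1[0]             # slope of the line through PC1 (sign-invariant)

# OLS slope of y on x (minimizes vertical squared distances)
slope_ols = np.polyfit(x, y, 1)[0]

print(f"PC1 slope: {slope_pca:.3f}, OLS slope: {slope_ols:.3f}")
```

With these settings the PC1 slope should land near 1 and the OLS slope near 0.8. For positively correlated data, the orthogonal-fit slope lies between the OLS slope of y on x and the reciprocal of the OLS slope of x on y.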

On a second point, it was a bit unclear to me how and when Fst is used to “correct” for population substructure. Is this approach preferred over PCA? Any insights would be helpful.
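To pin down what Fst measures, here is a minimal sketch of the simple Nei-style definition, Fst = (H_T - H_S) / H_T. It applies to one biallelic SNP, given allele frequencies in two equally weighted populations. The function name `fst_nei` is hypothetical. The sketch ignores sample size, so it is not the Weir & Cockerham or Hudson estimators typically used on real data. One relevant contrast with PCA: Fst is a population-level summary (one number per locus, or averaged across loci) computed from predefined groups, whereas PCA produces coordinates for each individual without needing group labels.

```python
import numpy as np

def fst_nei(p1, p2):
    """Nei-style Fst = (H_T - H_S) / H_T for biallelic loci, two equal-weight pops.

    p1, p2: frequency of the same allele in population 1 and 2 (one entry per
    locus). No sample-size correction; undefined for loci monomorphic in the
    pooled population (H_T = 0).
    """
    p1 = np.asarray(p1, dtype=float)
    p2 = np.asarray(p2, dtype=float)
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2  # mean within-pop heterozygosity
    p_bar = (p1 + p2) / 2
    h_t = 2 * p_bar * (1 - p_bar)                       # heterozygosity of pooled pop
    return (h_t - h_s) / h_t

# Identical, slightly different, and strongly different allele frequencies
print(fst_nei([0.5, 0.5, 0.1], [0.5, 0.6, 0.9]))  # approx. [0, 0.0101, 0.64]
```

For two equally weighted populations, this definition reduces to (p1 - p2)^2 / (4 p̄(1 - p̄)), which makes it clear that Fst grows with the allele-frequency difference between the groups.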