Typically, what proportion of SNPs don’t pass QC?
It depends heavily on your dataset. In older datasets, it’s common to see a higher percentage of SNPs fail QC. In newer datasets, the percentage is usually lower.
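One way to measure that proportion for your own data is to compute each SNP’s call rate (the fraction of samples with a non-missing genotype) and count how many SNPs fall below a cutoff. This is a minimal plain-Python sketch on a hypothetical in-memory genotype matrix; the function names and the 0.95 call-rate cutoff are illustrative assumptions. Real pipelines apply several filters (missingness, Hardy-Weinberg equilibrium, and others), not just this one.

```python
from typing import List, Optional

# Hypothetical layout: rows are samples, columns are SNPs.
# Each entry is an alternate-allele count (0, 1, 2) or None for a missing call.
Genotypes = List[List[Optional[int]]]


def snp_call_rates(genos: Genotypes) -> List[float]:
    """Fraction of samples with a non-missing call at each SNP."""
    n_samples = len(genos)
    n_snps = len(genos[0])
    return [
        sum(row[j] is not None for row in genos) / n_samples
        for j in range(n_snps)
    ]


def fraction_failing_call_rate(genos: Genotypes, min_call_rate: float = 0.95) -> float:
    """Proportion of SNPs whose call rate is below min_call_rate.

    The 0.95 default is an illustrative assumption; use your study's threshold.
    """
    rates = snp_call_rates(genos)
    failing = sum(rate < min_call_rate for rate in rates)
    return failing / len(rates)


if __name__ == "__main__":
    toy = [
        [0, 1, None, 2],
        [1, 1, None, 0],
        [2, None, 0, 1],
        [0, 1, 1, 1],
    ]
    # SNPs 1 and 2 have call rates 0.75 and 0.5, so 2 of 4 SNPs fail: prints 0.5
    print(fraction_failing_call_rate(toy))
```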
The exception is SNPs excluded because they are rare variants. Those haven’t actually “failed” quality control; they may simply be too rare to use given your sample size and study design. You exclude them during QC, but nothing is wrong with them and nothing failed during genotyping. Newer datasets tend to have more SNPs excluded for this reason, because newer genotyping technologies capture more rare variants.
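To keep “too rare” separate from “failed”, a rarity filter can report its exclusions on their own. This is a minimal sketch assuming the same hypothetical genotype layout (alternate-allele counts 0/1/2, None for missing) and an illustrative minor allele frequency (MAF) cutoff of 0.01; the function names are hypothetical.

```python
from typing import List, Optional, Tuple

# Hypothetical layout: rows are samples, columns are SNPs;
# entries are alternate-allele counts (0, 1, 2) or None for a missing call.
Genotypes = List[List[Optional[int]]]


def minor_allele_frequency(genos: Genotypes, j: int) -> float:
    """Minor allele frequency of SNP j, computed over non-missing calls."""
    calls = [row[j] for row in genos if row[j] is not None]
    if not calls:
        return 0.0  # no calls at all: treat as uninformative
    alt_freq = sum(calls) / (2 * len(calls))  # each diploid call carries two alleles
    return min(alt_freq, 1.0 - alt_freq)


def split_rare(genos: Genotypes, min_maf: float = 0.01) -> Tuple[List[int], List[int]]:
    """Return (kept, too_rare) SNP indices.

    SNPs in too_rare are not genotyping failures; they are only below the
    frequency cutoff. The 0.01 default is an illustrative assumption.
    """
    kept: List[int] = []
    too_rare: List[int] = []
    for j in range(len(genos[0])):
        if minor_allele_frequency(genos, j) >= min_maf:
            kept.append(j)
        else:
            too_rare.append(j)
    return kept, too_rare
```

Because what counts as “too rare” depends on sample size, some pipelines filter on minor allele count (MAF × 2N) instead of MAF; in this sketch that would only change the comparison inside `split_rare`.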
We can impute SNPs we drop back in later, but a sample we drop is lost from the analysis entirely, so we prefer to drop SNPs rather than people.
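That preference shapes the order of filtering. A minimal sketch, assuming a hypothetical in-memory genotype matrix and illustrative 10% missingness cutoffs: SNPs with too much missing data are removed first, and each sample’s missingness is then recomputed over the surviving SNPs only, so a sample is not discarded just because of a few bad SNPs.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical layout: rows are samples, columns are SNPs; None marks a missing call.
Genotypes = List[List[Optional[int]]]


def missing_rate(values: Sequence[Optional[int]]) -> float:
    """Fraction of entries that are missing (None)."""
    return sum(v is None for v in values) / len(values)


def qc_snps_then_samples(
    genos: Genotypes,
    max_snp_missing: float = 0.1,
    max_sample_missing: float = 0.1,
) -> Tuple[List[int], List[int]]:
    """Return (kept SNP indices, kept sample indices).

    Both thresholds are illustrative assumptions, not recommendations.
    """
    n_snps = len(genos[0])
    # Step 1: drop SNPs with too much missing data; these can be imputed back later.
    kept_snps = [
        j for j in range(n_snps)
        if missing_rate([row[j] for row in genos]) <= max_snp_missing
    ]
    if not kept_snps:
        return [], []  # no SNPs left to judge samples on
    # Step 2: judge each sample only on the SNPs that survived step 1.
    kept_samples = [
        i for i, row in enumerate(genos)
        if missing_rate([row[j] for j in kept_snps]) <= max_sample_missing
    ]
    return kept_snps, kept_samples


if __name__ == "__main__":
    # One bad SNP (last column) is missing in three of the four samples.
    toy = [
        [0, 1, 2, 1, None],
        [1, 1, 0, 0, None],
        [2, 0, 1, 1, None],
        [0, 2, 1, 0, 1],
    ]
    # Filtering samples first, on all SNPs, keeps only sample 3: prints [3]
    print([i for i, row in enumerate(toy) if missing_rate(row) <= 0.1])
    # Filtering SNPs first drops only SNP 4 and keeps every sample:
    # prints ([0, 1, 2, 3], [0, 1, 2, 3])
    print(qc_snps_then_samples(toy))
```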