ACE Analyses of Categorical Variables Generated from a Mixture Model

srw7va · June 7, 2022, 4:18pm

I am wondering if there are recommendations for conducting an ACE analysis on categorical data generated from a mixture model, such as a latent class growth model with individuals falling into Class_1, Class_2, or Class_3. Would it be appropriate to conduct the ACE analysis on the categorical latent class variable, treating the latent classes as discrete? Or would it be more appropriate to conduct it on the continuous (although likely highly skewed) posterior probabilities?

michael.neale · June 7, 2022, 10:29pm

Lindon Eaves wrote a nice paper on latent class analysis (LCA) with data from relatives, and it’s one place to start thinking about twins and LCA: https://vipbg.vcu.edu/vipbg/Articles/behavior-analyzing-1993.pdf

However, this may not be what you want to do, or it may not match your scientific question. The more direct answer is that beyond two latent classes, it can be difficult to justify any ordering of the classes, especially if the response probability plots for the different classes overlap. One option is a binary analysis that compares one class against all the others. Beyond that, we might even consider using dummy variables for each class (except one, as the reference) in a multivariate analysis, though I don’t really recommend it. Note also that if the classes’ response profiles don’t cross over much, it’s quite possible that the underlying construct is a continuum (think scores on a latent factor) and that, e.g., a factor model is better for the measures in hand. LCA of data generated from a continuum (from a factor model, say) tends to show this pattern of parallel response profiles, which indicates that the best the LCA could do was identify subsets of the data that are higher or lower on that continuum.
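A minimal Python sketch of the recodings mentioned above, plus a rough check for parallel (non-crossing) response profiles. It assumes each person has already been assigned to a single modal class coded 0, 1, 2 (for Class_1 to Class_3) and that class-specific response probabilities are available as a classes-by-items table. The function names are hypothetical, and `profiles_cross` is only a descriptive heuristic, not a formal test of a continuum.

```python
from typing import List, Sequence

# Assumption: classes are coded 0, 1, 2, ... (e.g., 0 = Class_1).


def one_vs_rest(classes: Sequence[int], target: int) -> List[int]:
    """Binary indicator: 1 if the person's assigned class is `target`, else 0."""
    return [1 if c == target else 0 for c in classes]


def dummy_code(classes: Sequence[int], n_classes: int, reference: int = 0) -> List[List[int]]:
    """One 0/1 column per non-reference class, i.e. n_classes - 1 columns."""
    levels = [k for k in range(n_classes) if k != reference]
    return [[1 if c == k else 0 for k in levels] for c in classes]


def profiles_cross(profiles: Sequence[Sequence[float]]) -> bool:
    """Return True if any two classes swap order on at least one item.

    profiles[k][j] is the response probability of item j in class k.
    If no pair of classes ever swaps order, the profiles are "parallel".
    """
    for a in range(len(profiles)):
        for b in range(a + 1, len(profiles)):
            diffs = [pa - pb for pa, pb in zip(profiles[a], profiles[b])]
            if any(d > 0 for d in diffs) and any(d < 0 for d in diffs):
                return True
    return False


if __name__ == "__main__":
    assigned = [0, 2, 1, 2, 0]
    print(one_vs_rest(assigned, target=2))    # [0, 1, 0, 1, 0]
    print(dummy_code(assigned, n_classes=3))  # [[0, 0], [0, 1], [1, 0], [0, 1], [0, 0]]
    print(profiles_cross([[0.1, 0.2, 0.3], [0.4, 0.5, 0.6], [0.7, 0.8, 0.9]]))  # False
```

The one-vs-rest column is what a standard binary ACE analysis would take as its phenotype; the dummy columns correspond to the multivariate option. A `False` from `profiles_cross` is the parallel-profile pattern described above.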

Another thing to consider is that any application of LCA is really an extreme interpretation of the data. The “conditional independence” assumption says that, within each class, the items are completely independent of each other, so symptoms, etc., no longer correlate AT ALL within a class. Overall, I think that this is at best an iffy assumption that is unlikely to be met. Factor mixture models are somewhat better in this regard, since they permit within-class covariance between items.
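To make the conditional independence assumption concrete, here is a small Python sketch for binary items (an assumption for illustration; in a latent class growth model the within-class product runs over repeated measures instead, but it has a similar form). Within each class the item probabilities are simply multiplied, so any covariance between items in the full sample arises only from mixing classes with different response probabilities. Function names are hypothetical.

```python
from itertools import product
from typing import Sequence


def pattern_prob(y: Sequence[int], weights: Sequence[float],
                 probs: Sequence[Sequence[float]]) -> float:
    """P(response pattern y) under a latent class model for binary items.

    weights[k] is the proportion of class k; probs[k][j] = P(item j = 1 | class k).
    The product over items within each class IS the conditional independence assumption.
    """
    total = 0.0
    for w, pk in zip(weights, probs):
        lik = 1.0
        for yj, pj in zip(y, pk):
            lik *= pj if yj == 1 else 1.0 - pj
        total += w * lik
    return total


def item_cov(weights: Sequence[float], probs: Sequence[Sequence[float]],
             i: int, j: int) -> float:
    """Overall (marginal) covariance of binary items i and j, by enumerating all patterns."""
    n_items = len(probs[0])
    e_i = e_j = e_ij = 0.0
    for y in product((0, 1), repeat=n_items):
        p = pattern_prob(y, weights, probs)
        e_i += p * y[i]
        e_j += p * y[j]
        e_ij += p * y[i] * y[j]
    return e_ij - e_i * e_j


if __name__ == "__main__":
    # One class: items are independent, so the covariance is (up to rounding) zero.
    print(item_cov([1.0], [[0.3, 0.6]], 0, 1))
    # Two equally common classes with endorsement 0.2 vs 0.8: covariance is about 0.09.
    print(item_cov([0.5, 0.5], [[0.2, 0.2], [0.8, 0.8]], 0, 1))
```

In the two-class case the items correlate in the whole sample even though they are exactly uncorrelated within each class, which is why this assumption is so strong.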

Yet another thing to worry about is that, for statistical rigor, we really want our observations to be independent and identically distributed (i.i.d.). Unfortunately, when individuals are classified (using posterior probabilities of class membership), the confidence that someone belongs in their assigned class and no other can vary quite a bit. Some class memberships are therefore measured more accurately than others, which violates the i.i.d. assumption, and a consequence is that replicability may suffer. I have some ideas about how to ameliorate this last problem, but I haven’t experimented with them yet. The short version is to weight the class memberships by their probability of being correct. However, it gets messier when you try to do the right thing, which is to consider all the possibilities (see the book “The Unfinished Game” for an easy introduction to probability worked out this way): a mixture model in which each person contributes to every class they might be in, weighted accordingly. That’s not too bad if there are only a few classes, but with twins we would have to allow each twin to be a member of each class, so we would end up with nClasses^2 components in the mixture model.
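The two ideas in the paragraph above can be sketched in Python. Since the weighting approach is described only in outline, this is one possible reading under stated assumptions, not the method referred to there. `modal_class_weights` uses each person's modal-class posterior probability as a weight; `twin_pair_likelihood` writes one twin pair's likelihood as a mixture over all nClasses^2 combinations of the two twins' classes. The names and the `pair_class_probs` matrix are hypothetical: in a genetic model those joint class probabilities would themselves be parameterized (for example, differently for MZ and DZ pairs), which is not shown here.

```python
from itertools import product
from typing import Any, Callable, List, Sequence, Tuple


def modal_class_weights(posteriors: Sequence[Sequence[float]]) -> List[Tuple[int, float]]:
    """Return (modal class, its posterior probability) for each person.

    Using the posterior as a weight lets confidently classified people count
    more than ambiguous ones (one simple version of the weighting idea).
    """
    result = []
    for post in posteriors:
        k = max(range(len(post)), key=lambda c: post[c])
        result.append((k, post[k]))
    return result


def twin_pair_likelihood(y1: Any, y2: Any,
                         pair_class_probs: Sequence[Sequence[float]],
                         class_density: Callable[[Any, int], float]) -> float:
    """Likelihood of one twin pair as a mixture over all K * K class combinations.

    pair_class_probs[k][l] = P(twin 1 in class k and twin 2 in class l).
    class_density(y, k) = probability (or density) of one twin's data given class k.
    Assumption: given both twins' classes, their data are independent.
    """
    n_classes = len(pair_class_probs)
    return sum(
        pair_class_probs[k][l] * class_density(y1, k) * class_density(y2, l)
        for k, l in product(range(n_classes), repeat=2)
    )


def binary_item_density(y: int, k: int) -> float:
    """Hypothetical 2-class example: one binary item, endorsed with P = 0.9 or 0.2."""
    endorse = [0.9, 0.2]
    return endorse[k] if y == 1 else 1.0 - endorse[k]


if __name__ == "__main__":
    print(modal_class_weights([[0.1, 0.7, 0.2], [0.5, 0.25, 0.25]]))  # [(1, 0.7), (0, 0.5)]
    joint = [[0.4, 0.1], [0.1, 0.4]]  # twins tend to share a class in this made-up matrix
    print(twin_pair_likelihood(1, 1, joint, binary_item_density))  # about 0.376
```

With K classes the sum has K * K terms, so the cost grows quickly as classes are added, which is the practical concern raised above.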

OK, that was a lot, but I hope it helps!