ChemRICH for Correlation Modules
If a dataset consists only or partly of MS1 features, you can still compute correlation modules, because this approach needs no structural information about the detected peaks. It is probably the simplest and most comprehensive way to include unknown metabolites in a metabolite set analysis. See the "Predicting Correlation Modules" step for how to generate the correlation set definition.
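To illustrate why no structural information is needed, here is a minimal sketch that groups features purely by how their intensities co-vary across samples. It is not the algorithm used by the "Predicting Correlation Modules" step: it substitutes a simple Pearson correlation threshold with single-linkage grouping (union-find). The function names, the 0.8 threshold, and the toy feature IDs are all assumptions made for illustration.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length intensity vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant feature carries no correlation information
    return sxy / math.sqrt(sxx * syy)

def correlation_modules(intensities, threshold=0.8):
    """Group features whose pairwise correlation is >= threshold.

    intensities: dict mapping a feature ID to its intensities across samples.
    The threshold is a hypothetical choice, not a ChemRICH default.
    """
    ids = list(intensities)
    parent = {f: f for f in ids}

    def find(f):
        # Follow parent links to the module representative (with path halving).
        while parent[f] != f:
            parent[f] = parent[parent[f]]
            f = parent[f]
        return f

    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            if pearson(intensities[a], intensities[b]) >= threshold:
                parent[find(a)] = find(b)  # merge the two modules

    grouped = {}
    for f in ids:
        grouped.setdefault(find(f), []).append(f)
    return sorted(sorted(members) for members in grouped.values())

# Toy data: four hypothetical features measured in five samples.
toy = {
    "feat_1": [1, 2, 3, 4, 5],
    "feat_2": [2, 4, 6, 8, 10.5],
    "feat_3": [5, 1, 4, 2, 3],
    "feat_4": [10, 2, 8.5, 4, 6],
}
modules = correlation_modules(toy)
```

Only the intensity matrix enters the calculation, so identified and unknown features are handled in exactly the same way.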
Example study.
We will use a metabolomics dataset generated for 489 cord blood samples from birth cohorts. The data matrix, raw data and sample metadata can be downloaded from here. The data matrix has 4712 features (m/z, retention time pairs). Following the steps in "Predicting Correlation Modules", we obtained 189 co-regulatory modules for these features. We then tested these clusters for association with the birth-weight outcome by combining linear regression with the KS test, using the "ChemRICH for Any Set Definition" step. This yielded 77 clusters that were significantly linked with birth weight at a set-level FDR cutoff of 0.05. The cluster impact plot shows which sets were positively or negatively associated with birth weight, and how strongly (figure below). Features in "Clust_93" and "Clust_77" had the strongest positive association with birth weight, whereas features in clusters 41 and 73 showed the strongest negative association. This correlation-module (co-regulatory set) approach highlights far more significant features, and sets of features, associated with birth weight than were originally reported in the paper.
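The set-level part of this workflow can be sketched as follows. This is a simplified illustration, not the ChemRICH implementation. It assumes that each feature already has a p-value and an effect direction from a per-feature linear regression against birth weight, that the KS test compares each cluster's p-values with a uniform distribution, and that the set-level FDR is a Benjamini-Hochberg adjustment. The function names and toy values are hypothetical.

```python
import math
from collections import defaultdict

def ks_uniform_pvalue(pvalues):
    """One-sample KS test of p-values against Uniform(0, 1), asymptotic p-value."""
    x = sorted(pvalues)
    n = len(x)
    # KS statistic: largest gap between the empirical CDF and F(x) = x.
    d = max(max((i + 1) / n - v, v - i / n) for i, v in enumerate(x))
    # Asymptotic Kolmogorov distribution with Stephens' small-sample correction.
    lam = (math.sqrt(n) + 0.12 + 0.11 / math.sqrt(n)) * d
    if lam < 0.2:
        return 1.0  # the series below converges poorly here; the p-value is ~1
    s = 2 * sum((-1) ** (k - 1) * math.exp(-2 * k * k * lam * lam)
                for k in range(1, 101))
    return min(max(s, 0.0), 1.0)

def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(pvals)
    order = sorted(range(n), key=lambda i: pvals[i])
    adjusted = [0.0] * n
    running_min = 1.0
    for rank in range(n, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * n / rank)
        adjusted[i] = running_min
    return adjusted

def cluster_enrichment(features, alpha=0.05):
    """features: iterable of (cluster_id, feature_pvalue, effect_estimate)."""
    by_cluster = defaultdict(list)
    for cluster_id, p, effect in features:
        by_cluster[cluster_id].append((p, effect))
    ids = sorted(by_cluster)
    raw = [ks_uniform_pvalue([p for p, _ in by_cluster[c]]) for c in ids]
    fdr = benjamini_hochberg(raw)
    results = {}
    for c, p_set, q in zip(ids, raw, fdr):
        significant = [e for p, e in by_cluster[c] if p < alpha]
        results[c] = {
            "n_features": len(by_cluster[c]),
            "ks_p": p_set,
            "fdr": q,
            "up": sum(1 for e in significant if e > 0),    # positive association
            "down": sum(1 for e in significant if e < 0),  # negative association
        }
    return results

# Toy input: one cluster with consistently small p-values, one with uniform ones.
toy_features = (
    [("Clust_A", 0.001 * (i + 1), 0.5) for i in range(10)]
    + [("Clust_B", p, e) for p, e in zip(
        [0.05, 0.15, 0.25, 0.35, 0.45, 0.55, 0.65, 0.75, 0.85, 0.95],
        [0.1, -0.1] * 5)]
)
results = cluster_enrichment(toy_features)
```

In this toy input, Clust_A passes the set-level FDR cutoff and Clust_B does not. The "up" and "down" counts give the kind of direction information that the cluster impact plot summarizes.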
The input for ChemRICH using correlation modules is available here. The raw input data for the "Predicting Correlation Modules" step is available here.
The figures below show the metabolite sets that were associated with birth weight. Of course, identifying these unknowns is still a long way off, but at least we know which clusters to focus on (clusters 77, 93, 14, and so on).