Chemical Similarity Enrichment Analysis of Metabolomics and Exposomics Datasets
Why ChemRICH?
Metabolites are not independent -
Assuming statistical independence among metabolites is incorrect for metabolomics datasets, because metabolites are linked through 1) metabolic pathways, 2) a shared origin, 3) genetic regulation of metabolism and 4) chemical similarity. Therefore, any approach that adjusts raw metabolite-level p-values for multiple hypothesis testing produces false-negative results, leading to missed biological insights from metabolomics datasets.
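A minimal Python sketch of this argument, using made-up p-values rather than real data: six related metabolites (for example, members of one chemical class) each have a raw p-value below 0.05, but after the standard Bonferroni or Benjamini-Hochberg adjustment across 200 tests, none of them remains significant. The 200-test setting, the p-values and the helper names are illustrative assumptions only.

```python
def bonferroni(pvals):
    """Bonferroni adjustment: multiply each p-value by the number of tests, cap at 1."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg step-up adjustment (FDR-adjusted p-values)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical data: six related metabolites with modest raw p-values ...
related = [0.010, 0.015, 0.020, 0.025, 0.030, 0.040]
# ... plus 194 unrelated metabolites with evenly spread, null-like p-values.
others = [(k + 0.5) / 194 for k in range(194)]
pvals = related + others

alpha = 0.05
raw_hits = sum(p < alpha for p in related)
bonf_hits = sum(p < alpha for p in bonferroni(pvals)[:6])
bh_hits = sum(p < alpha for p in benjamini_hochberg(pvals)[:6])
print(raw_hits, bonf_hits, bh_hits)
```

The script prints `6 0 0`: all six related metabolites pass at the raw level, and none pass after either adjustment, even though together they point to a coordinated change.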
Classical pathway analysis misses new biological insights in metabolomics datasets -
Classical pathway analyses are inappropriate for metabolomics datasets for two major reasons: 1) biochemical databases are incomplete for metabolomics, so they are biased towards only a handful of selected compounds, and 2) a hypergeometric test relies on a background database, which does not exist for metabolomics datasets.
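To illustrate why the missing background matters, the sketch below computes the over-representation p-value P(X >= k) of a hypergeometric test for the same overlap under three different assumed background sizes. All numbers are hypothetical: a pathway of 20 compounds, 30 significant metabolites and an overlap of 5; the helper name `hypergeom_sf` is also illustrative.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(population N, K set members, n draws)."""
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


pathway_size = 20   # compounds annotated to one pathway (hypothetical)
n_significant = 30  # significant metabolites in the study (hypothetical)
overlap = 5         # significant metabolites that map to the pathway (hypothetical)

# Only the assumed background size N changes between these calls.
p_by_background = {
    N: hypergeom_sf(overlap, N, pathway_size, n_significant)
    for N in (200, 1000, 10000)
}
for N, p in p_by_background.items():
    print(f"background N={N:>5}: p = {p:.2e}")
```

The p-value moves from non-significant (N = 200) to extremely small (N = 10000) for identical data. Because metabolomics datasets have no defined background, the choice of N, and therefore the conclusion, is essentially arbitrary.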
Metabolite set definitions should be data-driven and chemistry-driven -
Alternatively, we can define data-driven and chemistry-driven compound sets. These sets can be evaluated with a test that does not depend on a background database, such as the Kolmogorov–Smirnov test (KS test), to obtain set-level significance. Chemical sets can be defined in several ways, including with different types of ontologies.
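The KS test needs nothing beyond the p-values of the set itself. Below is a minimal, standard-library-only Python sketch of a two-sided one-sample KS test of a set's p-values against Uniform(0, 1). Its p-value uses the asymptotic Kolmogorov distribution with Stephens' small-sample correction, an approximation chosen here for brevity. The function name `ks_uniform` and the example p-values are hypothetical; in practice an existing implementation such as R's `ks.test` or SciPy's `scipy.stats.kstest` would normally be used.

```python
import math


def ks_uniform(pvals):
    """Two-sided one-sample KS test of p-values against Uniform(0, 1).

    Returns (D, approximate p-value). The p-value uses the asymptotic
    Kolmogorov distribution with Stephens' small-sample correction.
    """
    xs = sorted(pvals)
    n = len(xs)
    # D = sup_x |F_n(x) - x|, evaluated just before and after each ECDF jump.
    d = max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(xs))
    lam = (math.sqrt(n) + 0.12 + 0.11 / math.sqrt(n)) * d
    if lam < 0.2:
        return d, 1.0  # Q(lambda) is numerically indistinguishable from 1 here
    # Q(lambda) = 2 * sum_{k>=1} (-1)^(k-1) * exp(-2 k^2 lambda^2)
    q = 2.0 * sum(
        (-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam) for k in range(1, 101)
    )
    return d, min(1.0, max(0.0, q))


enriched = [0.001, 0.004, 0.01, 0.02, 0.03, 0.05]  # hypothetical set p-values
spread = [0.1, 0.3, 0.5, 0.7, 0.9]                  # hypothetical null-like set
d_enriched, p_enriched = ks_uniform(enriched)
d_spread, p_spread = ks_uniform(spread)
print(f"enriched: D = {d_enriched:.2f}, p = {p_enriched:.1e}")
print(f"spread:   D = {d_spread:.2f}, p = {p_spread:.2f}")
```

A set whose p-values pile up near zero departs strongly from the uniform distribution and gets a small set-level p-value, while an evenly spread set does not.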
ChemRICH is a chemistry-driven approach -
ChemRICH is a chemoinformatics and statistical approach that defines chemical classes for metabolites and then runs a KS test on each class to obtain set-level p-values. It uses chemical similarity against the MeSH database to obtain chemical classes, but users can also provide their own chemical classes for the KS test. For the ChemRICH KS test, we assume that under the null hypothesis, the p-values within a chemical set follow a uniform distribution.
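A minimal sketch of the set-level step, assuming the user supplies the chemical classes (the MeSH-based, similarity-driven classification is not reproduced here). Metabolite-level p-values are grouped by class, and each class gets a one-sided KS test against Uniform(0, 1) using the statistic D+ = sup(F_n(x) - x), which is large when a set holds an excess of small p-values; its p-value comes from the exact Birnbaum-Tingey tail formula. Whether ChemRICH itself uses a one- or two-sided test is not stated here; the one-sided choice, the helper names, the `min_size` rule and the example compounds are all hypothetical.

```python
from collections import defaultdict
from math import comb, floor


def ks_dplus(pvals):
    """One-sided KS statistic D+ = sup_x (F_n(x) - x) against Uniform(0, 1)."""
    xs = sorted(pvals)
    n = len(xs)
    return max(0.0, max((i + 1) / n - x for i, x in enumerate(xs)))


def ks_dplus_pvalue(d, n):
    """Exact P(D+ >= d) for n independent uniform p-values (Birnbaum-Tingey)."""
    if d <= 0.0:
        return 1.0
    if d >= 1.0:
        return 0.0
    total = 0.0
    for j in range(floor(n * (1.0 - d)) + 1):
        base = max(0.0, 1.0 - d - j / n)  # guard against tiny negative rounding
        total += comb(n, j) * base ** (n - j) * (d + j / n) ** (j - 1)
    return min(1.0, d * total)


def chemical_set_pvalues(metabolites, min_size=2):
    """Group raw p-values by user-supplied class and test each class.

    `metabolites` maps a compound name to (chemical class, raw p-value).
    `min_size` is a hypothetical rule that skips classes too small to test.
    """
    by_class = defaultdict(list)
    for chem_class, p in metabolites.values():
        by_class[chem_class].append(p)
    results = {}
    for chem_class, pvals in by_class.items():
        if len(pvals) < min_size:
            continue
        results[chem_class] = ks_dplus_pvalue(ks_dplus(pvals), len(pvals))
    return results


# Hypothetical input: compound -> (user-supplied class, raw p-value).
metabolites = {
    "palmitic acid": ("saturated fatty acids", 0.001),
    "stearic acid": ("saturated fatty acids", 0.004),
    "myristic acid": ("saturated fatty acids", 0.010),
    "lauric acid": ("saturated fatty acids", 0.020),
    "arachidic acid": ("saturated fatty acids", 0.030),
    "alanine": ("amino acids", 0.20),
    "glycine": ("amino acids", 0.45),
    "serine": ("amino acids", 0.60),
    "valine": ("amino acids", 0.80),
    "leucine": ("amino acids", 0.95),
}
set_p = chemical_set_pvalues(metabolites)
for chem_class, p in sorted(set_p.items(), key=lambda item: item[1]):
    print(f"{chem_class}: set-level p = {p:.2e}")
```

For the hypothetical fatty-acid class, all five p-values lie at or below 0.03, so the set-level p-value equals 0.03^5 (about 2.4e-8), the probability that five uniform p-values all fall at or below 0.03. The amino-acid class, whose p-values are spread across (0, 1), gets a set-level p-value near 0.94.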
A ChemRICH impact plot
A chemical similarity tree
A new manuscript on ChemRICH 2.0 is coming soon.
For now, please cite -
Barupal, D. K., & Fiehn, O. (2017). Chemical Similarity Enrichment Analysis (ChemRICH) as alternative to biochemical pathway mapping for metabolomic datasets. Scientific Reports, 7.
Barupal, D. K., Fan, S., & Fiehn, O. (2018). Integrating bioinformatics approaches for a comprehensive interpretation of metabolomics datasets. Current Opinion in Biotechnology, 54, 1-9.