ChemRICH for Metabolon's data
Metabolon Inc. provides a pathway ontology (sub-pathway and super-pathway) along with each metabolomics dataset. This ontology is a good example of non-overlapping pathways, and it can be used as a set definition in the ChemRICH tool. If you have SMILES codes for the identified metabolites, ChemRICH will arrange the sets by their lipophilicity (polar to non-polar). If you do not, you will need to provide an order column in the input file. Comparing the ChemRICH results obtained with Metabolon's pathway ontology against those obtained with the MeSH ontology may reveal interesting differences.
Here is the code to run the ChemRICH analysis for a dataset from Metabolon Inc. The metabolomics data originate from this paper (LINK). Formatted input files are provided here: 1) Metabolon SubPathway Input and 2) MeSH ontology Input. The ChemRICH impact plot is shown in the images below. Download both files and copy them into a new folder, then run the R script below in RStudio.
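When SMILES codes are not available, the order column has to be built by hand. The following is a minimal sketch of one way to do that in R. The data frame and its column names (`compound_name`, `set`, `order`) are assumptions for illustration and may not match the exact headers ChemRICH expects, and the alphabetical ordering is only a placeholder for whatever ordering is meaningful in your study.

```r
# Hypothetical input table; names and columns are illustrative assumptions,
# not necessarily the exact headers that ChemRICH expects.
metabolites <- data.frame(
  compound_name = c("glycylvaline", "leucylalanine", "palmitate", "glucose"),
  set = c("Dipeptide", "Dipeptide", "Long Chain Fatty Acid", "Glycolysis"),
  stringsAsFactors = FALSE
)

# Assign one integer per set, here simply by alphabetical order of the set name.
# Replace this with any ordering that makes sense (e.g. grouped by super-pathway).
set_levels <- sort(unique(metabolites$set))
metabolites$order <- match(metabolites$set, set_levels)

print(metabolites)
```

Every compound in the same set receives the same order value, so the sets stay grouped together however the ordering is later chosen.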
Instructions
# Load the scripts.
source("https://raw.githubusercontent.com/barupal/ChemRICH/master/chemrich_chemical_classes.R")
source("https://raw.githubusercontent.com/barupal/ChemRICH/master/predict_mesh_chemical_class.R")
load.ChemRICH.Packages()
# Predict MeSH classes for Metabolon's dataset.
predict_mesh_classes("metabolon_mesh_prediction.xlsx")
# Run ChemRICH.
run_chemrich_chemical_classes("chemrich_input_mesh_classes.xlsx") # with the MeSH ontology
run_chemrich_chemical_classes("chemrich_input_metabolon_subpathway.xlsx") # with Metabolon's sub-pathways
Both ontologies identified "Dipeptides" as the most significant chemical cluster in this study, but MeSH complemented Metabolon's pathway ontology by highlighting several chemical classes that might improve the biological interpretation. The study highlights that dipeptides are associated with tumor aggressiveness and poor prognosis. Overall, we recommend running the ChemRICH analysis with both Metabolon's sub-pathway ontology and the MeSH ontology to cover non-overlapping pathways as well as chemical classes.
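To make such a comparison between the two ontologies explicit, the significant clusters from each run can be set against each other. The sketch below assumes two result tables with hypothetical columns `cluster` and `p_adjusted`. The values are toy numbers, not results from the study, and the real ChemRICH output may use a different layout.

```r
# Toy result tables; values and column names are made up for illustration
# and are NOT results from the study or the real ChemRICH output format.
mesh_results <- data.frame(
  cluster = c("Dipeptides", "Class A", "Class B"),
  p_adjusted = c(0.001, 0.02, 0.30),
  stringsAsFactors = FALSE
)
subpathway_results <- data.frame(
  cluster = c("Dipeptides", "Pathway X"),
  p_adjusted = c(0.002, 0.04),
  stringsAsFactors = FALSE
)

# Keep clusters below an (assumed) significance threshold in each run.
alpha <- 0.05
sig_mesh <- mesh_results$cluster[mesh_results$p_adjusted < alpha]
sig_sub <- subpathway_results$cluster[subpathway_results$p_adjusted < alpha]

# Clusters found by both ontologies, and those found by only one of them.
shared <- intersect(sig_mesh, sig_sub)
mesh_only <- setdiff(sig_mesh, sig_sub)
subpathway_only <- setdiff(sig_sub, sig_mesh)

print(list(shared = shared, mesh_only = mesh_only, subpathway_only = subpathway_only))
```

The match is by exact name, so cluster labels that differ between the ontologies (for example singular versus plural forms) would need to be harmonized first.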
The latest datasets from Metabolon can have up to "1750 blood compounds spanning 20 super-pathways, subdivided into 113 sub-pathways" (LINK); check the supplementary data for this paper. Many of these named metabolites do not have a PubChem CID or SMILES code, so we will probably need to use both the user-provided set definition and the chemical-class-based approaches.
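One way to decide which approach applies to which compound is to flag the metabolites that have a usable SMILES code. This is a minimal sketch under assumptions: the annotation table and its columns (`compound_name`, `smiles`, `sub_pathway`) are hypothetical placeholders and do not reflect the actual supplementary data.

```r
# Hypothetical annotation table; names and values are placeholders only.
annotations <- data.frame(
  compound_name = c("metabolite_1", "metabolite_2", "metabolite_3"),
  smiles = c("CCO", NA, ""),
  sub_pathway = c("Sub-pathway A", "Sub-pathway B", "Sub-pathway B"),
  stringsAsFactors = FALSE
)

# A SMILES code counts as usable when it is neither missing nor blank.
has_smiles <- !is.na(annotations$smiles) & nzchar(trimws(annotations$smiles))

# Compounds with SMILES can go through the chemical-class approach;
# the rest can only be grouped by the user-provided sets (sub-pathways).
for_chemical_classes <- annotations[has_smiles, ]
for_set_definition <- annotations[!has_smiles, ]

print(table(has_smiles))
```

Compounds with SMILES still carry a sub-pathway label, so they could also be included in the set-definition run if a complete view is preferred.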