Preparing the input file

At the core of the ChemRICH approach is a Kolmogorov-Smirnov (KS) test run on a set definition. A set definition can be a chemical class, a pathway, a subclass, an ontology term, a chemical source, a correlation module, etc. These definitions help interpret metabolomics datasets from different angles.
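To make the idea of a per-set KS test concrete, here is a minimal sketch that computes the one-sample KS statistic for the p-values of a single set. It assumes the p-values are compared against a uniform distribution on [0, 1]; that reference distribution, the function name and the example values are illustrative assumptions, not taken from the ChemRICH code.

```python
def ks_statistic_vs_uniform(pvalues):
    """One-sample KS statistic D against Uniform(0, 1).

    Assumption: the reference distribution is uniform; this page does
    not state which distribution the ChemRICH scripts use.
    """
    ordered = sorted(pvalues)
    n = len(ordered)
    if n == 0:
        raise ValueError("need at least one p-value")
    # Largest gap where the empirical CDF sits above the uniform CDF.
    d_plus = max((i + 1) / n - p for i, p in enumerate(ordered))
    # Largest gap where the empirical CDF sits below the uniform CDF.
    d_minus = max(p - i / n for i, p in enumerate(ordered))
    return max(d_plus, d_minus)


# Hypothetical p-values for the compounds of one chemical set.
set_pvalues = [0.01, 0.02, 0.03]
print(ks_statistic_vs_uniform(set_pvalues))
```

A set whose p-values cluster near zero gives a statistic close to 1, while p-values spread evenly over [0, 1] give a small one. Converting the statistic into a set-level p-value needs the Kolmogorov distribution, which is left out of this sketch.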

You do not want to restrict yourself to only one type of chemical set definition. Collect as many set definitions as possible via online tools, databases, experts, collaborations and literature to perform the ChemRICH analysis.

The ChemRICH package provides a function to predict MeSH ontology chemical classes for your chemical list. You can use that function to get started while you obtain other set definitions. Note that a few data providers, such as Metabolon Inc, include a chemical pathway ontology in their data matrices. That ontology can be used with the ChemRICH scripts as well.

The minimum input for a ChemRICH analysis is a file with five columns. You should prepare this file using Microsoft Excel.

1) compound_name

  • Make sure you have unique chemical names.

  • If you want to keep the same name twice for whatever reason, add a suffix like "_1", "_2" to separate them.

  • You must resolve multiple adducts, fragments and derivatization products to their parent compound to avoid incorrect calculations.

  • If you have signals for the same compound from two ionization modes or chromatography methods, keep only one of them.
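The suffix rule for repeated names can be sketched as follows. This is only an illustration of the "_1", "_2" convention described above; the function name and the compound names are hypothetical, and the sketch does not decide which duplicate signal to keep or how to merge adducts.

```python
from collections import Counter


def make_names_unique(names):
    """Append _1, _2, ... to every name that occurs more than once."""
    totals = Counter(names)   # how often each name occurs overall
    seen = Counter()          # how often each name has occurred so far
    unique = []
    for name in names:
        if totals[name] > 1:
            seen[name] += 1
            unique.append(f"{name}_{seen[name]}")
        else:
            unique.append(name)
    return unique


# Hypothetical compound list with one repeated name.
names = ["glucose", "alanine", "glucose", "citrate"]
unique_names = make_names_unique(names)
print(unique_names)  # ['glucose_1', 'alanine', 'glucose_2', 'citrate']
```

If the list already contains a name such as "glucose_1", the suffixed names could collide with it, so it is worth checking the result for duplicates once more.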

2) order

  • This column decides how the sets are arranged on the x-axis of the ChemRICH impact plot.

3) pvalue

  • These are raw p-values (without any correction or adjustment).

  • You can use any statistical method that yields a p-value, for example Student's t-test, Wilcoxon signed-rank test, linear regression, logistic regression, Cox models, CLR models, mixed effect models, etc.

4) effect_size

  • If you are using a Student's t-test or Wilcoxon signed-rank test, use the fold change as the effect size. It can be computed as the ratio median(condition A)/median(condition B). This ratio is always positive; a value less than 1 means the compound is lower in condition A than in condition B. If the ChemRICH code sees a negative value, it will assume that the effect size comes from a regression model.

  • If you are using a regression model, you can use the beta coefficient as the effect size.

  • The magnitude of effect size is not utilized in the ChemRICH calculations, only the direction of association is used.
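The fold-change calculation described above can be sketched in a few lines. The function names, the direction labels and the intensity values are hypothetical and only illustrate the median ratio; they are not part of the ChemRICH scripts.

```python
from statistics import median


def fold_change(condition_a, condition_b):
    """Fold change as median(condition A) / median(condition B)."""
    return median(condition_a) / median(condition_b)


def direction_from_fold_change(fc):
    """Read the direction of a fold change relative to 1."""
    if fc > 1:
        return "higher in A"
    if fc < 1:
        return "lower in A"
    return "no change"


# Hypothetical intensities of one compound in two conditions.
condition_a = [10.0, 12.0, 14.0]
condition_b = [20.0, 24.0, 28.0]

fc = fold_change(condition_a, condition_b)
print(fc)                              # 0.5
print(direction_from_fold_change(fc))  # lower in A
```

Because only the direction is used, a fold change of 0.5 and one of 0.05 would be treated the same way in this reading.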

5) set

  • Chemical set names, for example chemical classes, pathways, ontology terms or any other way of grouping compounds.

  • Sets with fewer than 3 compounds will be excluded from the ChemRICH analysis.

  • Sets with no compound having a raw p-value < 0.10 will be excluded.
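Before running the analysis, it can help to see which sets would survive the two filters above. The following is a minimal sketch under the assumption that the rows are available as dictionaries with "set" and "pvalue" keys; the function name, the constants and the example rows are hypothetical.

```python
from collections import defaultdict

MIN_SET_SIZE = 3   # sets with fewer compounds are dropped
P_CUTOFF = 0.10    # a set needs at least one raw p-value below this


def eligible_sets(rows):
    """Return the sorted names of sets that pass both filters."""
    pvalues_by_set = defaultdict(list)
    for row in rows:
        pvalues_by_set[row["set"]].append(float(row["pvalue"]))
    return sorted(
        name
        for name, pvals in pvalues_by_set.items()
        if len(pvals) >= MIN_SET_SIZE and min(pvals) < P_CUTOFF
    )


# Hypothetical rows: only the columns needed for the filter are shown.
rows = [
    {"set": "amino acids", "pvalue": 0.01},
    {"set": "amino acids", "pvalue": 0.40},
    {"set": "amino acids", "pvalue": 0.75},
    {"set": "sugars", "pvalue": 0.03},       # only 2 compounds
    {"set": "sugars", "pvalue": 0.04},
    {"set": "bile acids", "pvalue": 0.20},   # no p-value below 0.10
    {"set": "bile acids", "pvalue": 0.50},
    {"set": "bile acids", "pvalue": 0.90},
]
kept = eligible_sets(rows)
print(kept)  # ['amino acids']
```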


Note: Make sure that the column names match "compound_name", "pvalue", "effect_size", "set" in this exact order.
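A quick sanity check of the file can catch naming problems early. The sketch below assumes the Excel sheet has been exported to CSV and that the headers are the five column names from the numbered list above. It checks only that the columns are present, that compound names are unique and that p-values lie between 0 and 1. The function name and the example data are hypothetical.

```python
import csv
import io

# Column names taken from the numbered list of input columns.
REQUIRED_COLUMNS = ["compound_name", "order", "pvalue", "effect_size", "set"]


def check_input(csv_text):
    """Return a list of problems; an empty list means none were found."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    problems = [f"missing column: {c}" for c in REQUIRED_COLUMNS if c not in header]
    if problems:
        return problems
    seen = set()
    # Data starts on line 2 because line 1 holds the header.
    for line_no, row in enumerate(reader, start=2):
        name = row["compound_name"]
        if name in seen:
            problems.append(f"line {line_no}: duplicate compound_name {name!r}")
        seen.add(name)
        try:
            p = float(row["pvalue"])
        except (TypeError, ValueError):
            problems.append(f"line {line_no}: pvalue is not a number")
            continue
        if not 0.0 <= p <= 1.0:
            problems.append(f"line {line_no}: pvalue outside [0, 1]")
    return problems


# Hypothetical file content with one duplicated name and one bad p-value.
example = """compound_name,order,pvalue,effect_size,set
glucose,1,0.02,1.8,sugars
alanine,2,0.40,0.7,amino acids
glucose,3,1.50,0.9,sugars
"""
for problem in check_input(example):
    print(problem)
```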

Minimum input file

You can download the template from here.

With this minimum input file you can run the basic ChemRICH method. Go to this page.

We have provided several optimized ChemRICH files for different types of metabolomics datasets. Check here.