Compute correlation modules

Detecting correlation modules is a well-established approach in omics bioinformatics for finding sets of molecules that can be linked to a phenotype (ref, ref). In metabolomics datasets, such sets can indicate 1) pathways, 2) chemical origin, 3) similar structure, 4) enzyme activity, 5) cellular damage, or 6) response to toxic stress. Sets are particularly useful because they help prioritize unknown metabolites that correlate within a co-regulatory module strongly linked to a phenotype of interest. For instance, in the example analysis the compound "X-14320" is most probably a dipeptide, because it correlates strongly with other dipeptide compounds (Figure). These dipeptides are strongly associated with a cancer phenotype. Most of these dipeptides are not yet covered by any biochemical database, so a classical pathway-analysis approach would ignore them entirely and miss an opportunity to gain new biological insight.
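To make the idea concrete, here is a minimal sketch in base R of how correlated metabolites can be grouped into modules. It is not ChemRICH's internal algorithm: the simulated abundance matrix, the metabolite names, the Spearman correlation, the average-linkage clustering and the choice of two modules are all assumptions made for illustration.

```r
# Illustrative only: simulated data, hypothetical metabolite names.
set.seed(1)
n_samples <- 40

# Two latent factors each drive a group of three metabolites.
f1 <- rnorm(n_samples)
f2 <- rnorm(n_samples)
abund <- cbind(
  dipep_A   = f1 + rnorm(n_samples, sd = 0.2),
  dipep_B   = f1 + rnorm(n_samples, sd = 0.2),
  unknown_X = f1 + rnorm(n_samples, sd = 0.2),  # unannotated compound
  lipid_A   = f2 + rnorm(n_samples, sd = 0.2),
  lipid_B   = f2 + rnorm(n_samples, sd = 0.2),
  lipid_C   = f2 + rnorm(n_samples, sd = 0.2)
)

# Pairwise correlation between metabolites (columns).
cor_mat <- cor(abund, method = "spearman")

# Turn correlation into a distance and cluster hierarchically.
dist_mat <- as.dist(1 - cor_mat)
tree <- hclust(dist_mat, method = "average")

# Cut the tree into two modules (k is an assumption for this toy data).
modules <- cutree(tree, k = 2)

# List the members of each module.
print(split(names(modules), modules))
```

In this toy setting the unannotated compound lands in the same module as the two dipeptide-like metabolites, which mirrors the reasoning used for "X-14320" above.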

The ChemRICH package provides a function to compute correlation modules among metabolites.

Instructions:

Option 1) Using the Google Colab Notebook.

For regression analysis results, you can use this notebook to obtain the correlation modules as well as the set-level statistics.


Option 2) Running on a local computer

An example dataset for correlation module detection is available here.

Create a new folder named "Correlation module detection" (or any other name you prefer) on your computer and set it as the working directory in RStudio. Then download the example dataset and copy it into this working directory. After that, run the following line of code.
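If you prefer to script the folder setup instead of using the RStudio menus, a minimal sketch in base R could look like the following. The folder location (under the current working directory) is an assumption; the file name is the one used in the call below.

```r
# Hypothetical location: a subfolder of the current working directory.
work_dir <- file.path(getwd(), "Correlation module detection")

# Create the folder if it does not exist yet, then switch to it.
dir.create(work_dir, showWarnings = FALSE)
setwd(work_dir)

# Check that the example dataset has been copied into the folder.
input_file <- "chemrich_input_correlation_modules.xlsx"
if (!file.exists(input_file)) {
  message("Copy ", input_file, " into ", getwd(), " before running ChemRICH.")
}
```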



chemrich_predict_correlation_modules(inputfile = "chemrich_input_correlation_modules.xlsx")

The output file is "correlation_modules_prediction.xlsx". Add the p-value and effect size values to this file; the updated file can then be used as input for the basic ChemRICH function. The output should look like the figure below. This is a powerful way to prioritize unknown metabolites in a biologically meaningful context.
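Adding the statistics can be done by hand in a spreadsheet, or scripted. The sketch below shows one possible way to do it with a base R merge on toy data frames. All column names (compound_name, set, pvalue, effect_size) and all values are assumptions for illustration and are not confirmed to match the actual ChemRICH output; check the real file before adapting it.

```r
# Toy stand-in for the contents of "correlation_modules_prediction.xlsx".
# In practice the file could be read with a package such as openxlsx
# (read.xlsx) and written back with write.xlsx.
modules <- data.frame(
  compound_name = c("dipep_A", "dipep_B", "unknown_X"),
  set = c("module_1", "module_1", "module_1"),
  stringsAsFactors = FALSE
)

# Toy per-compound statistics, e.g. from a regression analysis.
stats <- data.frame(
  compound_name = c("unknown_X", "dipep_A", "dipep_B"),
  pvalue = c(0.001, 0.004, 0.020),
  effect_size = c(1.8, 1.5, 1.1),
  stringsAsFactors = FALSE
)

# Attach the statistics to the module table, keeping every compound.
updated <- merge(modules, stats, by = "compound_name", all.x = TRUE)

# Flag compounds that did not receive statistics.
missing_stats <- updated$compound_name[is.na(updated$pvalue)]
if (length(missing_stats) > 0) {
  warning("No statistics for: ", paste(missing_stats, collapse = ", "))
}

print(updated)
```

Keeping every row of the module table (all.x = TRUE) makes missing statistics visible instead of silently dropping compounds.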