Mergeomics: Multidimensional data integration to identify pathogenic perturbations to biological systems

Le Shu, Yuqi Zhao, Zeyneb Kurt, Sean Geoffrey Byars, Taru Tukiainen, Johannes Kettunen, Luz D. Orozco, Matteo Pellegrini, Aldons J. Lusis, Samuli Ripatti, Bin Zhang, Michael Inouye, Ville-Petteri Makinen, Xia Yang

Research output: Contribution to journal (Article)

11 Citations (Scopus)

Abstract

Background: Complex diseases are characterized by multiple subtle perturbations to biological processes. New omics platforms can detect these perturbations, but translating the diverse molecular and statistical information into testable mechanistic hypotheses is challenging. Therefore, we set out to create a public tool that integrates these data across multiple datasets, platforms, study designs and species in order to detect the most promising targets for further mechanistic studies.

Results: We developed Mergeomics, a computational pipeline consisting of independent modules that 1) leverage multi-omics association data to identify biological processes that are perturbed in disease, and 2) overlay the disease-associated processes onto molecular interaction networks to pinpoint hubs as potential key regulators. Unlike existing tools, which are mostly dedicated to specific data types or settings, the Mergeomics pipeline accepts and integrates datasets across platforms, data types and species. We optimized and evaluated the performance of Mergeomics using simulations and multiple independent datasets, and benchmarked the results against alternative methods. We also demonstrate the versatility of Mergeomics in two case studies that include genome-wide, epigenome-wide and transcriptome-wide datasets from human and mouse studies of total cholesterol and fasting glucose. In both cases, the Mergeomics pipeline provided statistical and contextual evidence to prioritize further investigations in the wet lab. The software implementation of Mergeomics is freely available as a Bioconductor R package.

Conclusion: Mergeomics is a flexible and robust computational pipeline for multidimensional data integration. It outperforms existing tools and is easily applicable to datasets from different studies, species and omics data types for the study of complex traits.

Language: English
Article number: 874
Journal: BMC Genomics
Volume: 17
Issue number: 1
DOI: https://doi.org/10.1186/s12864-016-3198-9
Publication status: Published - 4 Nov 2016

Keywords

  • Blood glucose
  • Cholesterol
  • Functional genomics
  • Gene networks
  • Integrative genomics
  • Key drivers
  • Mergeomics
  • Multidimensional data integration

ASJC Scopus subject areas

  • Biotechnology
  • Genetics

Cite this

Shu, L., Zhao, Y., Kurt, Z., Byars, S. G., Tukiainen, T., Kettunen, J., ... Yang, X. (2016). Mergeomics: Multidimensional data integration to identify pathogenic perturbations to biological systems. BMC Genomics, 17(1), [874]. https://doi.org/10.1186/s12864-016-3198-9
Shu, Le; Zhao, Yuqi; Kurt, Zeyneb; Byars, Sean Geoffrey; Tukiainen, Taru; Kettunen, Johannes; Orozco, Luz D.; Pellegrini, Matteo; Lusis, Aldons J.; Ripatti, Samuli; Zhang, Bin; Inouye, Michael; Makinen, Ville-Petteri; Yang, Xia. / Mergeomics: Multidimensional data integration to identify pathogenic perturbations to biological systems. In: BMC Genomics. 2016; Vol. 17, No. 1.
@article{dc8e03fe41504d5aa72b2084a3ac4260,
title = "Mergeomics: Multidimensional data integration to identify pathogenic perturbations to biological systems",
abstract = "Background: Complex diseases are characterized by multiple subtle perturbations to biological processes. New omics platforms can detect these perturbations, but translating the diverse molecular and statistical information into testable mechanistic hypotheses is challenging. Therefore, we set out to create a public tool that integrates these data across multiple datasets, platforms, study designs and species in order to detect the most promising targets for further mechanistic studies. Results: We developed Mergeomics, a computational pipeline consisting of independent modules that 1) leverage multi-omics association data to identify biological processes that are perturbed in disease, and 2) overlay the disease-associated processes onto molecular interaction networks to pinpoint hubs as potential key regulators. Unlike existing tools that are mostly dedicated to specific data type or settings, the Mergeomics pipeline accepts and integrates datasets across platforms, data types and species. We optimized and evaluated the performance of Mergeomics using simulation and multiple independent datasets, and benchmarked the results against alternative methods. We also demonstrate the versatility of Mergeomics in two case studies that include genome-wide, epigenome-wide and transcriptome-wide datasets from human and mouse studies of total cholesterol and fasting glucose. In both cases, the Mergeomics pipeline provided statistical and contextual evidence to prioritize further investigations in the wet lab. The software implementation of Mergeomics is freely available as a Bioconductor R package. Conclusion: Mergeomics is a flexible and robust computational pipeline for multidimensional data integration. It outperforms existing tools, and is easily applicable to datasets from different studies, species and omics data types for the study of complex traits.",
keywords = "Blood glucose, Cholesterol, Functional genomics, Gene networks, Integrative genomics, Key drivers, Mergeomics, Multidimensional data integration",
author = "Le Shu and Yuqi Zhao and Zeyneb Kurt and Byars, {Sean Geoffrey} and Taru Tukiainen and Johannes Kettunen and Orozco, {Luz D.} and Matteo Pellegrini and Lusis, {Aldons J.} and Samuli Ripatti and Bin Zhang and Michael Inouye and Ville-Petteri Makinen and Xia Yang",
year = "2016",
month = "11",
day = "4",
doi = "10.1186/s12864-016-3198-9",
language = "English",
volume = "17",
journal = "BMC Genomics",
issn = "1471-2164",
publisher = "BioMed Central",
number = "1",

}

Shu, L, Zhao, Y, Kurt, Z, Byars, SG, Tukiainen, T, Kettunen, J, Orozco, LD, Pellegrini, M, Lusis, AJ, Ripatti, S, Zhang, B, Inouye, M, Makinen, V-P & Yang, X 2016, 'Mergeomics: Multidimensional data integration to identify pathogenic perturbations to biological systems', BMC Genomics, vol. 17, no. 1, 874. https://doi.org/10.1186/s12864-016-3198-9

Mergeomics: Multidimensional data integration to identify pathogenic perturbations to biological systems. / Shu, Le; Zhao, Yuqi; Kurt, Zeyneb; Byars, Sean Geoffrey; Tukiainen, Taru; Kettunen, Johannes; Orozco, Luz D.; Pellegrini, Matteo; Lusis, Aldons J.; Ripatti, Samuli; Zhang, Bin; Inouye, Michael; Makinen, Ville-Petteri; Yang, Xia.

In: BMC Genomics, Vol. 17, No. 1, 874, 04.11.2016.

TY - JOUR

T1 - Mergeomics: Multidimensional data integration to identify pathogenic perturbations to biological systems

T2 - BMC Genomics

AU - Shu, Le

AU - Zhao, Yuqi

AU - Kurt, Zeyneb

AU - Byars, Sean Geoffrey

AU - Tukiainen, Taru

AU - Kettunen, Johannes

AU - Orozco, Luz D.

AU - Pellegrini, Matteo

AU - Lusis, Aldons J.

AU - Ripatti, Samuli

AU - Zhang, Bin

AU - Inouye, Michael

AU - Makinen, Ville-Petteri

AU - Yang, Xia

PY - 2016/11/4

Y1 - 2016/11/4

AB - Background: Complex diseases are characterized by multiple subtle perturbations to biological processes. New omics platforms can detect these perturbations, but translating the diverse molecular and statistical information into testable mechanistic hypotheses is challenging. Therefore, we set out to create a public tool that integrates these data across multiple datasets, platforms, study designs and species in order to detect the most promising targets for further mechanistic studies. Results: We developed Mergeomics, a computational pipeline consisting of independent modules that 1) leverage multi-omics association data to identify biological processes that are perturbed in disease, and 2) overlay the disease-associated processes onto molecular interaction networks to pinpoint hubs as potential key regulators. Unlike existing tools that are mostly dedicated to specific data type or settings, the Mergeomics pipeline accepts and integrates datasets across platforms, data types and species. We optimized and evaluated the performance of Mergeomics using simulation and multiple independent datasets, and benchmarked the results against alternative methods. We also demonstrate the versatility of Mergeomics in two case studies that include genome-wide, epigenome-wide and transcriptome-wide datasets from human and mouse studies of total cholesterol and fasting glucose. In both cases, the Mergeomics pipeline provided statistical and contextual evidence to prioritize further investigations in the wet lab. The software implementation of Mergeomics is freely available as a Bioconductor R package. Conclusion: Mergeomics is a flexible and robust computational pipeline for multidimensional data integration. It outperforms existing tools, and is easily applicable to datasets from different studies, species and omics data types for the study of complex traits.

KW - Blood glucose

KW - Cholesterol

KW - Functional genomics

KW - Gene networks

KW - Integrative genomics

KW - Key drivers

KW - Mergeomics

KW - Multidimensional data integration

UR - http://www.scopus.com/inward/record.url?scp=84994480566&partnerID=8YFLogxK

U2 - 10.1186/s12864-016-3198-9

DO - 10.1186/s12864-016-3198-9

M3 - Article

VL - 17

JO - BMC Genomics

JF - BMC Genomics

SN - 1471-2164

IS - 1

M1 - 874

ER -