Implementing effective machine learning-based workflows for the analysis of mass spectrometry data

DOI: 10.5584/jiomics.v6i1.196

Authors

  • Hugo López Fernández (Escuela Superior de Ingeniería Informática, University of Vigo, Edificio Politécnico, Campus Universitario As Lagoas s/n, 32004 Ourense, Spain)
  • Miguel Reboiro-Jato (Escuela Superior de Ingeniería Informática, University of Vigo, Edificio Politécnico, Campus Universitario As Lagoas s/n, 32004 Ourense, Spain)
  • José A. Pérez Rodríguez (CFR: Centro de Formación e Recursos de Ourense, Rúa Universidade s/n, 32005 Ourense, Spain)
  • Florentino Fdez-Riverola (Escuela Superior de Ingeniería Informática, University of Vigo, Edificio Politécnico, Campus Universitario As Lagoas s/n, 32004 Ourense, Spain)
  • Daniel Glez-Peña (Escuela Superior de Ingeniería Informática, University of Vigo, Edificio Politécnico, Campus Universitario As Lagoas s/n, 32004 Ourense, Spain)

Abstract

Mass spectrometry using matrix-assisted laser desorption/ionization coupled to time-of-flight analyzers (MALDI-TOF MS) has become very popular during the last decade due to its speed, sensitivity and robustness in accurately detecting proteins and peptides. These properties make it possible to analyze large sets of samples quickly in a single batch and to perform high-throughput proteomics. In this scenario, bioinformatics methods and computational tools play a key role in MALDI-TOF MS data analysis, as they can correctly handle the large amounts of raw data generated, with the goal of discovering new knowledge and drawing useful conclusions.

A typical MALDI-TOF MS data analysis workflow consists of three main stages: data acquisition, preprocessing and analysis. Although the most popular use of this technology is to identify proteins through their peptides, analyses that make use of artificial intelligence (AI), machine learning (ML) and statistical methods are of particular interest for biomarker discovery, automatic diagnosis and knowledge discovery.
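The three-stage workflow can be pictured as a chain of functions, each consuming the output of the previous stage. The following is a minimal, illustrative sketch under stated assumptions, not the platform presented in this work: the plain-text "m/z intensity" input format, total ion current (TIC) normalization, local-maximum peak picking, the `min_fraction_of_tic` threshold and the reference-matching analysis step are all hypothetical choices made for brevity. Real pipelines typically add steps such as smoothing, baseline correction and peak alignment, and the final stage is where the AI, ML and statistical methods discussed here would be applied; the simple matching step below only stands in for them.

```python
from typing import List, Tuple

# Hypothetical representation: parallel lists of m/z values and intensities.
Spectrum = Tuple[List[float], List[float]]


def acquire(raw_lines: List[str]) -> Spectrum:
    """Stage 1 (acquisition, simplified): parse assumed 'mz intensity' text lines."""
    mz: List[float] = []
    intensity: List[float] = []
    for line in raw_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        m, i = line.split()
        mz.append(float(m))
        intensity.append(float(i))
    return mz, intensity


def preprocess(spectrum: Spectrum, min_fraction_of_tic: float = 0.1) -> List[float]:
    """Stage 2 (preprocessing, simplified): TIC normalization, then peak picking.

    Returns the m/z values of local maxima whose normalized intensity is at
    least `min_fraction_of_tic` (an assumed threshold) of the total ion current.
    """
    mz, intensity = spectrum
    tic = sum(intensity)
    if tic <= 0:
        return []  # empty or all-zero spectrum: no peaks to report
    norm = [i / tic for i in intensity]
    peaks: List[float] = []
    for k in range(1, len(norm) - 1):
        is_local_max = norm[k] > norm[k - 1] and norm[k] >= norm[k + 1]
        if is_local_max and norm[k] >= min_fraction_of_tic:
            peaks.append(mz[k])
    return peaks


def analyze(peaks: List[float], reference: List[float], tolerance: float = 0.5) -> List[float]:
    """Stage 3 (analysis, placeholder): peaks matching a reference list within +/- tolerance."""
    return [p for p in peaks if any(abs(p - r) <= tolerance for r in reference)]


if __name__ == "__main__":
    raw = ["# mz intensity", "1000.0 1", "1000.5 8", "1001.0 2",
           "1001.5 1", "1002.0 6", "1002.5 2"]
    peaks = preprocess(acquire(raw))
    print(peaks)                                        # [1000.5, 1002.0]
    print(analyze(peaks, reference=[1000.6, 1500.0]))   # [1000.5]
```

Keeping each stage as a separate function means a single stage (for example, the peak picker) can be swapped without touching the others, which mirrors how the workflow is usually described in stages.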

In this introductory work, the potential of these techniques is explored and novel solutions based on the application of AI, ML and statistical methods are reviewed. In addition, an integrated software platform that supports the full MALDI-TOF MS data analysis workflow is presented, with the goal of facilitating the work of proteomics researchers without advanced bioinformatics skills.

Published

2021-03-01