RiaBIO

Welcome to RiaBIO

The Ibero-American Network of Artificial Intelligence for Big Biodata (RiaBio) is a network funded by CYTED (Ibero-American Programme of Science and Technology for Development). Its central aim is to create a consortium of Ibero-American research groups and entities (public and private) that collaborate and work together for four years to respond rapidly to the emerging challenges and opportunities arising from the growing fusion of innovations in the fields of Artificial Intelligence (AI) and data science in biology (BioData Science).

213
Researchers and professors
10
Latin American countries

Jornadas IBEROAMERICANAS RIABIO 2022

Saturday, November 5th
8:45-9:00
RIABIO Session Welcome
Room: UNAM
Format: Live from venue

  • Javier De Las Rivas and Elizabeth Tapia



9:00-9:15
Implementation and evaluation of a predictive genome association model for rare diseases based on configuration of DNA repeats and structural variants.
Room: UNAM
Format: Live from venue

Moderator(s): Javier De Las Rivas, Elizabeth Tapia

  • Fabian Tobar-Tosse, Pontificia Universidad Javeriana Cali, Colombia
  • Elizabeth Londoño, Pontificia Universidad Javeriana Cali, Colombia
  • Andres Zuñiga, Pontificia Universidad Javeriana Cali, Colombia
  • Jose Guillermo Ortega, Pontificia Universidad Javeriana Cali, Colombia
  • Valentina Corchuelo, Pontificia Universidad Javeriana Cali, Colombia
  • Patricia E. Velez, Universidad del Cauca, Colombia
  • Pedro A. Moreno, Universidad del Valle, Colombia


A central element of precision medicine is the association of genomic structure and variation with the environment and phenotype. Ongoing studies mainly explore deleterious variation in gene-coding sequences and the possible association of epigenetic factors with certain traits. However, large projects on human genetic variation have shown a significant need to define logical and quantifiable patterns in the organization of genetic information in the context of its constituent elements, which go beyond polymorphisms; that is, elements such as structural variations (SV), repeats, and binding sites for ncRNA or regulatory proteins, among others, must be considered and integrated with the phenotypic traits of interest. Most of the genomic associations described to date correspond to genes or depend on events that are highly heterogeneous between human populations. Gene-based genotype-phenotype associations have been essential for recognizing direct markers for the diagnosis, prognosis, and treatment of different diseases, mainly cancer, but they have also shown great limitations in the study of complex or even rare diseases, where the exploration of non-coding regions is necessary. Therefore, integrating the largest possible number of genomic elements into clear structural patterns with significant functional meaning is a major challenge in human genomics and precision medicine. Consequently, our group has worked over the last decade on the exploration of key genomic elements for understanding non-coding sequences, genome organization, and the association of gene variation, and for recognizing structural patterns in the susceptibility to different diseases. Our work has focused on describing the human genome as a highly entropic and configurable system with measurable emergent properties.
Specifically, the organization and configuration of DNA repeats has been one of our main contributions to the understanding and application of the human genome: we have presented evidence that configurations between transposable elements and other repeats are significant for cell function and for genomic susceptibility to some neurodegenerative diseases and even cancer. Accordingly, this talk presents our project: the implementation and evaluation of a predictive model of genomic association based on configurations of DNA repeats and structural variants for the study of rare diseases, which is currently part of the Colombian Government's research initiatives on precision medicine.

9:15-9:30
Identifying potential driver genes by multi-omics approaches: a deep insight into the complex heterogeneity of cancer diseases
Room: UNAM
Format: Live from venue

Moderator(s): Javier De Las Rivas, Elizabeth Tapia

  • Katia Avina-Padilla, University of Illinois, United States
  • Carla Angulo-Rojo, CIASAP UAS, Mexico
  • Octavio Zambada Moreno, Cinvestav IPN Unidad Irapuato - Irapuato Leon, Mexico
  • Jose Antonio Ramirez-Rafael, Cinvestav IPN Unidad Irapuato - Irapuato Leon, Mexico
  • Maribel Hernandez-Rosales, Cinvestav IPN Unidad Irapuato - Irapuato Leon, Mexico


Cancer is a complex disease that relies on progressive uncontrolled cell division linked with multiple dysfunctional biological processes. Oncology practice has incorporated genes involved in key molecular events that drive tumorigenesis as biomarkers to guide diagnosis and design patient therapy. However, tumor heterogeneity remains the most challenging feature in diagnosing and treating cancer. In this context, we focus on studying the significant heterogeneity of aggressive tumors at the genomic, transcriptional, and interactome levels, emphasizing the potential clinical relevance of driver genes. For this purpose, we have used integrated data from multiple databases and developed bioinformatics strategies and network approaches to contribute to understanding the biological and molecular processes underlying cancer initiation and progression. Our studies have analyzed prevalent cancers, such as Breast Invasive Carcinoma (BRCA), Colon Adenocarcinoma (COAD), Lung Adenocarcinoma (LUAD), and Prostate Adenocarcinoma (PRAD), along with the four most aggressive cancers with high intrinsic heterogeneity, namely Bladder Urothelial Carcinoma (BLCA), Esophageal Carcinoma (ESCA), Glioblastoma Multiforme (GBM), and Kidney Renal Clear Cell Carcinoma (KIRC). As a result, we have identified transcriptional profile patterns in GBM that fit the stem cell model of ontogenesis. A unique distribution of somatic mutations was found for young and adult populations, particularly for DNA repair and chromatin remodeling genes. Our results also revealed that highly lesioned genes undergo differential regulation with biological pathways for young patients. Moreover, we detected a combination of four biomarkers with potential relevance for determining the GBM molecular subtype. We have also highlighted the potential regulatory role of differentially expressed (DE) human intronless genes (IGs) across cancer types, as well as their implication in specific PPI networks for GBM, ESCA, and LUAD tumors.
The aim is to identify their unique expression profiles and interactome, which may act as functional signatures across eight different cancers. We identified 940 protein-coding IGs in the human genome, of which about 35% were differentially expressed across the analyzed cancer datasets. Specifically, 78% of DE-IGs underwent transcriptional reprogramming with elevated expression in tumor cells. Remarkably, in all the studied tumors, we observed a highly conserved induction of a group of deacetylase-histones located in a region of chromosome 6 enriched in chromatin condensation processes. IGs are essential in the tumor phenotype at the transcriptional and post-transcriptional levels, notably in important mechanisms such as interactome rewiring in BRCA. Our multi-omic approaches could help delineate future strategies for using predictive molecular markers for clinical decision-making in routine medical practice.

9:30-9:45
Identification of cancer-drug-gene resistance network modules by analysis of genome-wide expression and drug activity correlations in cancer cells
Room: UNAM
Format: Live-stream

Moderator(s): Javier De Las Rivas, Elizabeth Tapia

  • Monica M Arroyo, Department of Chemistry, Pontifical Catholic University of Puerto Rico (PUCPR), Puerto Rico, Puerto Rico
  • Alberto Berral-Gonzalez, Cancer Research Center (IBMCC, CSIC/USAL & IBSAL), CSIC and University of Salamanca, Salamanca, Spain
  • Diego Alonso-Lopez, Cancer Research Center (IBMCC, CSIC/USAL), CSIC and University of Salamanca, Salamanca, Spain
  • Jose M Sanchez-Santos, Department of Statistics, University of Salamanca (USAL) and Cancer Research Center (IBMCC, CSIC/USAL), Spain
  • Javier De Las Rivas, Cancer Research Center (IBMCC, CSIC/USAL & IBSAL), CSIC and University of Salamanca, Salamanca, Spain


Drug resistance is a major hurdle in the treatment of cancer patients. Several molecular mechanisms have been identified that contribute to drug resistance and disease relapse, threatening cancer therapies, patient recovery and survival. Multiple drug resistance limits treatment options and patient outcomes. This leads to chemotherapeutic options that may have more serious side effects, less effective therapies, or no treatment alternatives at all. We examined publicly available databases (which provide omics information on anticancer drug activity) and developed an updated version of a web-based cancer drug resistance resource, which provides bipartite drug-protein networks and allows the identification of clusters or modules of resistance (with the identification of putative protein-coding genes that may be involved in the resistance process). In this framework, we calculated Pearson and Spearman correlations between 733 cancer genes and 24,360 drugs. Upon filtering for significant negative correlations, which indicate resistance, and for FDA-approved drugs, we identified 1552 resistant pairs between 137 drugs and 374 genes. Heatmaps were generated to establish resistance clusters by cell lines of different tissues, as well as gene-drug bipartite resistance networks. The results recovered known resistance gene-drug pairs. We also found new plausible resistance pairs between genes and FDA-approved cancer drugs, as well as genes that may be involved in multi-drug resistance (MDR).
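
The correlation step described in this abstract can be illustrated with a minimal sketch. It computes Pearson and Spearman coefficients for a single gene-drug pair and flags the pair when both are strongly negative. The expression and activity values are toy numbers, the `THRESHOLD` cutoff is an assumption standing in for the significance filter mentioned above, and none of the names come from the authors' resource.

```python
from math import sqrt
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy values for one hypothetical gene-drug pair across six cell lines.
expression = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
drug_activity = [9.0, 7.5, 7.0, 4.0, 3.5, 1.0]

r = pearson(expression, drug_activity)
rho = spearman(expression, drug_activity)

THRESHOLD = -0.8  # assumed cutoff, not a value taken from the abstract
is_candidate = r < THRESHOLD and rho < THRESHOLD
```

In a genome-wide setting the same two functions would be applied to every gene-drug pair, and the cutoff would normally be replaced by a p-value with multiple-testing correction.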

9:45-10:00
Application of bioinformatic methods for the deconvolution of cell mixtures to blood and immune cell-types and comparison of generated gene signatures
Room: UNAM
Format: Live-stream

Moderator(s): Javier De Las Rivas, Elizabeth Tapia

  • Natalia Alonso-Moreda, Cancer Research Center (IBMCC, CSIC/USAL & IBSAL), CSIC and University of Salamanca, Salamanca, Spain
  • Alberto Berral-Gonzalez, Cancer Research Center (IBMCC, CSIC/USAL & IBSAL), CSIC and University of Salamanca, Salamanca, Spain
  • Enrique De La Rosa, Cancer Research Center (IBMCC, CSIC/USAL & IBSAL), CSIC and University of Salamanca, Salamanca, Spain
  • Jose M Sanchez-Santos, Department of Statistics, University of Salamanca (USAL) and Cancer Research Center (IBMCC, CSIC/USAL), Spain
  • Javier De Las Rivas, Cancer Research Center (IBMCC, CSIC/USAL & IBSAL), CSIC and University of Salamanca, Salamanca, Spain


Recent omic studies of the variability between different cell populations provide a deep identification of the activity of cell-specific genes and determine how the changes produced in complex cell mixtures are driven by different genes in each specific cell-type. Despite this accurate identification of genome-wide changes provided by omic technologies, studies on cell mixtures or on bulk samples with different cell-types only provide average global signatures. In the last decade, computational techniques have been developed to solve this problem by applying Cell Deconvolution methods, which are designed to decompose a cell mixture (consisting of various cell-types working together in a tissue, organ or biological system) into its component cells and calculate the proportion corresponding to each cell-type. Some of these methods only calculate the proportions of cell-types in the mixture (supervised methods), while other deconvolution algorithms can also identify gene expression signatures specific for each cell-type (unsupervised methods). In this work, five deconvolution methods (DECONICA, LINSEED, CIBERSORT, FARDEEP and ABIS) were implemented and compared with the aim of evaluating their accuracy and determining the best performance in the identification of different blood and immune cell-types. To assess these methods, we used several bulk expression datasets from peripheral blood samples obtained using both high-density microarrays and RNA sequencing (RNA-seq). The analyses showed that FARDEEP, CIBERSORT and LINSEED provided a more accurate estimate of the abundance of different cell populations in the mixture. Our comparative analysis also showed that the most efficient algorithm to identify gene signatures is LINSEED.
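
The core idea of supervised deconvolution, as described above, is that a bulk profile is approximately a signature matrix multiplied by a vector of cell-type proportions. The sketch below recovers the proportions with a plain non-negative least-squares fit by projected gradient descent. It is a generic illustration and not the algorithm of any of the five methods named in the abstract; the signature values and the function name `estimate_proportions` are made up for the example.

```python
from typing import List

Matrix = List[List[float]]


def estimate_proportions(signatures: Matrix, mixture: List[float], steps: int = 500) -> List[float]:
    """Minimise ||S p - m||^2 subject to p >= 0, then rescale p to sum to 1."""
    n_genes, n_types = len(signatures), len(signatures[0])
    # The squared Frobenius norm of S bounds the largest eigenvalue of S^T S,
    # so this fixed step size is small enough for gradient descent to converge.
    step = 1.0 / (2.0 * sum(v * v for row in signatures for v in row))
    p = [1.0 / n_types] * n_types
    for _ in range(steps):
        residual = [
            sum(signatures[g][t] * p[t] for t in range(n_types)) - mixture[g]
            for g in range(n_genes)
        ]
        gradient = [
            2.0 * sum(signatures[g][t] * residual[g] for g in range(n_genes))
            for t in range(n_types)
        ]
        # Gradient step followed by projection onto the non-negative orthant.
        p = [max(0.0, p[t] - step * gradient[t]) for t in range(n_types)]
    total = sum(p)
    if total == 0.0:
        raise ValueError("all estimated proportions are zero")
    return [v / total for v in p]


# Toy signature matrix: 4 genes (rows) x 2 cell types (columns).
SIGNATURES = [[10.0, 1.0], [2.0, 8.0], [5.0, 5.0], [1.0, 9.0]]
TRUE_PROPORTIONS = [0.3, 0.7]

# Simulated noise-free bulk sample built from the known proportions.
bulk = [sum(s * p for s, p in zip(row, TRUE_PROPORTIONS)) for row in SIGNATURES]
estimated = estimate_proportions(SIGNATURES, bulk)
```

With real data the bulk profile is noisy and the signature matrix is itself estimated, which is where the compared methods differ from this sketch.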

10:00-10:15
Assignment of gene cell markers to hematological and immune cell-types based on single-cell proteo-transcriptomic data using machine learning approaches
Room: UNAM
Format: Live-stream

Moderator(s): Javier De Las Rivas, Elizabeth Tapia

  • Enrique De La Rosa, Cancer Research Center (IBMCC, CSIC/USAL & IBSAL), CSIC and University of Salamanca, Salamanca, Spain
  • Elena Sanchez-Luis, Cancer Research Center (IBMCC, CSIC/USAL & IBSAL), CSIC and University of Salamanca, Salamanca, Spain
  • Natalia Alonso-Moreda, Cancer Research Center (IBMCC, CSIC/USAL & IBSAL), CSIC and University of Salamanca, Salamanca, Spain
  • Jose M Sanchez-Santos, Department of Statistics, University of Salamanca (USAL) and Cancer Research Center (IBMCC, CSIC/USAL), Spain
  • Javier De Las Rivas, Cancer Research Center (IBMCC, CSIC/USAL & IBSAL), CSIC and University of Salamanca, Salamanca, Spain


The human hematopoietic system is composed of highly specialized cells with unique and essential functions. Due to the relevance of its activity, we have studied the cell types and subtypes that form this complex system using the single-cell RNA-seq (scRNA-seq) technique to unravel its different and heterogeneous cell populations. Specifically, we looked for gene expression signatures that would allow us to identify specific cell types robustly and at different levels (i.e., markers for all cells, markers for specific lineages, for cell types or cell subtypes). Different immune cells are often used for different studies, so their specific isolation is crucial. In our search, we looked first for CD marker-based gene signatures, then for genes encoding membrane proteins, and lastly for any other protein-encoding gene. In the methodological part of this work, we first tried to determine the best computational workflow to analyse large scRNA-seq datasets. Once an analytical workflow was set up, we used it to determine different gene signatures of bone marrow (BM) and peripheral blood (PB) cells based on single-cell resolution power. We analysed three different scRNA-seq datasets, and we detected a reference list of 369 human CDs in the genes provided (19,813, 33,660 and 36,601 transcripts, respectively). These 369 CD genes were used as a guide to perform all the comparative bioinformatic analyses. We also screened for genes encoding membrane proteins (located on the surface of cells) as potential cell-specific markers. With this, in addition to identifying the best CDs that mark the main cell types (i.e., monocytes, B lymphocytes, natural killer cells, etc.), we could check the ability of new gene signatures to distinguish each cell type. To do so, we applied Random Forest (a machine learning method) to test the accuracy of different gene signatures in identifying different haematological and immune cell types.

10:30-10:45
Automatic identification of GO-Terms related to ripening fruit of tomato based on machine learning and its application in a breeding program
Room: UNAM
Format: Live from venue

  • Paolo Caccharelli, IICAR, Argentina
  • Flavio Spetale, CIFASIS-CONICET, Argentina
  • Guillermo Pratta, IICAR-CONICET, Argentina
  • Elizabeth Tapia, CIFASIS-CONICET, Argentina


Background: Tomato quality is one of the most important factors in ensuring consistent marketing of tomato fruit. The genome sequencing of tomatoes has provided powerful insights into the molecular changes in fruit ripening, a complex developmental process that is highly coordinated and includes changes in color, texture, taste and flavor. The automatic identification of biological functionalities of genes during the ripening process is important because it detects a list of GO-Terms and genes related to maturation. Given that gene expression revealed by massive RNA-seq is a metric measurement, estimating gene action by Quantitative Genetics analysis provides a direct way to identify promising parents and hybrids, taking the additive and non-additive genetic effects on RNA levels as selection criteria in breeding programs.
Results: Global gene expression was analyzed by massive RNA sequencing (RNA-seq) in parent genotypes and their hybrid. The goal was to identify GO-Terms of biological processes related to fruit ripening in which differentially expressed genes are involved and to analyze their gene action. Transcriptional profiles were obtained at three main ripening stages (Breaker, Mature Green and Red Ripe), and an enrichment analysis was then carried out with the AgriGO tool to characterize and identify genes involved in these developmental stages. This enrichment analysis also allows exploring the biological processes related to fruit ripening in which differentially expressed transcripts participate. Comparison within the biological process domain across the three genotypes generated a list of four consensus GO-Terms (GO:0044699, GO:0008152, GO:0044710 and GO:0005975). In addition, 20 genes were randomly selected from the total of 2,744 genes to estimate the additive and non-additive genetic effects. Of these, 17 genes showed negative overdominance relative to the parent with the lowest value, and the others showed partial dominance towards lower values.
Conclusions: This first approach identifies GO-Terms related to a developmental stage of tomato fruit and detects potential genes of interest for continuing the breeding program and obtaining new tomato varieties.
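
The gene-action categories reported in the Results (negative overdominance, partial dominance) can be illustrated with the textbook mid-parent calculation. The abstract does not state which estimators were used, so the sketch below is an assumption: it takes the additive effect as half the difference between the parents, the dominance deviation as the hybrid's distance from the mid-parent value, and classifies by their ratio. The function name and expression values are hypothetical.

```python
def classify_gene_action(parent_a: float, parent_b: float, hybrid: float, tol: float = 1e-9) -> str:
    """Classify gene action from the degree of dominance d/a."""
    low, high = sorted((parent_a, parent_b))
    additive = (high - low) / 2.0          # a: half the difference between parents
    if additive <= tol:
        raise ValueError("parents do not differ, so d/a is undefined")
    mid_parent = (high + low) / 2.0
    dominance = hybrid - mid_parent        # d: deviation of the hybrid from mid-parent
    ratio = dominance / additive

    if abs(ratio) <= tol:
        return "additive"
    direction = "lower" if ratio < 0 else "higher"
    magnitude = abs(ratio)
    if abs(magnitude - 1.0) <= tol:
        return f"complete dominance toward the {direction} parent"
    if magnitude < 1.0:
        return f"partial dominance toward the {direction} parent"
    # |d/a| > 1: the hybrid falls outside the range spanned by the parents.
    return "negative overdominance" if ratio < 0 else "positive overdominance"


# Toy expression levels (arbitrary units) for two parents and their hybrid.
below_both_parents = classify_gene_action(10.0, 20.0, 5.0)
between_parents = classify_gene_action(10.0, 20.0, 12.0)
```

Under this convention, a hybrid expressed below the lower parent is labelled negative overdominance, and one lying between the lower parent and the mid-parent value is labelled partial dominance toward the lower parent.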

10:45-11:00
Genome-wide bisulfite sequencing analysis of cultivated and wild rice species reveals epigenome variation in response to aluminum stress
Room: UNAM
Format: Live from venue

  • Jenny Johana Gallo-Franco, Pontificia Universidad Javeriana-Cali, Colombia
  • Thaura Ghneim-Herrera, Universidad Icesi, Colombia
  • Fabian Tobar-Tosse, Pontificia Universidad Javeriana-Cali, Colombia
  • Mauricio Quimbaya, Pontificia Universidad Javeriana-Cali, Colombia


DNA methylation has been defined as the most studied epigenetic modification involved in several biological processes, such as plant genome stability, developmental regulation, and environmental responses. However, little is known about the potential role of DNA methylation in response to aluminum (Al) stress in rice. To determine the dynamics of DNA methylation variation associated with Al exposure in rice plants, we analyzed single-base resolution methylome maps for two genotypes of Oryza sativa, a cultivated species, with contrasting response to Al-stress conditions (Azucena-Tolerant and BGI9311-Susceptible). We also analyzed the methylome of two genotypes of O. glumaepatula, a wild species, with contrasting response to Al-exposure (Og131-Tolerant and Og131-Susceptible). Our results showed that, under control conditions, genome-wide methylation profiles are mainly conserved between both species. Nevertheless, there are several differentially methylated regions (DMRs) with species-specific methylation patterns. In addition, we identified a large number of DMRs for tolerant and susceptible genotypes for both species between control and stress conditions. Several of these DMRs are related to genes previously reported as Al-responsive genes, suggesting a possible role of rice DNA methylation in regulating the Al stress response. Likewise, we analyzed the association of identified DNA methylation marks with Al-tolerance levels of the genotypes studied within each rice species, as well as variation in the methylome of different rice species in response to Al-exposure. Our findings provide novel insights into genome-wide DNA methylation profiles of wild and cultivated rice genotypes and their possible role in regulating plant responses to stress.

11:00-11:15
Logical model of the tolerization of Dendritic cells integrates novel players
Room: UNAM
Format: Live from venue

  • Karen Nuñez-Reza, International Laboratory for Human Genome Research, Mexico
  • Isaac Lozano-Jiménez, International Laboratory for Human Genome Research, Mexico
  • Leslie Martínez-Hernández, International Laboratory for Human Genome Research, Mexico
  • Alejandra Medina-Rivera, Universidad Nacional Autónoma de México, Mexico


Tolerogenic dendritic cells (tolDC) play an essential role in regulating the immune response by inducing an effective Treg response, especially those obtained with an IL10-based protocol. Recently, tolDC have become a relevant subject due to their ability to regulate the immune response under different conditions such as autoimmune diseases and food allergies.
In order to characterize tolDC that exhibit a robust Treg induction, we built a logical model of their tolerization using IL10 by integrating published knowledge, transcriptome data, and the identification of transcription factor binding sites (TFBS).
We first performed a literature search in PubMed and found several papers that were included in our model. We then performed a gene expression analysis of the available transcriptome data in GEO (GSE117946). Based on this analysis, we identified five differentially expressed transcription factors (TFs) in tolDC, namely IRF8, TCF7L2, GAL4, CEBPB, and TFCP2L1, that had not previously been related to the generation of tolDC. Using JASPAR (Castro-Mondragon et al. 2022) matrices for these TFs, we searched for TFBS in the upstream region of genes related to the tolerogenic phenotype in tolDC obtained using the IL10 protocol. Once we had predicted TFBS in the genes of interest, the newly predicted regulations were used to complete our model.
Our logical model integrates differential gene expression, predicted transcription factor binding sites, and current knowledge about IL10 signaling in monocyte-derived dendritic cells (moDC). With the completed model we simulated in silico mutants of the TFs involved (STAT3, STAT6, IRF8, TCF7L2, GAL4, CEBPB, and TFCP2L1). Consistent with current knowledge, the STAT6, CEBPB, and IRF8 mutants, in the presence of IL10, allow for the expression of tolerogenic-specific gene markers. In contrast, the STAT3, TCF7L2, and TFCP2L1 mutants, in the presence of IL10, abolished the tolerogenic-specific gene markers.
The novelty of our study is the identification of the role of TFs that had not been described as involved in the tolerization of dendritic cells. Our model helps to understand the differences in gene expression when IL10 is used to obtain tolDC, and it could be used as a basis to integrate other tolerogenic protocols and to contrast basal behavior with that observed in immune diseases.
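
The in silico mutant analysis described in this abstract can be illustrated with a small Boolean network. The sketch below is a toy example only: the node names (`signal`, `tf_required`, `tf_redundant`, `cofactor`, `marker`) and the update rules are hypothetical and do not reproduce the regulatory logic of the authors' model. It shows how fixing a node to OFF (a knockout) and iterating synchronous updates to a steady state reveals whether a marker is abolished or retained.

```python
from typing import Callable, Dict, FrozenSet

State = Dict[str, bool]
Rule = Callable[[State], bool]

# Hypothetical toy rules, NOT the regulatory logic of the model in the abstract.
RULES: Dict[str, Rule] = {
    "signal": lambda s: s["signal"],        # external input keeps its value
    "tf_required": lambda s: s["signal"],
    "tf_redundant": lambda s: s["signal"],
    "cofactor": lambda s: s["signal"],
    # The marker needs tf_required plus either tf_redundant or cofactor.
    "marker": lambda s: s["tf_required"] and (s["tf_redundant"] or s["cofactor"]),
}

INITIAL: State = {
    "signal": True,
    "tf_required": False,
    "tf_redundant": False,
    "cofactor": False,
    "marker": False,
}


def steady_state(
    rules: Dict[str, Rule],
    initial: State,
    knockouts: FrozenSet[str] = frozenset(),
    max_steps: int = 50,
) -> State:
    """Apply synchronous updates until the state stops changing."""
    state = {node: value and node not in knockouts for node, value in initial.items()}
    for _ in range(max_steps):
        # Knocked-out nodes are forced OFF regardless of their rule.
        updated = {node: node not in knockouts and rule(state) for node, rule in rules.items()}
        if updated == state:
            return state
        state = updated
    raise RuntimeError("no fixed point reached; the dynamics may be cyclic")


wild_type = steady_state(RULES, INITIAL)
required_ko = steady_state(RULES, INITIAL, frozenset({"tf_required"}))
redundant_ko = steady_state(RULES, INITIAL, frozenset({"tf_redundant"}))
```

In this toy network the marker survives the `tf_redundant` knockout because `cofactor` compensates, whereas the `tf_required` knockout abolishes it. This is the same kind of contrast the abstract reports between its two groups of mutants.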

11:15-11:30
Using Graph Convolutional Networks (GCNs) for Molecular Property Prediction from Natural Products
Room: UNAM
Format: Live from venue

  • Naicolette Agudo, Universidad Tecnologica de Panama, Panama
  • José Luis López, University of Salamanca, Spain
  • Grimaldo Ureña, Universidad Tecnologica de Panama, Panama
  • Javier E. Sanchez-Galan, Universidad Tecnologica de Panama, Panama


Machine learning has been applied extensively to molecular characterization, specifically to predicting properties of molecules for the pharmaceutical industry. It has also been used to predict the biochemical and physiological effects of natural products. Neural networks have been used extensively for this task and, more recently, so have Graph Convolutional Networks (GCNs), which exploit the fact that graphs can readily represent the structure of a molecule and preserve its integrity in the analysis. Property prediction behaves in a non-linear fashion and requires a substantial number of parameters (feature space) to predict a single output characteristic. For natural products, these predictions can be even more fruitful since they could help elucidate the use of complex compounds.


This project will be based on NAPROC-13, an NMR-based database of natural products in SMILES format. Access will be provided by our collaborators at the University of Salamanca. The main objective of this project is to apply GCNs to molecules in NAPROC-13 to predict chemical properties. Predictions will be validated against properties recorded in this database. A second objective is to study molecules that have been found via bioprospecting in Panamanian territory.

GCNs will take molecular graphs as inputs. Different network architectures will be tested, especially those known to be suitable for this task and those we consider most widely applicable: Graph Attention Networks (GAT), Mixture Model Networks (MoNet) and Chebyshev Networks (ChebNet). The molecular information supplied will be used as targets. The GCN architecture will be optimized using grid search strategies.
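
To make the idea of a molecular graph as network input concrete, the sketch below applies one graph-convolution layer in its basic form, ReLU(Â X W) with Â = D^(-1/2) (A + I) D^(-1/2), to the heavy-atom graph of ethanol (SMILES "CCO"). This is the plain GCN propagation rule, not GAT, MoNet or ChebNet, and it is not the project's implementation; the one-hot atom features and the untrained weights are arbitrary assumptions.

```python
from math import sqrt
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of two nested-list matrices."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def normalized_adjacency(adjacency: Matrix) -> Matrix:
    """Symmetric normalisation with self-loops: D^-1/2 (A + I) D^-1/2."""
    n = len(adjacency)
    with_loops = [
        [adjacency[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)
    ]
    degree = [sum(row) for row in with_loops]
    return [
        [with_loops[i][j] / sqrt(degree[i] * degree[j]) for j in range(n)] for i in range(n)
    ]


def gcn_layer(adjacency: Matrix, features: Matrix, weights: Matrix) -> Matrix:
    """One graph-convolution layer: ReLU(A_hat X W)."""
    propagated = matmul(matmul(normalized_adjacency(adjacency), features), weights)
    return [[max(0.0, v) for v in row] for row in propagated]


# Heavy-atom graph of ethanol: C0 - C1 - O2 (bonds as undirected edges).
ADJACENCY = [[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
# One-hot atom features, columns = [carbon, oxygen].
FEATURES = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
# Arbitrary untrained weights mapping 2 input features to 2 hidden features.
WEIGHTS = [[0.5, -1.0], [1.0, 2.0]]

a_hat = normalized_adjacency(ADJACENCY)
hidden = gcn_layer(ADJACENCY, FEATURES, WEIGHTS)
```

For property prediction, the per-atom rows of `hidden` would typically be pooled (for example averaged) into a single molecule vector and passed to a regression or classification head, with the weights learned from the target properties.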

This analysis will provide a basis for further work on structural analysis. The algorithms used and optimized will have the potential to be applied to other molecules in the database with relatively unknown characteristics, as well as to other databases. With this, the opportunity opens to improve the interfaces for the structure of new compounds. It will be possible to characterize the components and properties of natural products such as coffee, a known export of the Republic of Panama.

12:00-12:15
Prediction of bacterial interactions using metabolic network features
Room: UNAM
Format: Live-stream

  • Claudia Silva-Andrade, Universidad Mayor, Chile
  • Daniel Garrido, Laboratorio de Microbiología de Sistemas, Escuela de Ingeniería, Pontificia Universidad Católica de Chile, Chile
  • Maria Rodriguez-Fernandez, Institute for Biological and Medical Engineering, Pontificia Universidad Católica de Chile, Santiago, Chile
  • Alberto J. Martin, Laboratorio de Redes Biológicas, Centro Científico y Tecnológico de Excelencia Ciencia & Vida, Universidad San Sebastián, Chile


Understanding the interactions between microorganisms and how these relationships affect bacterial behavior at the community level is a key research topic in microbiology. Microbial consortia engineering has been established as a scientific discipline and its main objective is the creation of consortia with a particular behavior, either by increasing the productivity of specific metabolites or by modifying the metabolic functionality to obtain stable communities over time.
There are different methods to study interactions between bacteria based on experimental or mathematical approaches. Mathematical approaches use mathematical models to represent various properties of bacteria and can be classified as static or dynamic depending on whether or not they contain information on how community members interact over time.
Metabolic networks describe the interactions between metabolic pathways, mapping the enzymes (represented as edges in the network) and metabolites (represented as nodes in the network), which can be used to understand different metabolic processes in a microorganism in order to optimize the production of a particular metabolite. Metabolic networks are also being used to create stoichiometric models of genome-scale metabolism in consortia to understand interactions between pairs of bacteria, highlighting the relevance of this approach to characterizing bacteria.
In this work, we describe a new method that aims to reduce the number of experimental trials needed to design bacterial consortia with a particular behavior. For that, the representation of bacteria in terms of their metabolic networks was used to build a mathematical model able to predict cross-feeding interactions or competition between pairs of bacteria. We first used the simplest supervised classifier, K-Nearest Neighbors, to choose among several ways of encoding the metabolisms of two bacteria, test different parameter values, and implement various data curation approaches to reduce the biological bias associated with our dataset. Next, we tested different classification algorithms and performed rigorous cross-validation experiments to select the best one for our dataset. The top performing supervised machine learning algorithm obtained an overall rate of correctly classified pairs of bacteria between 92% and 96%. Our method will surely prove useful to improve our understanding of community behavior and, at the same time, aid in rational consortia design approaches by reducing the number of experiments required to identify beneficial interactions among bacteria.
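
The classification step described above can be sketched with a minimal K-Nearest Neighbors classifier over encoded pairs of bacteria. The abstract mentions several candidate encodings without specifying them, so the encoding here (concatenating two binary reaction-presence vectors) is an assumption, and the training pairs and labels are invented toy data rather than the authors' dataset.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Vector = List[int]
Example = Tuple[Vector, str]


def encode_pair(first: Sequence[int], second: Sequence[int]) -> Vector:
    """Hypothetical encoding: concatenate the two reaction-presence vectors."""
    return list(first) + list(second)


def hamming(u: Sequence[int], v: Sequence[int]) -> int:
    """Number of positions at which two binary vectors differ."""
    return sum(x != y for x, y in zip(u, v))


def knn_predict(train: Sequence[Example], query: Vector, k: int = 3) -> str:
    """Majority label among the k training pairs closest to the query."""
    nearest = sorted(train, key=lambda item: hamming(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy bacteria described by the presence (1) or absence (0) of four reactions.
A, B, C = [1, 1, 0, 0], [0, 0, 1, 1], [1, 1, 0, 0]
D, E = [1, 0, 0, 0], [0, 0, 0, 1]

# Invented labels: complementary metabolisms cross-feed, overlapping ones compete.
TRAIN: List[Example] = [
    (encode_pair(A, B), "cross-feeding"),
    (encode_pair(D, B), "cross-feeding"),
    (encode_pair(A, E), "cross-feeding"),
    (encode_pair(A, C), "competition"),
    (encode_pair(A, D), "competition"),
    (encode_pair(B, E), "competition"),
]

complementary_pair = knn_predict(TRAIN, encode_pair(D, E))
overlapping_pair = knn_predict(TRAIN, encode_pair(C, A))
```

Because concatenation depends on the order of the two bacteria, a real encoding would need either a symmetric representation or both orderings in the training set. That is the kind of design choice the abstract says was compared with K-Nearest Neighbors.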

12:15-12:30
PPIntegrator: Semantic integrative system for protein-protein interaction and application for Host-Pathogen datasets
Room: UNAM
Format: Live-stream

  • Yasmmin Côrtes Martins, National Laboratory for Scientific Computing, Brazil
  • Artur Ziviani, National Laboratory for Scientific Computing, Brazil
  • Maiana de Oliveira Cerqueira E Costa, National Laboratory for Scientific Computing, Brazil
  • Maria Claudia Cavalcanti, Military Institute of Engineering, Brazil
  • Marisa Fabiana Nicolás, National Laboratory for Scientific Computing, Brazil
  • Ana Tereza Vasconcelos, National Laboratory for Scientific Computing, Brazil


Over the last 20 years, semantic web standards have shown their importance in promoting data formalization and interlinking between existing knowledge graphs. In this context, several ontologies and data integration initiatives have emerged in recent years in the biological domain, such as the widely used Gene Ontology initiative, which contains metadata to annotate gene function and subcellular location. Another important subject in biology is protein-protein interactions (PPIs), which have many applications, such as protein function inference. Current PPI databases have heterogeneous export methods that hinder their integration and analysis. Several ontologies covering some concepts of the PPI domain are now available to promote interoperability across datasets. However, efforts to establish guidelines for automatic semantic data integration and analysis of PPIs in these datasets are limited. Here, we present PPIntegrator, a system that semantically describes data related to protein interactions. We also introduce an enrichment pipeline to generate, predict and validate new potential Host-Pathogen datasets by transitivity analysis. PPIntegrator contains two modules: i) a data preparation module that organizes the data of three reference databases and ii) a triplification and data fusion module that describes the provenance information, protein annotations when they exist, and scores separated by detection method. This work provides an overview of the PPIntegrator system applied to integrate and compare Host-Pathogen PPI datasets from four bacterial species using our proposed transitivity analysis pipeline. We also demonstrate some critical queries to analyze this kind of data and highlight the importance and usage of the semantic data generated by our system.
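
To make the idea of triplification and data fusion more concrete, here is a minimal sketch that turns PPI records into N-Triples lines and merges records from two sources. The namespace, predicate names, record fields, database names, and protein identifiers are all hypothetical; they do not reflect the actual PPIntegrator schema or the ontologies it uses.

```python
BASE = "http://example.org/ppi/"  # hypothetical namespace, not PPIntegrator's

def uri(local):
    """Wrap a local name as an N-Triples IRI in the example namespace."""
    return f"<{BASE}{local}>"

def literal(value):
    """Render a value as a plain (string) N-Triples literal."""
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'

def triplify(record):
    """Describe one PPI record as triples, with one evidence node per method."""
    # Sort the pair so that A-B and B-A map to the same interaction node.
    a, b = sorted((record["protein_a"], record["protein_b"]))
    interaction = uri(f"interaction/{a}_{b}")
    triples = [
        (interaction, uri("participant"), uri(f"protein/{a}")),
        (interaction, uri("participant"), uri(f"protein/{b}")),
        (interaction, uri("sourceDatabase"), literal(record["source"])),
    ]
    for method, score in sorted(record["scores"].items()):
        evidence = uri(f"evidence/{a}_{b}/{method}")
        triples.append((interaction, uri("hasEvidence"), evidence))
        triples.append((evidence, uri("detectionMethod"), literal(method)))
        triples.append((evidence, uri("score"), literal(score)))
    return [f"{s} {p} {o} ." for s, p, o in triples]

# Two made-up records describing the same host-pathogen pair in two databases.
records = [
    {"protein_a": "P_PATH_1", "protein_b": "P_HOST_1",
     "source": "db_one", "scores": {"experimental": 0.9}},
    {"protein_a": "P_HOST_1", "protein_b": "P_PATH_1",
     "source": "db_two", "scores": {"text_mining": 0.4}},
]

# Fusion here is just a set union: identical triples collapse, and both
# sources attach their provenance and scores to the same interaction node.
fused = sorted({line for record in records for line in triplify(record)})
for line in fused:
    print(line)
```

Because the interaction identifier is built from the sorted protein pair, both records contribute to a single node, which keeps provenance and per-method scores queryable side by side.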

12:30-12:45
An integrative approach identifies host-microbe interactions in the blood of septic patients
Room: UNAM
Format: Live-stream

  • Ícaro Maia Santos de Castro, University of São Paulo, Brazil
  • Marielton Passos Cunha, Scientific Platform Pasteur USP, Brazil
  • Youvika Singh, University of São Paulo, Brazil
  • Paulo Amaral, Insper Institute of Education and Research, Brazil
  • Helder Nakaya, Hospital Israelita Albert Einstein, Brazil


In recent years, the role of the human microbiome in health and disease has been widely studied. These microbial communities not only contribute to the host's local defense against infections but also modulate immune system responses in different locations and tissues. Nevertheless, under conditions that compromise the host's defense barrier, some species can invade tissues and cause infections. RNA-Seq has become the leading high-throughput sequencing technology for understanding the immune response in infectious diseases. Most human transcriptomic studies evaluate gene expression exclusively in the human host, and during read alignment most pipelines end up discarding reads that do not map to the human genome. These unmapped reads can provide valuable information about non-host RNA transcripts derived from microorganisms that may be present in the samples. Here, we propose a bioinformatic approach to identify potential microbial-derived transcripts from these unmapped reads and their impact on host gene expression during infectious diseases. Applying our computational approach to three different transcriptomic studies of sepsis, we were able to identify several microbes in the blood of septic patients. Furthermore, we identified common nosocomial pathogens already described in sepsis studies, suggesting possible bacteremia events. We also identified transcriptional pathways and immune modules altered by the presence of those pathogens. Considering the increasing amount of publicly available transcriptomic data, our approach may alert researchers to the potential of analyzing otherwise discarded biological information to generate new insights into microbe-host interactions that can impact the host immune response in infectious diseases.
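
The first step of such an approach, recovering the reads that did not map to the human genome, can be sketched as follows. The sketch assumes the alignments are available as SAM text and relies on the SAM FLAG field, where bit 0x4 marks an unmapped read. The read names and sequences are made up, and this is not the authors' pipeline; in practice a dedicated tool such as samtools would normally be used, and the recovered reads would then be classified against microbial references.

```python
UNMAPPED = 0x4          # SAM flag bit: read is unmapped
SECONDARY = 0x100       # SAM flag bit: secondary alignment
SUPPLEMENTARY = 0x800   # SAM flag bit: supplementary alignment

def unmapped_reads(sam_lines):
    """Yield (name, sequence, quality) for each unmapped primary record."""
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        flag = int(fields[1])
        if flag & (SECONDARY | SUPPLEMENTARY):
            continue  # avoid reporting the same read twice
        if flag & UNMAPPED:
            yield fields[0], fields[9], fields[10]

def to_fastq(records):
    """Format (name, sequence, quality) tuples as FASTQ text."""
    return "".join(f"@{name}\n{seq}\n+\n{qual}\n" for name, seq, qual in records)

# Made-up SAM content: two mapped reads and two unmapped reads.
toy_sam = [
    "@HD\tVN:1.6\tSO:unsorted",
    "@SQ\tSN:chr1\tLN:1000",
    "read1\t0\tchr1\t100\t60\t8M\t*\t0\t0\tACGTACGT\tIIIIIIII",
    "read2\t4\t*\t0\t0\t*\t*\t0\t0\tGGGTTTAA\tIIIIIIII",
    "read3\t16\tchr1\t200\t60\t8M\t*\t0\t0\tTTTTAAAA\tIIIIIIII",
    "read4\t4\t*\t0\t0\t*\t*\t0\t0\tCCCCGGGG\tIIIIIIII",
]

recovered = list(unmapped_reads(toy_sam))
fastq_text = to_fastq(recovered)
print(fastq_text)
```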

12:45-13:00
Modeling and structural characterization of FAZ10 protein regions: understanding their function in Trypanosoma brucei
Room: UNAM
Format: Live from venue

  • Cleidy Mirela Osorio Mogollón, University of Sao Paulo, Brazil
  • Diego Leonardo Cabrejos, University of Sao Paulo, Brazil
  • Munira Muhammad Abdel Baqui, University of Sao Paulo, Brazil


Trypanosoma brucei is the etiological agent of sleeping sickness, a neglected tropical disease endemic to sub-Saharan Africa. Trypanosomatids have a complex structure called the Flagellum Attachment Zone (FAZ), which connects the single flagellum to the cell body. High molecular mass proteins localized at the FAZ have helped us understand the maintenance of cellular morphology, cytokinesis, and survival. One of these proteins is FAZ10, recently described by our group in T. brucei. We showed that FAZ10 is required for determining cleavage furrow positioning, FAZ organization, and correct cytokinesis. However, little is known about the molecular structure of this protein. Here, we report the in silico structural characterization of FAZ10 regions. We used AlphaFold2, an artificial intelligence program for structural biology, to predict 3D models of protein structures; MARCOIL and LOGICOIL to identify and analyze coiled-coil motifs in FAZ10; IUPred2A, a server that predicts disordered regions; and PyMOL to visualize the models. The FAZ10 protein has several coiled-coil motifs along its amino acid sequence, including the region of the N-terminal and C-terminal domains followed by the two disordered domains. The central domain is folded and may be involved in protein interactions. Furthermore, FAZ10 has the potential to be a protein dimer. Knowledge of the three-dimensional arrangement of this protein helps us understand FAZ10 function and its interaction with other proteins present in the FAZ region. These studies will provide a better understanding of the complexity of the FAZ in T. brucei and greater knowledge of this parasite, which is of public health importance to an entire continent.

13:00-13:15
Genomic profiling of bacteria for the prediction of synthetic community assembly under antagonistic interactions
Room: UNAM
Format: Live from venue

  • Marisol Navarro-Miranda, Cinvestav, Irapuato, Mexico
  • Maribel Hernandez-Rosales, Cinvestav, Irapuato, Mexico
  • Gabriela Olmedo-Alvarez, Cinvestav, Irapuato, Mexico


Microbial communities play critical roles in a wide range of natural processes, from biogeochemical cycles to microbiomes. A fundamental problem in ecology is community assembly: understanding how deterministic and stochastic processes give rise to observed patterns in species abundances over space and time. The use of well-studied environmental strains to construct synthetic communities under controlled conditions therefore provides robust models to monitor their assembly and pattern formation and to test particular hypotheses. We study 78 strains of the phylum Bacillota for which we have obtained phenotypic data on their pairwise interactions. The strains are part of a collection of natural sediment communities isolated from a lagoon in Cuatro Cienegas, Coahuila, Mexico. We sequenced the genomes of the isolates, performed a pangenomic analysis, identified gene clusters encoding secondary metabolites, and inferred a core genome-based phylogeny. A subset of three strains from this community has been previously studied and established as the so-called BARS synthetic community model. In paired interactions, each strain exhibits a different ecological role: antagonism, resistance, or sensitivity. As a community, they exhibit higher-order properties of complex natural communities, with an emergent property where antagonism is not observed in the presence of the resistant strain. With the genomic characterization of the 78 strains, we aim to explain the features that allow the three BARS strains to cohabit, to predict other candidate strains that can be substituted into the BARS interaction, and even to increase the number of strains that can assemble in a stable community. These predictions will be tested in further assembly dynamics experiments. If we assume that synthetic communities can capture some properties of natural ones, this could help us understand environmental problems where microbes are the main actors.
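
As a rough illustration of how pairwise interaction data could be screened for candidate substitutes, the sketch below searches for trios that show an antagonist, a resistant strain, and a sensitive strain. It uses one simple, assumed reading of the roles: the antagonist inhibits the sensitive strain, does not inhibit the resistant strain, and the other two strains inhibit no member of the trio. The strain names and interactions are made up, the role definitions are an assumption rather than the authors' criteria, and pairwise data alone cannot capture the emergent community behavior described above.

```python
from itertools import permutations

def candidate_trios(inhibits, strains):
    """Return (antagonist, resistant, sensitive) trios under the assumed roles.

    `inhibits` is a set of (producer, target) pairs observed in pairwise assays.
    """
    trios = []
    for a, r, s in permutations(strains, 3):
        trio = (a, r, s)
        antagonist_hits_sensitive = (a, s) in inhibits
        antagonist_spares_resistant = (a, r) not in inhibits
        # The resistant and sensitive strains must not inhibit any trio member.
        others_passive = not any(
            (x, y) in inhibits for x in (r, s) for y in trio if x != y
        )
        if antagonist_hits_sensitive and antagonist_spares_resistant and others_passive:
            trios.append(trio)
    return trios

# Made-up pairwise assay results for four hypothetical strains.
strains = ["s1", "s2", "s3", "s4"]
inhibits = {("s1", "s3"), ("s1", "s4"), ("s4", "s2")}

trios = candidate_trios(inhibits, strains)
print(trios)
```

Any trio returned this way would only be a candidate; whether it assembles stably would still have to be checked experimentally, as the abstract notes.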

13:15-14:15
Hunting for species and genera in metagenome datasets
Room: UNAM
Format: Live from venue

  • João Setubal

Our Courses



Workshop: Argentina, November 2023

Tuesday, October 31, 9:00 to 18:00
"Bioinformatics for biodata analysis using Machine Learning and network approaches"
Advanced Practical Workshop organized by RIABIO-&-SOIBIO. Full day, 8 hours (divided into 4 modules / 4 sessions)


  • 1. Global gene expression analysis of RNA-seq data using a friendly pipeline and a robust differential expression algorithm based on bootstrapping.
    Javier De Las Rivas and Alberto Berral (CSIC-USAL-IBSAL, Spain)
  • 2. Analysis of transcriptomic data for gene-to-gene interaction inference, for gene network construction, and robust graph analysis.
    Maribel Hernandez Rosales and Marisol Navarro (CINVESTAV, México)
  • 3. Single-cell RNA-seq data analysis: a pipeline from raw transcriptomic data to cell-type gene signature identification and cellular trajectory analysis.
    Javier De Las Rivas (CSIC-USAL-IBSAL, Spain)
  • 4. Application of a GO annotation algorithm and other gene functional enrichment algorithms based on machine learning methods.
    Flavio E. Spetale (CIFASIS-CONICET-UNR, Argentina)

  • Email Address
    RiaBioNet@gmail.com
  • Street Address
    Bv. 27 de Febrero 210 bis - S2000EZP Rosario, Argentina
  • Website URL
    www.riabio.net