Author reactions

Alfonso Valencia

ICREA research professor and director of the Life Sciences Department at the Barcelona Supercomputing Center (BSC).

This work represents a new trend in the development of predictive tools. Traditionally, existing data, such as information on proteins known to form aggregates, are collected and a computational method is then designed to analyse them. Here the process is reversed: first, a robust, fast and inexpensive experimental system is created to generate artificial data at scale, broader and more varied than the data available in nature, and therefore potentially better suited for training a system with improved predictive capabilities.

In this publication, researchers at CRG and IBEC have designed a large-scale assay that measures protein aggregation via the growth rate of cells expressing random DNA fragments of defined length. A neural network trained on these data accurately classifies fragments that promote aggregation, outperforming previous methods based on natural protein data. The apparent paradox is that a large amount of artificial data can be more useful than a small amount of 'high-quality' data. As a precedent, Oded Regev's group, working in a specific area of genomics, designed a system capable of generating and evaluating hundreds of thousands of artificial sequences to train their new predictor.
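The core idea described above, training a classifier on large numbers of artificial sequences rather than on a small set of natural proteins, can be illustrated with a minimal sketch. Everything here is a hypothetical stand-in: the fragment length, the hydrophobicity-based synthetic labels, and the logistic-regression model (replacing the neural network) are assumptions for illustration, not the authors' assay, labels, or code.

```python
# Hypothetical sketch: learn "aggregation propensity" of random peptide
# fragments from a large synthetic dataset. The labelling rule below
# (hydrophobic-rich fragments aggregate) is a toy assumption, not the
# experimental readout used in the publication.
import numpy as np

rng = np.random.default_rng(0)
AA = "ACDEFGHIKLMNPQRSTVWY"   # the 20 standard amino acids
L = 20                        # fragment length (assumption)

def one_hot(seq):
    """Encode a peptide as a flat position-by-residue one-hot vector."""
    m = np.zeros((len(seq), len(AA)))
    for i, a in enumerate(seq):
        m[i, AA.index(a)] = 1.0
    return m.ravel()

# Synthetic ground truth: call a fragment "aggregating" when more than
# 45% of its residues are hydrophobic (toy rule for illustration only).
HYDROPHOBIC = set("AILMFVWY")
def label(seq):
    return int(sum(a in HYDROPHOBIC for a in seq) / len(seq) > 0.45)

# Generate a large set of random fragments, mimicking the "artificial
# data at scale" strategy described in the text.
seqs = ["".join(rng.choice(list(AA), L)) for _ in range(2000)]
X = np.array([one_hot(s) for s in seqs])
y = np.array([label(s) for s in seqs], dtype=float)

# Logistic regression by full-batch gradient descent (a simple linear
# model standing in for the neural network used in the study).
w = np.zeros(X.shape[1])
b = 0.0
for _ in range(1000):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # predicted probabilities
    g = p - y                                # gradient of log-loss
    w -= 0.5 * X.T @ g / len(y)
    b -= 0.5 * g.mean()

pred = (1.0 / (1.0 + np.exp(-(X @ w + b))) > 0.5).astype(float)
acc = (pred == y).mean()   # training accuracy on the synthetic data
```

A linear model suffices here only because the toy labels are a linear function of residue composition; the real sequence patterns behind aggregation are more complex, which is why the study uses a neural network.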

Overall, this publication advances the understanding of protein aggregation, a topic with important biomedical (neurodegenerative diseases) and biotechnological (industrial protein production) implications. Using interpretability techniques to examine what the neural network has learned, the study proposes a refined view of the sequence patterns that favour aggregation, which may help explain how mutations and external factors influence the aggregation process and indicate how to control it.
