Spanish project develops an AI to predict protein aggregation

A team led by the Centre for Genomic Regulation (CRG) in Barcelona and the Institute for Bioengineering of Catalonia (IBEC) has developed a new artificial intelligence (AI) tool called CANYA and used it, together with a large volume of data, to predict when and why protein aggregation takes place. According to the joint press release, the resource could be used to advance research into neurodegenerative diseases and drug production. The results are published in the journal Science Advances.

30/04/2025 - 20:00 CEST
Expert reactions

Alfonso Valencia

ICREA professor and director of Life Sciences at the Barcelona National Supercomputing Centre (BSC).

Science Media Centre Spain

This work represents a new trend in the development of predictive tools. Traditionally, existing data, such as information on proteins known to form aggregates, are collected first, and a computational method is then designed to analyse them. Here, the process is reversed: first, a robust, fast and inexpensive experimental system is created to generate large-scale artificial data, which are broader and more varied than those available in nature and therefore potentially better for training a system with improved predictive capabilities.

In this publication, researchers at CRG and IBEC have designed a large-scale assay that quantifies protein aggregation by measuring the growth rate of cells expressing random DNA fragments of defined length. A neural network trained on these data accurately classifies fragments that promote aggregation, outperforming previous methods based on real protein data. The apparent paradox is that a large amount of artificial data can be more useful than a small amount of ‘high-quality’ data. As a precedent, Oded Regev's group, working in a specific area of genomics, designed a system capable of generating and evaluating hundreds of thousands of artificial sequences to train their new predictor.

Overall, this publication advances the understanding of protein aggregation, a topic with important biomedical (neurodegenerative diseases) and biotechnological (industrial protein production) implications. Using techniques for interpreting the results of neural networks, the study proposes an updated view of the sequence patterns that favour aggregation, which may help to explain how mutations and external factors influence the aggregation process and indicate how to control it.

The author has not responded to our request to declare conflicts of interest

Gonzalo Jiménez-Oses

Ikerbasque research professor at the Computational Chemistry Laboratory of CIC bioGUNE

Science Media Centre Spain

The article describes an experiment that uses a cellular genomic assay based on fusion constructs to evaluate, indirectly, the aggregation capacity of random amino acid sequences corresponding to 20-residue peptides. The vast majority of these sequences were found to be non-aggregating. However, the large number of sequences analyzed allowed the training of a simple neural network capable of classifying these peptides as aggregating or non-aggregating.

The model confirms previous knowledge about some of the main determinants of aggregation, such as hydrophobic and β-sheet-rich motifs. Although its predictive power for larger native proteins with globular structures remains limited, and its predictions depend markedly on position within the sequence, the work represents an advance in the investigation of the intrinsic aggregation propensity of short peptides, with applications in the pharmaceutical field.

It also illustrates the importance of generating large, diverse, and standardized high-quality experimental datasets for the development of AI models applied to protein biophysics and science in general.

The author has not responded to our request to declare conflicts of interest
Publications
Journal
Science Advances
Publication date
Authors

Thompson et al.

Study types:
  • Research article
  • Peer reviewed