Gonzalo Jiménez-Oses
Ikerbasque research professor at the Computational Chemistry Laboratory of CIC bioGUNE
The article describes an experiment that indirectly evaluates the aggregation capacity of random 20-residue peptide sequences, using a cellular genomic assay based on fusion constructs. The vast majority of these sequences turned out to be non-aggregating. However, the large number of sequences analyzed made it possible to train a simple neural network that classifies these peptides as aggregating or non-aggregating.
The model confirms existing knowledge about some of the main determinants of aggregation, such as hydrophobic and β-sheet-rich motifs. Its predictive power for larger native proteins with globular structures remains limited, and its predictions depend markedly on the position of residues within the sequence. Even so, the work represents an advance in the study of the intrinsic aggregation propensity of short peptides, with applications in the pharmaceutical field.
It also illustrates the importance of generating large, diverse, standardized, high-quality experimental datasets for developing AI models in protein biophysics and in science more broadly.