Edward M Marcotte
Full Professor of Biochemistry at the University of Texas at Austin (United States)
The paper is technically rigorous and of good overall quality.
The work agrees with an emerging picture of the single-celled ancestor of all modern complex organisms (dubbed LECA, the last eukaryotic common ancestor) as a swimming microbe with thousands of genes and much of the complexity of modern cells already present by around 1.5 to 1.8 billion years ago. The authors here show that LECA's genes arose from multiple earlier ancestors, a combination of genes from different branches of the microbial tree of life and even from viruses. This work and that from other groups are giving us a picture of the earliest steps in the evolution of modern complex organisms.
[Regarding possible limitations] As with any attempt to look so far back into deep time, even if the big picture is correct, I expect the details to continue to be refined, changing on the margins with future analyses. In particular, especially as the worldwide community continues to discover ever more branches of microbial life, we can expect better and better estimates of the ancestry of modern genes.
I work in the same general area, and as it happens, my group also published a paper this past week reporting the proteome & interactome of LECA, the last eukaryotic common ancestor that the paper you sent also studies. Our paper was published 1 week ago in Cell Genomics. It describes determining the protein-coding genes (the proteome) present in LECA and then addresses the question of how these proteins were organized into "molecular machines", capturing the physical organization of the basic biochemical machinery in this critical ancestor of all modern complex life. We then apply this information to learn about current-day organisms, including discovering new genes affecting e.g. bone density or birth defects, based on these ancestral proteins and interactions.
In contrast, the paper you sent from Toni Gabaldón's group also first defines the genes in LECA but then turns to the question of where these genes came from, i.e. asking about their ancestry--did they arise from bacterial ancestors or archaeal ancestors, and can these origins be more precisely determined? Thus, our 2 papers are quite complementary, with both papers sharing the same first goal (defining the protein-coding genes in LECA) then using those to ask different questions. I'm unable to directly compare our results without having access to all of the data (& some time to study them), but at least the general methods for determining the genes in LECA look fairly comparable (they cite our bioRxiv preprint as a relevant method for their paper), and we both determine similar overall numbers of gene families that our groups date back to LECA, with the Gabaldón group estimating about 7,751 - 12,907 LECA gene families ("orthogroups") and our paper estimating 6,429 - 10,091 LECA orthogroups, both giving ranges that depend on the stringency of the analysis. So, at a high level the first parts of our papers appear to be very concordant.
So, while both groups describe LECA's genes, our work uses that as a launching point to study modern genes and diseases, while the work of Bernabeu, Manzano-Morales, Marcet-Houben, and Gabaldón uses that to look back in time even more deeply to study where LECA's genes came from.