Pablo Haya Coll
Researcher at the Computational Linguistics Laboratory of the Autonomous University of Madrid (UAM) and director of Business & Language Analytics (BLA) at the Institute of Knowledge Engineering (IIC)
Speech impairment is an important biomarker of neurodegenerative disorders such as Alzheimer's disease. The line of research to which this article belongs proposes using natural language processing (NLP) techniques to detect Alzheimer's disease early through speech. The authors use a classifier based on language models, specifically GPT-3, which determines from the transcribed text of a speech sample whether a person is developing Alzheimer's disease and to what degree. The classifier has been validated using real speech from healthy people and from people with Alzheimer's disease. The results add to the evidence that incorporating language models gives better performance on problems of some complexity where NLP is applicable.
The real impact of this technology as a diagnostic test is more debatable. First, it would have been interesting if the article had included a comparison with the methods currently used for the early detection of Alzheimer's disease. Only a comparison with other NLP-based methods is included.
Second, the cost-benefit analysis should take into account the false positive rate, which has not been reported. Opening the test to the public, as the authors propose via a website or a mobile app, would mean that many more healthy people than people with Alzheimer's disease take it. Depending on the false positive rate, many healthy people could be wrongly identified as developing the disease. This would most likely lead to a disproportionate increase in follow-up tests to verify whether the results are correct.
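The base-rate argument above can be made concrete with a short calculation. What follows is a minimal sketch in Python, not an analysis from the article: the prevalence, sensitivity and false positive rate are purely hypothetical values chosen for illustration, and the function name `positive_predictive_value` is our own.

```python
def positive_predictive_value(prevalence, sensitivity, false_positive_rate):
    """Share of positive results that are true positives (Bayes' rule)."""
    # Fraction of all test takers who have the disease and test positive
    true_positives = prevalence * sensitivity
    # Fraction of all test takers who are healthy but test positive
    false_positives = (1.0 - prevalence) * false_positive_rate
    return true_positives / (true_positives + false_positives)


# Hypothetical values, NOT taken from the article:
# 5% of test takers are developing the disease, the test detects 80% of
# them, and it wrongly flags 20% of healthy people.
ppv = positive_predictive_value(
    prevalence=0.05, sensitivity=0.80, false_positive_rate=0.20
)
print(f"{ppv:.2f}")  # prints 0.17
```

Under these assumed numbers, only about 17% of positive results would correspond to people who are actually developing the disease; the remaining positives would be healthy people referred for follow-up tests. The real figure cannot be estimated until the false positive rate is reported.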
Finally, before this technology could be used as a diagnostic test, it would have to comply with the validation protocols established by the various health agencies. Given the size and representativeness of the sample used, the study presented in the article corresponds to a very preliminary phase.