Alfonso Valencia
ICREA research professor and director of Life Sciences at the Barcelona Supercomputing Center (BSC).
The agreed legislative proposal mainly affects two types of applications: human-recognition applications (facial recognition and other biometrics), which are banned with exceptions, and so-called 'high-risk' applications, which include general-purpose models (the popular ChatGPT, among others).
With regard to biometric recognition techniques, exceptions are made for law-enforcement purposes, making it difficult to predict where the use of these technologies will end up in practice.
Regarding the second category, that of large general models (foundation models) which, trained on vast amounts of text, display increasingly advanced capabilities of practical utility (from ChatGPT to GPT-4), the proposal imposes restrictive measures requiring risk assessments, detailed descriptions of how the models operate, and an account of all data sources used in their training. These measures are relatively easy to apply to traditional systems, such as those operating in banks or insurance companies, but very difficult or impossible to apply to the new AI systems.
The scope of these measures means that the current systems of the large companies will not be able to operate in Europe, except through IP addresses from outside Europe (a booming business). In this context it will be very difficult for Europe, where research groups, SMEs and companies much smaller than their US counterparts operate, to develop competitive systems. Aware of the damage these measures could cause both to the companies that develop systems and to those that use them, the text itself speaks of environments in which companies could develop such systems 'securely', leaving their creation in the hands of governments. Given that neither the initiative, nor the budget, nor the unity of action, nor the technology to create such environments exists, it seems that the implementation of the proposed measures will definitively leave Europe out of the development of major AI models.
Not the least of the issues in this proposal is its repeated insistence that models comply with intellectual property law. Training foundation models requires massive amounts of data, essentially the contents of the internet. Within such a volume of data it is impossible to automatically determine the copyright status of each item, let alone pay for the use of each text. This limitation alone is enough to put an end to the development of these large models in Europe. A reasonable alternative exists if we consider that the models are no more than statistical systems: they do not reproduce specific texts, but generate aggregate statistical characteristics of all the information. It would therefore seem sensible to seek general measures to compensate for the use of information, rather than applying copyright law wholesale.
In short, the proposal leaves Europe severely restricted in the use and development of large models, without offering technically and economically viable alternatives.