Navigating Copyright and AI Training: A Complex Legal Terrain
According to a recent open-access study commissioned by the Initiative Urheberrecht, the intersection of copyright law and the training of generative AI models presents a paradoxical situation. The study argues that training AI models does not fall under Text and Data Mining (TDM). This is a crucial finding, because AI companies invoke the TDM concept to defend their training practices and to avoid compensating creators.
Understanding the Study and Its Implications
The study was led by Tim W. Dornis, a legal scholar at the University of Hannover, and Sebastian Stober, an AI professor at Otto von Guericke University Magdeburg. They approached the research with an open mind, noting that a thorough interdisciplinary analysis of the question had previously been lacking.
According to Dornis and Stober, AI training often results in copies of copyrighted works being reproduced within the AI models themselves. This finding is significant for potential lawsuits against AI companies alleging copyright infringement in connection with both the training and the deployment of their models.
The TDM Debate and AI Regulations
Text and Data Mining is frequently cited in discussions of whether current AI training practices violate copyright law. TDM involves extracting new insights from large datasets, usually for strategic business and policy purposes. The researchers argue, however, that applying the TDM exceptions to generative AI training is a mistake.
Stober emphasizes that although training, like TDM, involves collecting data, generative AI models work differently: rather than extracting new insights from their training data, they produce content similar to it. The study therefore concludes that generative AI training is not covered by the TDM exceptions.
Challenges and the Future of AI Training
One potential solution discussed is for rights holders to declare a "reservation against TDM" for their published works. Dornis points out practical obstacles, however: such reservations are difficult to apply to works that have already been published and to content that has already spread widely online.
The AI industry has operated on the assumption that its training practices are covered by "fair use," a US legal doctrine that is not recognized in Europe. This assumption reflects the "move fast and break things" ethos common in Silicon Valley, where the emphasis has traditionally been on rapid innovation rather than strict compliance with existing law.
Overall, the research underscores the need for clear legal frameworks that address the specifics of AI training and protect both innovation and the rights of copyright holders.
This article was originally reported by Heise Online.