A Test that AI Never Fails: The EU’s AI Data Privacy Laws as a Pedagogical Model for Expanding the US “Fair Use” Test
The development of artificial intelligence (AI) tools has raised questions about data privacy. More specifically, text and data mining (TDM), the process by which computers extract information from written sources, has been the subject of several recent AI court cases [1]. The disputes are rooted in the large-scale web scraping and local storage of billions of copyrighted works that TDM entails [2]. In the United States, a general “fair use” test for copyrighted material is applied in TDM cases to determine whether the acquisition of AI training data has infringed on copyright protections [3]. “Fair use” refers to legal authorization to use copyrighted material without the consent of the rightsholders, the owners of the copyrighted materials [4]. By contrast, as part of its Digital Decade initiative, the European Union has implemented stricter data collection laws to protect rightsholders [5]. These laws clearly delineate the practices that AI companies are prohibited from undertaking, such as the collection of superfluous data, thus reinforcing the applicability of data privacy laws to AI [6]. This article advocates for the US to adopt laws akin to those in the EU. The US should leverage the EU as a pedagogical model because its current system for adjudicating TDM cases fails to establish data guardrails tailored to AI development. Safeguarding intellectual property in the US through a federal TDM law would provide AI companies with clarity and allow for more uniform enforcement of data protection. As such, the US should follow the EU’s lead in bolstering data privacy laws to solidify their relevance within the AI and TDM sector.
Compared to US privacy laws, those in the EU are more protective of rightsholders and consumers who interact with digital platforms. The EU’s data privacy legislative landscape includes the General Data Protection Regulation (GDPR), the Digital Single Market (DSM) Directive, and the AI Act [7]. The GDPR, which took effect in 2018, requires that data collectors solicit user consent before storing personal information, collect only necessary data, and do so in a transparent manner [8]. Additionally, data subjects can obtain a copy of the personal data collected about them [9]. The US, by contrast, relies on the “fair use” test set forth in 17 U.S.C. § 107 when deciding TDM copyright cases [10]. This statute permits the use of copyrighted materials without explicit consent based on four factors: the purpose of the use, the nature of the copyrighted material, the substantiality of the portion used, and the potential effects on the material’s value or market [11]. The divergent priorities of the two regions are apparent from a comparison of these laws: while the GDPR focuses on granting consumers and rightsholders greater protections, the “fair use” test provides grounds under which rightsholders’ content can be used without express consent.
The US has adopted some measures similar to the EU’s, but it still has a long way to go. An analysis of the EU’s digital privacy landscape reveals its modest, yet noteworthy, influence on the US. A month after the GDPR took effect, California passed the first comprehensive data privacy law in the US [12]. The California Consumer Privacy Act (CCPA), like the GDPR, grants consumers the right to know what information is being collected, request its deletion, and opt out of data collection [13]. Today, twenty states have comprehensive data protection laws [14]. The US should continue in this direction, using more recent EU laws as a pedagogical guide to ensure that data security acts evolve with new technologies.
But mirroring the GDPR alone will not be sufficient to establish a TDM-specific legislative landscape. Instead, post-GDPR EU laws provide a framework for how data privacy legislation in the US should account for the plethora of data scraped via TDM. The EU’s 2019 DSM Directive permits rightsholders to opt out of having their subject matter used by data collectors, although an exception in Article 3 protects technological progress by allowing research organizations to mine lawfully accessed works for scientific research [15]. The DSM Directive thus strikes a balance between the pursuit of innovation and the preservation of data privacy, one the US can emulate by extending the logic of the “fair use” test to AI training data. Furthermore, the AI Act, passed in 2024, tailors restrictions on AI-related practices based on a tiered risk classification system [16]. By outlining prohibitions on specific AI practices, the act provides transparency for AI developers and consumers. Both the DSM Directive and the AI Act offer a reference for navigating the tradeoff between innovation and data privacy within the scope of AI. Adopting a similar set of AI-specific laws in the US would lay the foundation for stronger copyright tests pertaining to AI training and TDM. Such laws should, however, contain exceptions, like the DSM Directive’s Article 3, to protect high-priority research.
In the US, in the absence of specific TDM laws, judges have vacillated between ruling in favor of and against rightsholders. For instance, in Thomson Reuters Enterprise Centre GmbH and West Publishing Corp. v. ROSS Intelligence Inc., Judge Stephanos Bibas initially ruled against Thomson Reuters regarding ROSS’ use of its Westlaw content to build a competing tool [17]. However, he later reversed his decision pursuant to 17 U.S.C. § 107 [18]. The fluctuation in Judge Bibas’ rulings reveals the need for AI-specific laws. In his 2025 reversal, Judge Bibas held that ROSS’ use of the copyrighted Westlaw material was not protected by the “fair use” test because the use was not transformative, ROSS’ tool could not generate novel content, and the tool was intended to compete in the same market as Westlaw [19]. Later the same year, however, rulings in cases involving large AI developers expanded the practices deemed fair use. In Kadrey et al. v. Meta Platforms, Judge Vince Chhabria determined that Meta’s use of books scraped from “shadow libraries” to train its LLM Llama constituted fair use because it was transformative [20]. His reasoning rested on generative AI’s capacity to create new content [21]. But establishing generative AI model training as a transformative use of copyrighted material through court precedent grants AI developers free rein over the use of intellectual property, accentuating the outdated status of the “fair use” test. The EU’s AI-specific data privacy laws can inform the development of new laws in the US.
A common concern with tighter restrictions on access to training data is that they will hinder technological progress. The argument follows that a model’s performance typically improves as it is supplied with larger quantities of training data; limiting access to materials would therefore hurt the development of more sophisticated AI tools [22]. However, the US can follow the EU’s lead in striking a balance between innovation and the preservation of copyrighted material. For example, in Robert Kneschke v. LAION, photographer Robert Kneschke sued LAION for reproducing one of his images in an AI training dataset without consent [23]. Although Kneschke’s photograph was subject to copyright, the Hamburg District Court ruled that LAION’s use of the image for TDM was protected under Article 3 of the DSM Directive [24]. Unlike the US, however, the EU has firmly established where it draws the lines for legitimate TDM practices. For instance, Clearview AI has been fined over 75 million euros by watchdogs in Italy, Greece, and France for collecting roughly 30 billion photos for facial recognition [25]. In Germany and Austria, Clearview AI’s practices have been found illegal [26]. Furthermore, Italy’s privacy watchdog, the Garante, fined OpenAI 15 million euros for failing to establish a legal basis for collecting users’ personal data and for lacking transparency about how the data would be used [27]. This consistency in TDM and data privacy cases gives AI developers in the EU a sense of predictability, an element missing in the US.
Enacting a US data privacy law tailored to AI would ensure that the principles set forth in 17 U.S.C. § 107 by the Copyright Act of 1976 are upheld in an age when innovation threatens to push aside legal doctrines. The “fair use” test is effective as a generalized catch-all for copyright infringement cases. However, it fails to establish a concrete defense against copyright infringement caused by TDM for training AI models. Although granting developers freedom to supply a wide range of materials as training data can result in better model performance, it also endangers the principles of the “fair use” test. The decisions in Thomson Reuters and Kadrey elucidate that the “fair use” test does not provide enough clarity on TDM. Instead, the US should look to the EU’s tailored AI legislation as a guideline to protect rightsholders and outline predictable rules for AI developers.
Bibliography
[1] Joshua Love, “Text and Data Mining in the US,” Reed Smith, February 5, 2024, https://www.reedsmith.com/en/perspectives/ai-in-entertainment-and-media/2024/02/text-and-data-mining-in-us.
[2] U.S. Copyright Office, Copyright and Artificial Intelligence, Part 3: Generative AI Training (Pre-publication version) (May 2025).
[3] Love, “Text and Data Mining in the US.”
[4] 17 U.S.C. § 107.
[5] European Commission, Europe’s Digital Decade: Digital Targets for 2030, accessed October 11, 2025, https://commission.europa.eu/strategy-and-policy/priorities-2019-2024/europe-fit-digital-age/europes-digital-decade-digital-targets-2030_en.
[6] Robert Hart, “Clearview AI — Controversial Facial Recognition Firm — Fined $33 Million for ‘Illegal Database,’” Forbes, September 3, 2024, accessed October 11, 2025, https://www.forbes.com/sites/roberthart/2024/09/03/clearview-ai-controversial-facial-recognition-firm-fined-33-million-for-illegal-database/.
[7] Regulation (EU) 2016/679 of the European Parliament and of the Council of 27 April 2016 on the protection of natural persons with regard to the processing of personal data and on the free movement of such data, and repealing Directive 95/46/EC (General Data Protection Regulation), O.J. L 119, 4 May 2016, 1–88.
[8] Ibid.
[9] Ibid.
[10] Love, “Text and Data Mining in the US.”
[11] 17 U.S.C. § 107.
[12] Daisuke Wakabayashi, “California Passes Sweeping Law to Protect Online Privacy,” New York Times, June 28, 2018, accessed October 11, 2025, https://www.nytimes.com/2018/06/28/technology/california-online-privacy-law.html.
[13] Ibid.
[14] “Which States Have Consumer Data Privacy Laws?,” Bloomberg Law, April 7, 2025, accessed October 11, 2025, https://pro.bloomberglaw.com/insights/privacy/state-privacy-legislation-tracker/#map-of-state-privacy-laws.
[15] Directive (EU) 2019/790 of the European Parliament and of the Council of 17 April 2019 on copyright and related rights in the Digital Single Market and amending Directives 96/9/EC and 2001/29/EC, O.J. L 130, 17 May 2019, 92–125.
[16] Regulation (EU) 2024/1689 of the European Parliament and of the Council of 13 June 2024 laying down harmonised rules on artificial intelligence and amending Regulations (EC) No. 300/2008, (EU) No. 167/2013, (EU) No. 168/2013, (EU) 2018/858, (EU) 2018/1139, and (EU) 2019/2144 and Directives 2014/90/EU, (EU) 2016/797, and (EU) 2020/1828 (Artificial Intelligence Act), O.J. L 168, 12 July 2024, 1–135.
[17] Thomson Reuters Enterprise Centre GmbH and West Publishing Corp. v. ROSS Intelligence Inc., No. 1:20-cv-613-SB (D. Del. February 11, 2025) (Memorandum Opinion).
[18] 17 U.S.C. § 107.
[19] Thomson Reuters v. ROSS Intelligence Inc.
[20] Kadrey et al. v. Meta Platforms, Inc., No. 3:23-cv-03417, Document 598 (N.D. Cal. June 25, 2025) (Order Denying Plaintiffs’ Motion for Partial Summary Judgment and Granting Meta’s Cross-Motion for Partial Summary Judgment).
[21] Ibid.
[22] U.S. Copyright Office, Copyright and Artificial Intelligence, Part 3: Generative AI Training.
[23] Robert Kneschke v. LAION, Hamburg District Court, Case No. 310 O 227/23 (Sept. 27, 2024).
[24] Ibid.
[25] Robert Hart, “Clearview AI — Controversial Facial Recognition Firm — Fined $33 Million for ‘Illegal Database,’” Forbes, September 3, 2024, accessed October 11, 2025, https://www.forbes.com/sites/roberthart/2024/09/03/clearview-ai-controversial-facial-recognition-firm-fined-33-million-for-illegal-database/.
[26] Ibid.
[27] Giada Zampano, “Italy’s Privacy Watchdog Fines OpenAI for ChatGPT’s Violations in Collecting Users’ Personal Data,” AP News, December 20, 2024, accessed October 11, 2025, https://apnews.com/article/italy-privacy-authority-openai-chatgpt-fine-6760575ae7a29a1dd22cc666f49e605f.