The Cost of Justice at the Dawn of AI

Abstract

Justice isn’t free, but it might soon get much less expensive. Policies concerning issues such as arbitration, class actions, and plea bargaining depend on how much legal services cost, but the legal literature has generally ignored past and future cost trends and their implications. The result is a legal system that may change dramatically because of economic forces without active consideration of potential responses. Part of the reason for the lack of attention is that changes in legal productivity can be difficult to measure or forecast. Some commentators have concluded that the legal sector has become more expensive in recent decades, but they have missed both evidence that advances their case and arguments against it. The advent of AI introduces the possibility that lawyers’ productivity will improve, reducing legal costs and ameliorating concerns about access to justice. The legal system can best prepare by more explicitly recognizing how procedure and doctrine depend on cost, thus smoothing the path for a possible productivity revolution rather than relying entirely on the political system to respond. For example, courts could explicitly incorporate a cost-benefit framework that already is implicit in much summary judgment case law, potentially enabling more cases to be tried to verdict if legal services become cheaper. Similarly, greater honesty that the criminal justice system ratchets up penalties to encourage plea-bargaining might help avoid an outcome in which cost efficiencies allow prosecutors to exact longer prison sentences than legislatures intended.

From Data Myths to Data Reality: What Generative AI Can Tell Us About Competition Policy (and Vice Versa)

I. Introduction

It was once (and frequently) said that Google’s “data monopoly” was unassailable: “If ‘big data’ is the oil of the information economy, Google has Standard Oil-like monopoly dominance — and uses that control to maintain its dominant position.”[1] Similar epithets have been hurled at virtually all large online platforms, including Facebook (Meta), Amazon, and Uber.[2]

While some of these claims persist even today (for example, “big data” is a key component of the U.S. Justice Department’s (“DOJ”) Google Search and AdTech antitrust suits),[3] a shiny new data target has emerged in the form of generative artificial intelligence. The launch of ChatGPT in November 2022, as well as the advent of AI image-generation services like Midjourney and Dall-E, has dramatically expanded people’s conception of what is, and what might be, possible to achieve with generative AI technologies built on massive data sets.

While these services remain in the early stages of mainstream adoption and are in the throes of rapid, unpredictable technological evolution, they nevertheless already appear on the radar of competition policymakers around the world. Several antitrust enforcers appear to believe that, by acting now, they can avoid the “mistakes” that were purportedly made during the formative years of Web 2.0.[4] These mistakes, critics assert, include failing to appreciate the centrality of data in online markets, as well as letting mergers go unchecked and allowing early movers to entrench their market positions.[5] As Lina Khan, Chair of the FTC, put it: “we are still reeling from the concentration that resulted from Web 2.0, and we don’t want to repeat the mis-steps of the past with AI.”[6]

In that sense, the response from the competition-policy world is deeply troubling. Instead of engaging in critical self-assessment and adopting an appropriately restrained stance, the enforcement community appears to be champing at the bit. Rather than reassessing their prior assumptions in light of the current technological moment, enforcers’ top priority appears to be figuring out how to deploy existing competition tools rapidly and almost reflexively to address the presumed competitive failures presented by generative AI.[7]

It is increasingly common for competition enforcers to argue that so-called “data network effects” serve not only to entrench incumbents in the markets where that data is collected, but also confer similar, self-reinforcing benefits in adjacent markets. Several enforcers have, for example, prevented large online platforms from acquiring smaller firms in adjacent markets, citing the risk that they could use their vast access to data to extend their dominance into these new markets.[8] They have also launched consultations to ascertain the role that data plays in AI competition. For instance, in an ongoing consultation, the European Commission asks: “What is the role of data and what are its relevant characteristics for the provision of generative AI systems and/or components, including AI models?”[9] Unsurprisingly, the U.S. Federal Trade Commission (“FTC”) has been bullish about the risks posed by incumbents’ access to data. In comments submitted to the U.S. Copyright Office, for example, the FTC argued that:

The rapid development and deployment of AI also poses potential risks to competition. The rising importance of AI to the economy may further lock in the market dominance of large incumbent technology firms. These powerful, vertically integrated incumbents control many of the inputs necessary for the effective development and deployment of AI tools, including cloud-based or local computing power and access to large stores of training data. These dominant technology companies may have the incentive to use their control over these inputs to unlawfully entrench their market positions in AI and related markets, including digital content markets.[10]

Against this backdrop, it stands to reason that the largest online platforms — including Alphabet, Meta, Apple, and Amazon — should have a meaningful advantage in the burgeoning markets for generative AI services. After all, it is widely recognized that data is an essential input for generative AI.[11] This competitive advantage should be all the more significant given that these firms have been at the forefront of AI technology for more than a decade. Over this period, Google’s DeepMind and AlphaGo, as well as Meta’s AI research, have routinely made headlines.[12] Apple and Amazon also have vast experience with AI assistants, and all of these firms use AI technology throughout their platforms.[13]

Contrary to what one might expect, however, the tech giants have, to date, been unable to leverage their vast data troves to outcompete startups like OpenAI and Midjourney. At the time of writing, OpenAI’s ChatGPT appears to be, by far, the most successful chatbot,[14] even though large tech platforms arguably have access to far more (and more up-to-date) data.

This article suggests there are important lessons to be learned from the current technological moment, if only enforcers would stop to reflect. The meteoric rise of consumer-facing AI services should offer competition enforcers and policymakers an opportunity for introspection. As we explain, the rapid emergence of generative AI technology may undercut many core assumptions of today’s competition-policy debates — the rueful after-effects of the purported failure of 20th-century antitrust to address the allegedly manifest harms of 21st-century technology. These include the notions that data advantages constitute barriers to entry and can be leveraged to project dominance into adjacent markets; that scale itself is a market failure to be addressed by enforcers; and that the use of consumer data is inherently harmful to those consumers.

II. Data Network Effects Theory and Enforcement

Proponents of tougher interventions by competition enforcers into digital markets often cite data network effects as a source of competitive advantage and barrier to entry (though terms like “economies of scale and scope” may offer more precision).[15] The crux of the argument is that “the collection and use of data creates a feedback loop of more data, which ultimately insulates incumbent platforms from entrants who, but for their data disadvantage, might offer a better product.”[16] This self-reinforcing cycle purportedly leads to market domination by a single firm. Thus, for Google, for example, it is argued that its “ever-expanding control of user personal data, and that data’s critical value to online advertisers, creates an insurmountable barrier to entry for new competition.”[17]

Right off the bat, it is important to note a conceptual problem with these claims. Because data is used to improve the quality of products and/or to subsidize their use, the idea of data as an entry barrier suggests that any product improvement or price reduction made by an incumbent could be a problematic entry barrier to any new entrant. This is tantamount to an argument that competition itself is a cognizable barrier to entry. Of course, it would be a curious approach to antitrust if this were treated as a problem, as it would imply that firms should under-compete — should forgo consumer-welfare enhancements — in order to bring about a greater number of firms in a given market simply for its own sake.[18]

Meanwhile, actual economic studies of data network effects are few and far between, with scant empirical evidence to support the theory.[19] Andrei Hagiu and Julian Wright’s theoretical paper offers perhaps the most comprehensive treatment of the topic.[20] The authors ultimately conclude that data network effects can be of different magnitudes and have varying effects on firms’ incumbency advantage.[21] They cite Grammarly (an AI writing-assistance tool) as a potential example: “As users make corrections to the suggestions offered by Grammarly, its language experts and artificial intelligence can use this feedback to continue to improve its future recommendations for all users.”[22]

This is echoed by other economists who contend that “[t]he algorithmic analysis of user data and information might increase incumbency advantages, creating lock-in effects among users and making them more reluctant to join an entrant platform.”[23]

Crucially, some scholars take this logic a step further, arguing that platforms may use data from their “origin markets” in order to enter and dominate adjacent ones:

First, as we already mentioned, data collected in the origin market can be used, once the enveloper has entered the target market, to provide products more efficiently in the target market. Second, data collected in the origin market can be used to reduce the asymmetric information to which an entrant is typically subject when deciding to invest (for example, in R&D) to enter a new market. For instance, a search engine could be able to predict new trends from consumer searches and therefore face less uncertainty in product design.[24]

This possibility is also implicit in the paper by Hagiu and Wright.[25] Indeed, the authors’ theoretical model rests on an important distinction between within-user data advantages (that is, having access to more data about a given user) and across-user data advantages (information gleaned from having access to a wider user base). In both cases, there is an implicit assumption that platforms may use data from one service to gain an advantage in another market (because what matters is information about aggregate or individual user preferences, regardless of its origin).

Our review of the economic evidence suggests that several scholars have, with varying degrees of certainty, raised the possibility that incumbents may leverage data advantages to stifle competitors in their primary market or adjacent ones (be it via merger or organic growth). As we explain below, however, there is ultimately little evidence to support such claims.

Policymakers, however, have largely been receptive to these limited theoretical findings, basing multiple decisions on these theories, often with little consideration of the caveats that accompany them.[26] Indeed, it is remarkable that, in the Furman Report’s section on “[t]he data advantage for incumbents,” only two empirical economic studies are cited, and they reach directly contradictory conclusions on the strength of data advantages.[27] Nevertheless, the Furman Report concludes that data “may confer a form of unmatchable advantage on the incumbent business, making successful rivalry less likely,”[28] and adopts without reservation “convincing” evidence from non-economists with apparently no empirical basis.[29]

In the Google/Fitbit merger proceedings, the European Commission found that the combination of data from Google services with that of Fitbit devices would reduce competition in advertising markets:

Giving [sic] the large amount of data already used for advertising purposes that Google holds, the increase in Google’s data collection capabilities, which goes beyond the mere number of active users for which Fitbit has been collecting data so far, the Transaction is likely to have a negative impact on the development of an unfettered competition in the markets for online advertising.[30]

As a result, the Commission cleared the merger on the condition that Google refrain from using data from Fitbit devices for its advertising platform.[31] The Commission will likely focus on similar issues during its ongoing investigation into Microsoft’s investment in OpenAI.[32]

Along similar lines, the FTC’s complaint to enjoin Meta’s purchase of a virtual-reality (VR) fitness app called “Within” relied, among other things, on the fact that Meta could leverage its data about VR-user behavior to inform its decisions and potentially outcompete rival VR-fitness apps: “Meta’s control over the Quest platform also gives it unique access to VR user data, which it uses to inform strategic decisions.”[33]

The U.S. Department of Justice’s twin cases against Google also raise data leveraging and data barriers to entry. The agency’s AdTech complaint alleges that “Google intentionally exploited its massive trove of user data to further entrench its monopoly across the digital advertising industry.”[34] Similarly, in its Search complaint, the agency argues that:

Google’s anticompetitive practices are especially pernicious because they deny rivals scale to compete effectively. General search services, search advertising, and general search text advertising require complex algorithms that are constantly learning which organic results and ads best respond to user queries; the volume, variety, and velocity of data accelerates the automated learning of search and search advertising algorithms.[35]

Finally, the merger guidelines published by several competition enforcers cite the acquisition of data as a potential source of competitive concerns. For instance, the FTC and DOJ’s newly published guidelines state that “acquiring data that helps facilitate matching, sorting, or prediction services may enable the platform to weaken rival platforms by denying them that data.”[36] Likewise, the UK Competition and Markets Authority (“CMA”) warns against incumbents acquiring firms in order to obtain their data and foreclose other rivals:

Incentive to foreclose rivals…

7.19(e) Particularly in complex and dynamic markets, firms may not focus on short term margins but may pursue other objectives to maximise their long-run profitability, which the CMA may consider. This may include… obtaining access to customer data….[37]

In short, competition authorities around the globe are taking an aggressive stance on data network effects. Among the ways this has manifested is in basing enforcement decisions on fears that data collected by one platform might confer a decisive competitive advantage in adjacent markets. Unfortunately, these concerns rest on little to no empirical evidence, either in the economic literature or the underlying case records.

III. Data Incumbency Advantages in Generative AI Markets

Given the assertions canvassed in the previous section, it seems reasonable to assume that firms such as Google, Meta, and Amazon would be in pole position to dominate the burgeoning market for generative AI. After all, these firms have not only been at the forefront of the field for the better part of a decade, but they also have access to vast troves of data, the likes of which their rivals could only have dreamed of when they launched their own services. Thus the authors of the Furman Report caution that “to the degree that the next technological revolution centres around artificial intelligence and machine learning, then the companies most able to take advantage of it may well be the existing large companies because of the importance of data for the successful use of these tools.”[38]

At the time of writing, however, this is not how things have unfolded — although it bears noting that these markets remain in flux and the competitive landscape is susceptible to change. The first significantly successful generative AI service arguably came neither from Meta — which had been working on chatbots for years and had access to what may be the world’s largest database of actual chats — nor from Google. Instead, the breakthrough came from a previously unknown firm called OpenAI.

OpenAI’s ChatGPT service currently holds an estimated 60% of the market (though reliable numbers are somewhat elusive).[39] It broke the record for the fastest online service to reach 100 million users (in only a couple of months), more than four times faster than the previous record holder, TikTok.[40] Based on Google Trends data, ChatGPT is nine times more popular than Google’s own Bard service worldwide, and 14 times more popular in the U.S.[41] In April 2023, ChatGPT reportedly registered 206.7 million unique visitors, compared to 19.5 million for Google’s Bard.[42] In short, at the time of writing, ChatGPT appears to be the most popular chatbot. And, so far, the entry of large players’ offerings such as Google’s Bard or Meta AI appears to have had little effect on its market position.[43]

The picture is similar in the field of AI image generation. As of August 2023, Midjourney, Dall-E, and Stable Diffusion appear to be the three market leaders in terms of user visits.[44] This is despite competition from the likes of Google and Meta, who arguably have access to unparalleled image and video databases by virtue of their primary platform activities.[45]

This raises two crucial questions: How have these AI upstarts managed to be so successful, and is their success just a flash in the pan before Web 2.0 giants catch up and overthrow them? While we cannot answer either of these questions dispositively, some observations concerning the role and value of data in digital markets would appear to be relevant.

A first important observation is that empirical studies suggest data exhibits diminishing marginal returns. In other words, past a certain point, acquiring more data does not confer a meaningful edge to the acquiring firm. As Catherine Tucker puts it, following a review of the literature: “Empirically there is little evidence of economies of scale and scope in digital data in the instances where one would expect to find them.”[46]

Likewise, following a survey of the empirical literature on this topic, Geoffrey Manne & Dirk Auer conclude that:

Available evidence suggests that claims of “extreme” returns to scale in the tech sector are greatly overblown. Not only are the largest expenditures of digital platforms unlikely to become proportionally less important as output increases, but empirical research strongly suggests that even data does not give rise to increasing returns to scale, despite routinely being cited as the source of this effect.[47]

In other words, being the firm with the most data appears to be far less important than having enough data, and this lower bar may be accessible to far more firms than one might initially think possible.

And obtaining enough data could become even easier — that is, the volume of required data could become even smaller — with technological progress. For instance, synthetic data may provide an adequate substitute for real-world data[48] — or may even outperform it.[49] As Thibault Schrepel and Alex Pentland point out, “advances in computer science and analytics are making the amount of data less relevant every day. In recent months, important technological advances have allowed companies with small data sets to compete with larger ones.”[50]

Indeed, past a certain threshold, acquiring more data might not meaningfully improve a service, whereas other improvements (such as better training methods or data curation) could have a large effect. In fact, there is some evidence that excessive data impedes a service’s ability to generate results appropriate for a given query: “[S]uperior model performance can often be achieved with smaller, high-quality datasets than massive, uncurated ones. Data curation ensures that training datasets are devoid of noise, irrelevant instances, and duplications, thus maximizing the efficiency of every training iteration.”[51]

Consider, for instance, a user who wants to generate an image of a basketball. A model trained indiscriminately on a vast number of public photos in which a basketball appears amid copious other image data may yield an inordinately noisy result. By contrast, a model trained with a better method on fewer, more carefully selected images could readily yield far superior results.[52] In one important example,

[t]he model’s performance is particularly remarkable, given its small size. “This is not a large language model trained on the whole Internet; this is a relatively small transformer trained for these tasks,” says Armando Solar-Lezama, a computer scientist at the Massachusetts Institute of Technology, who was not involved in the new study…. The finding implies that instead of just shoving ever more training data into machine-learning models, a complementary strategy might be to offer AI algorithms the equivalent of a focused linguistics or algebra class.[53]

Current efforts are thus focused on improving the mathematical and logical reasoning of large language models (“LLMs”), rather than on maximizing training datasets.[54] Two points stand out. The first is that firms like OpenAI rely largely on publicly available datasets — such as GSM8K — to train their LLMs.[55] Second, the real challenge in creating cutting-edge AI lies not so much in collecting data as in devising innovative AI training processes and architectures:

[B]uilding a truly general reasoning engine will require a more fundamental architectural innovation. What’s needed is a way for language models to learn new abstractions that go beyond their training data and have these evolving abstractions influence the model’s choices as it explores the space of possible solutions.

We know this is possible because the human brain does it. But it might be a while before OpenAI, DeepMind, or anyone else figures out how to do it in silicon.[56]

Furthermore, it is worth noting that the data most relevant to startups operating in a given market may not be those data held by large incumbent platforms in other markets, but rather data specific to the market in which the startup is active or, even better, to the given problem it is attempting to solve:

As Andres Lerner has argued, if you wanted to start a travel business, the data from Kayak or Priceline would be far more relevant. Or if you wanted to start a ride-sharing business, data from cab companies would be more useful than the broad, market-cross-cutting profiles Google and Facebook have. Consider companies like Uber, Lyft and Sidecar that had no customer data when they began to challenge established cab companies that did possess such data. If data were really so significant, they could never have competed successfully. But Uber, Lyft and Sidecar have been able to effectively compete because they built products that users wanted to use — they came up with an idea for a better mousetrap. The data they have accrued came after they innovated, entered the market and mounted their successful challenges — not before.[57]

The bottom line is that data is not the be-all and end-all that many in competition circles rather casually make it out to be.[58] While data may often confer marginal benefits, there is little sense these are ultimately decisive.[59] As a result, incumbent platforms’ access to vast numbers of users and data in their primary markets might only marginally affect their AI competitiveness.

A related observation is that firms’ capabilities and other features of their products arguably play a more important role than the data they own.[60] Examples of this abound in digital markets. Google overthrew Yahoo, despite initially having access to far fewer users and far less data; Google and Apple overcame Microsoft in the smartphone OS market despite having comparatively tiny ecosystems (at the time) to leverage; and TikTok rose to prominence despite intense competition from incumbents like Instagram, which had much larger user bases. In each of these cases, important product-design decisions (such as the PageRank algorithm, recognizing the specific needs of mobile users,[61] and TikTok’s clever algorithm) appear to have played a far greater role than initial user and data endowments (or lack thereof).

All of this suggests that the early success of OpenAI likely has more to do with its engineering decisions than the data it did (or did not) own. And going forward, the ability of OpenAI and its rivals to offer and monetize compelling stores for custom versions of their generative AI technology will arguably play a much larger role than (and contribute to) their ownership of data.[62] In other words, the ultimate challenge is arguably to create a valuable platform, of which data ownership is a consequence, but not a cause.

It is also important to note that, in those instances where it is valuable, data does not just fall from the sky. Instead, it is through smart business and engineering decisions that firms generate valuable information (which does not necessarily correlate with owning more data).

For instance, OpenAI’s success with ChatGPT is often attributed to its more efficient algorithms and training models, which arguably have enabled the service to improve more rapidly than its rivals.[63] Likewise, the ability of firms like Meta and Google to generate valuable data for advertising arguably depends more on design decisions that elicit the right data from users, rather than the raw number of users in their networks.

Put differently, setting up a business so as to generate the right information is more important than simply owning vast troves of data.[64] Even in those instances where high-quality data is an essential parameter of competition, it does not follow that having vaster databases or more users on a platform necessarily leads to better information for the platform.

Given the foregoing, it seems clear that the early success of OpenAI and other generative AI startups, as well as their chances of prevailing in the future, hinge on a far broader range of factors than the mere ownership of data. Indeed, if data ownership consistently conferred a significant competitive advantage, these new firms would not be where they are today. This does not mean that data is worthless, of course. Rather, it means that competition authorities should not assume that merely possessing data is a dispositive competitive advantage, absent compelling empirical evidence to support such a finding. In this light, the current wave of decisions and competition-policy pronouncements that rely on data-related theories of harm is premature.

IV. Five Key Takeaways: Reconceptualizing the Role of Data in Generative AI Competition

As we explain above, data (network effects) are not the source of barriers to entry that they are sometimes made out to be; rather, the picture is far more nuanced. Indeed, as economist Andres Lerner demonstrated almost a decade ago (and the assessment is only truer today):

Although the collection of user data is generally valuable for online providers, the conclusion that such benefits of user data lead to significant returns to scale and to the entrenchment of dominant online platforms is based on unsupported assumptions. Although, in theory, control of an “essential” input can lead to the exclusion of rivals, a careful analysis of real-world evidence indicates that such concerns are unwarranted for many online businesses that have been the focus of the “big data” debate.[65]

While data can be an important part of the competitive landscape, incumbent data advantages are far less pronounced than today’s policymakers commonly assume. In that respect, five main lessons emerge:

  1. Data can be (very) valuable, but past a certain threshold, the benefits tend to diminish. In other words, having the most data is less important than having enough;
  2. The ability to generate valuable information does not depend on the number of users or the amount of data a platform has previously acquired;
  3. The most important datasets are not always proprietary;
  4. Technological advances and platforms’ engineering decisions affect their ability to generate valuable information, and this effect swamps the effect of the amount of data they own; and
  5. How platforms use data is arguably more important than what data or how much data they own.

These lessons have important ramifications for competition-policy debates over the competitive implications of data in technologically evolving areas.

First, it is not surprising that startups, rather than incumbents, have taken an early lead in generative AI (and in Web 2.0 before it). After all, if data-incumbency advantages are small or even nonexistent, then smaller and more nimble players may have an edge over established tech platforms. This is all the more likely given that, despite significant efforts, the biggest tech platforms were unable to offer compelling generative AI chatbots and image-generation services before the emergence of ChatGPT, Dall-E, Midjourney, etc. This failure suggests that, in a process akin to Christensen’s Innovator’s Dilemma,[66] something about their existing services and capabilities was holding them back in those markets. Of course, this does not necessarily mean that those same services/capabilities could not become an advantage when the generative AI market starts addressing issues of monetization and scale.[67] But it does mean that assumptions of a firm’s market power based on its possession of data are off the mark.

Another important implication is that, paradoxically, policymakers’ efforts to prevent Web 2.0 platforms from competing freely in generative AI markets may ultimately backfire and lead to less, not more, competition. Indeed, OpenAI is currently acquiring a sizeable lead in generative AI. While competition authorities might like to think that other startups will emerge and thrive in this space, it is important not to confuse desires with reality. For, while there is a vibrant AI-startup ecosystem, there is at least a case to be made that the most significant competition for today’s AI leaders will come from incumbent Web 2.0 platforms — although nothing is certain at this stage. Policymakers should take care not to stifle that competition on the misguided assumption that competitive pressure from large incumbents is somehow less valuable to consumers than that which originates from smaller firms.

Finally, even if there were a competition-related market failure to be addressed in the field of generative AI (which is anything but clear), it is far from obvious that the contemplated remedies would do more good than harm. Some of the solutions that have been put forward have highly ambiguous effects on consumer welfare. Scholars have shown that mandated data sharing — a solution championed by EU policymakers, among others — may sometimes dampen competition in generative AI markets.[68] The same is true of legislation, like the GDPR, that makes it harder for firms to acquire more data about consumers — assuming such data is, indeed, useful to generative AI services.[69]

In sum, it is a flawed understanding of the economics and practical consequences of large agglomerations of data that leads competition authorities to believe that data-incumbency advantages are likely to harm competition in generative AI markets — or even in the data-intensive Web 2.0 markets that preceded them. Indeed, competition or regulatory intervention to “correct” data barriers and data network and scale effects is liable to do more harm than good.

[1] Nathan Newman, Taking on Google’s Monopoly Means Regulating Its Control of User Data, Huffington Post (Sep. 24, 2013), http://www.huffingtonpost.com/nathan-newman/taking-on-googlesmonopol_b_3980799.html.

[2] See, e.g., Lina Khan & K. Sabeel Rahman, Restoring Competition in the U.S. Economy, in Untamed: How to Check Corporate, Financial, and Monopoly Power (Nell Abernathy, Mike Konczal, & Kathryn Milani, eds., 2016), at 23 (“From Amazon to Google to Uber, there is a new form of economic power on display, distinct from conventional monopolies and oligopolies…, leverag[ing] data, algorithms, and internet-based technologies… in ways that could operate invisibly and anticompetitively.”); Mark Weinstein, I Changed My Mind — Facebook Is a Monopoly, Wall St. J. (Oct. 1, 2021), https://www.wsj.com/articles/facebook-is-monopoly-metaverse-users-advertising-platforms-competition-mewe-big-tech-11633104247 (“[T]he glue that holds it all together is Facebook’s monopoly over data…. Facebook’s data troves give it unrivaled knowledge about people, governments — and its competitors.”).

[3] See generally Abigail Slater, Why “Big Data” Is a Big Deal, The Reg. Rev. (Nov. 6, 2023), https://www.theregreview.org/2023/11/06/slater-why-big-data-is-a-big-deal/; Amended Complaint at ¶36, United States v. Google, 1:20-cv-03010 (D.D.C. 2020); Complaint at ¶37, United States v. Google, 1:23-cv-00108 (E.D. Va. 2023), https://www.justice.gov/opa/pr/justice-department-sues-google-monopolizing-digital-advertising-technologies (“Google intentionally exploited its massive trove of user data to further entrench its monopoly across the digital advertising industry.”).

[4] See, e.g., Press Release, European Commission, Commission Launches Calls for Contributions on Competition in Virtual Worlds and Generative AI (Jan. 9, 2024), https://ec.europa.eu/commission/presscorner/detail/en/IP_24_85; Krysten Crawford, FTC’s Lina Khan warns Big Tech over AI, SIEPR (Nov. 3, 2023), https://siepr.stanford.edu/news/ftcs-lina-khan-warns-big-tech-over-ai (“Federal Trade Commission Chair Lina Khan delivered a sharp warning to the technology industry in a speech at Stanford on Thursday: Antitrust enforcers are watching what you do in the race to profit from artificial intelligence.”) (emphasis added).

[5] See, e.g., John M. Newman, Antitrust in Digital Markets, 72 Vand. L. Rev. 1497, 1501 (2019) (“[T]he status quo has frequently failed in this vital area, and it continues to do so with alarming regularity. The laissez-faire approach advocated for by scholars and adopted by courts and enforcers has allowed potentially massive harms to go unchecked.”);
Bertin Martens, Are New EU Data Market Regulations Coherent and Efficient?, Bruegel Working Paper 21/23 (2023), available at https://www.bruegel.org/working-paper/are-new-eu-data-market-regulations-coherent-and-efficient (“Technical restrictions on access to and re-use of data may result in failures in data markets and data-driven services markets.”); Valéria Faure-Muntian, Competitive Dysfunction: Why Competition Law Is Failing in a Digital World, The Forum Network (Feb. 24, 2021), https://www.oecd-forum.org/posts/competitive-dysfunction-why-competition-law-is-failing-in-a-digital-world.

[6] Rana Foroohar, The Great US-Europe Antitrust Divide, FT (Feb. 5, 2024), https://www.ft.com/content/065a2f93-dc1e-410c-ba9d-73c930cedc14.

[7] See, e.g., Press Release, European Commission, supra note 4.

[8] See infra Section II. Commentators have also made similar claims. See, e.g., Ganesh Sitaraman & Tejas N. Narechania, It’s Time for the Government to Regulate AI. Here’s How, Politico (Jan. 15, 2024) (“All that cloud computing power is used to train foundation models by having them “learn” from incomprehensibly huge quantities of data. Unsurprisingly, the entities that own these massive computing resources are also the companies that dominate model development. Google has Bard, Meta has LLaMa. Amazon recently invested $4 billion into one of OpenAI’s leading competitors, Anthropic. And Microsoft has a 49 percent ownership stake in OpenAI — giving it extraordinary influence, as the recent board struggles over Sam Altman’s role as CEO showed.”).

[9] Press Release, European Commission, supra note 4.

[10] Comment of U.S. Federal Trade Commission to the U.S. Copyright Office, Artificial Intelligence and Copyright, Docket No. 2023-6 (Oct. 30, 2023) at 4, available at https://www.ftc.gov/legal-library/browse/advocacy-filings/comment-federal-trade-commission-artificial-intelligence-copyright (emphasis added).

[11] See, e.g., Joe Caserta, Holger Harreis, Kayvaun Rowshankish, Nikhil Srinidhi, and Asin Tavakoli, The data dividend: Fueling generative AI, McKinsey Digital (Sept. 15, 2023), https://www.mckinsey.com/capabilities/mckinsey-digital/our-insights/the-data-dividend-fueling-generative-ai (“Your data and its underlying foundations are the determining factors to what’s possible with generative AI.”).

[12] See, e.g., Tim Keary, Google DeepMind’s Achievements and Breakthroughs in AI Research, Techopedia (Aug. 11, 2023), https://www.techopedia.com/google-deepminds-achievements-and-breakthroughs-in-ai-research; Will Douglas Heaven, Google DeepMind used a large language model to solve an unsolved math problem, MIT Technology Review (Dec. 14, 2023), https://www.technologyreview.com/2023/12/14/1085318/google-deepmind-large-language-model-solve-unsolvable-math-problem-cap-set/; see also A Decade of Advancing the State-of-the-Art in AI Through Open Research, Meta (Nov. 30, 2023), https://about.fb.com/news/2023/11/decade-of-advancing-ai-through-open-research/; 200 languages within a single AI model: A breakthrough in high-quality machine translation, Meta, https://ai.meta.com/blog/nllb-200-high-quality-machine-translation/ (last visited Jan. 18, 2023).

[13] See, e.g., Jennifer Allen, 10 years of Siri: the history of Apple’s voice assistant, Tech Radar (Oct. 4, 2021), https://www.techradar.com/news/siri-10-year-anniversary; see also Evan Selleck, How Apple is already using machine learning and AI in iOS, Apple Insider (Nov. 20, 2023), https://appleinsider.com/articles/23/09/02/how-apple-is-already-using-machine-learning-and-ai-in-ios; see also Kathleen Walch, The Twenty Year History Of AI At Amazon, Forbes (July 19, 2019), https://www.forbes.com/sites/cognitiveworld/2019/07/19/the-twenty-year-history-of-ai-at-amazon/?sh=1734bcb268d0.

[14] See infra Section III.

[15] See, e.g., Cédric Argenton & Jens Prüfer, Search Engine Competition with Network Externalities, 8 J. Comp. L. & Econ. 73, 74 (2012); Mark A. Lemley & Matthew Wansley, Coopting Disruption (February 1, 2024), https://ssrn.com/abstract=4713845.

[16] John M. Yun, The Role of Big Data in Antitrust, in The Global Antitrust Institute Report on the Digital Economy (Joshua D. Wright & Douglas H. Ginsburg, eds., Nov. 11, 2020) at 233, available at https://gaidigitalreport.com/2020/08/25/big-data-and-barriers-to-entry/#_ftnref50. See also, e.g., Robert Wayne Gregory, Ola Henfridsson, Evgeny Kaganer, & Harris Kyriakou, The Role of Artificial Intelligence and Data Network Effects for Creating User Value, 46 Acad. of Mgmt. Rev. 534 (2020) (final pre-print version at 4, available at http://wrap.warwick.ac.uk/134220) (“A platform exhibits data network effects if, the more that the platform learns from the data it collects on users, the more valuable the platform becomes to each user.”). See also Karl Schmedders, José Parra-Moyano & Michael Wade, Why Data Aggregation Laws Could be the Answer to Big Tech Dominance, Silicon Republic (Feb. 6, 2024), https://www.siliconrepublic.com/enterprise/data-ai-aggregation-laws-regulation-big-tech-dominance-competition-antitrust-imd.

[17] Nathan Newman, Search, Antitrust, and the Economics of the Control of User Data, 31 Yale J. Reg. 401, 409 (2014) (emphasis added). See also id. at 420 & 423 (“While there are a number of network effects that come into play with Google, [“its intimate knowledge of its users contained in its vast databases of user personal data”] is likely the most important one in terms of entrenching the company’s monopoly in search advertising…. Google’s overwhelming control of user data… might make its dominance nearly unchallengeable.”).

[18] See also Yun, supra note 16, at 229 (“[I]nvestments in big data can create competitive distance between a firm and its rivals, including potential entrants, but this distance is the result of a competitive desire to improve one’s product.”).

[19] For a review of the literature on increasing returns to scale in data (a topic broader than data network effects), see Geoffrey Manne & Dirk Auer, Antitrust Dystopia and Antitrust Nostalgia: Alarmist Theories of Harm in Digital Markets and Their Origins, 28 Geo. Mason L. Rev. 1281, 1344 (2021).

[20] Andrei Hagiu & Julian Wright, Data-Enabled Learning, Network Effects, and Competitive Advantage, 54 RAND J. Econ. 638 (2023) (final preprint available at https://andreihagiu.com/wp-content/uploads/2022/08/Data-enabled-learning-Final-RAND-Article.pdf).

[21] Id. at 2. The authors conclude that “Data-enabled learning would seem to give incumbent firms a competitive advantage. But how strong is this advantage and how does it differ from that obtained from more traditional mechanisms….”

[22] Id.

[23] Bruno Jullien & Wilfried Sand-Zantman, The Economics of Platforms: A Theory Guide for Competition Policy, 54 Info. Econ. & Pol’y 10080, 101031 (2021).

[24] Daniele Condorelli & Jorge Padilla, Harnessing Platform Envelopment in the Digital World, 16 J. Comp. L. & Econ. 143, 167 (2020).

[25] See Hagiu & Wright, supra note 20.

[26] For a summary of these limitations, see generally Catherine Tucker, Network Effects and Market Power: What Have We Learned in the Last Decade?, Antitrust (Spring 2018) at 72, available at https://sites.bu.edu/tpri/files/2018/07/tucker-network-effects-antitrust2018.pdf. See also Manne & Auer, supra note 19, at 1330.

[27] See Jason Furman, Diane Coyle, Amelia Fletcher, Derek McAuley & Philip Marsden (Dig. Competition Expert Panel), Unlocking Digital Competition (2019) at 32-35 (“Furman Report”), available at https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/785547/unlocking_digital_competition_furman_review_web.pdf.

[28] Id. at 34.

[29] Id. at 35. To its credit, it should be noted, the Furman Report does counsel caution before mandating access to data as a remedy to promote competition. See id. at 75. That said, the Furman Report does maintain that such a remedy should certainly be on the table because “the evidence suggests that large data holdings are at the heart of the potential for some platform markets to be dominated by single players and for that dominance to be entrenched in a way that lessens the potential for competition for the market.” Id. In fact, the evidence does not show this.

[30] Case COMP/M.9660 — Google/Fitbit, Commission Decision (Dec. 17, 2020) (Summary at O.J. (C 194) 7), available at https://ec.europa.eu/competition/mergers/cases1/202120/m9660_3314_3.pdf at 455.

[31] Id. at 896.

[32] See Natasha Lomas, EU Checking if Microsoft’s OpenAI Investment Falls Under Merger Rules, TechCrunch (Jan. 9, 2024), https://techcrunch.com/2024/01/09/openai-microsoft-eu-merger-rules/.

[33] Amended Complaint at 11, Meta/Zuckerberg/Within, Fed. Trade Comm’n. (2022) (No. 605837), available at https://www.ftc.gov/system/files/ftc_gov/pdf/D09411%20-%20AMENDED%20COMPLAINT%20FILED%20BY%20COUNSEL%20SUPPORTING%20THE%20COMPLAINT%20-%20PUBLIC%20%281%29_0.pdf.

[34] Amended Complaint (D.D.C.), supra note 3, at ¶37.

[35] Amended Complaint (E.D. Va.), supra note 3, at ¶8.

[36] US Dep’t of Justice & Fed. Trade Comm’n, Merger Guidelines (2023) at 25, https://www.ftc.gov/system/files/ftc_gov/pdf/2023_merger_guidelines_final_12.18.2023.pdf.

[37] Competition and Mkts. Auth., Merger Assessment Guidelines (2021) at ¶7.19(e), https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/1051823/MAGs_for_publication_2021_–_.pdf.

[38] Furman Report, supra note 27, at ¶4.

[39] See, e.g., Chris Westfall, New Research Shows ChatGPT Reigns Supreme in AI Tool Sector, Forbes (Nov. 16, 2023), https://www.forbes.com/sites/chriswestfall/2023/11/16/new-research-shows-chatgpt-reigns-supreme-in-ai-tool-sector/?sh=7de5de250e9c.

[40] See Krystal Hu, ChatGPT Sets Record for Fastest-Growing User Base, Reuters (Feb. 2, 2023), https://www.reuters.com/technology/chatgpt-sets-record-fastest-growing-user-base-analyst-note-2023-02-01/; Google: The AI Race Is On, App Economy Insights (Feb. 7, 2023), https://www.appeconomyinsights.com/p/google-the-ai-race-is-on.

[41] See Google Trends, https://trends.google.com/trends/explore?date=today%205-y&q=%2Fg%2F11khcfz0y2,%2Fg%2F11ts49p01g&hl=en (last visited, Jan. 12, 2024) and https://trends.google.com/trends/explore?date=today%205-y&geo=US&q=%2Fg%2F11khcfz0y2,%2Fg%2F11ts49p01g&hl=en (last visited Jan. 12, 2024).

[42] See David F. Carr, As ChatGPT Growth Flattened in May, Google Bard Rose 187%, Similarweb Blog (June 5, 2023), https://www.similarweb.com/blog/insights/ai-news/chatgpt-bard/.

[43] See Press Release, Meta, Introducing New AI Experiences Across Our Family of Apps and Devices (Sept. 27, 2023), https://about.fb.com/news/2023/09/introducing-ai-powered-assistants-characters-and-creative-tools/; Sundar Pichai, An Important Next Step on Our AI Journey, Google Keyword Blog (Feb. 6, 2023), https://blog.google/technology/ai/bard-google-ai-search-updates/.

[44] See Ion Prodan, 14 Million Users: Midjourney’s Statistical Success, Yon (Aug. 19, 2023), https://yon.fun/midjourney-statistics/. See also Andrew Wilson, Midjourney Statistics: Users, Polls, & Growth [Oct 2023], ApproachableAI (Oct. 13, 2023), https://approachableai.com/midjourney-statistics/.

[45] See Hema Budaraju, New Ways to Get Inspired with Generative AI in Search, Google Keyword Blog (Oct. 12, 2023), https://blog.google/products/search/google-search-generative-ai-october-update/; Imagine with Meta AI, Meta (last visited Jan. 12, 2024), https://imagine.meta.com/.

[46] Catherine Tucker, Digital Data, Platforms and the Usual [Antitrust] Suspects: Network Effects, Switching Costs, Essential Facility, 54 Rev. Indus. Org. 683, 686 (2019).

[47] Manne & Auer, supra note 19, at 1345.

[48] See, e.g., Stefanie Koperniak, Artificial Data Give the Same Results as Real Data—Without Compromising Privacy, MIT News (Mar. 3, 2017), https://news.mit.edu/2017/artificial-data-give-same-results-as-real-data-0303 (“[Authors] describe a machine learning system that automatically creates synthetic data—with the goal of enabling data science efforts that, due to a lack of access to real data, may have otherwise not left the ground. While the use of authentic data can cause significant privacy concerns, this synthetic data is completely different from that produced by real users—but can still be used to develop and test data science algorithms and models.”).

[49] See, e.g., Rachel Gordon, Synthetic Imagery Sets New Bar in AI Training Efficiency, MIT News (Nov. 20, 2023), https://news.mit.edu/2023/synthetic-imagery-sets-new-bar-ai-training-efficiency-1120 (“By using synthetic images to train machine learning models, a team of scientists recently surpassed results obtained from traditional ‘real-image’ training methods.”).

[50] Thibault Schrepel & Alex ‘Sandy’ Pentland, Competition Between AI Foundation Models: Dynamics and Policy Recommendations, MIT Connection Science Working Paper (Jun. 2023), at 8.

[51] Igor Susmelj, Optimizing Generative AI: The Role of Data Curation, Lightly (last visited Jan 15, 2024), https://www.lightly.ai/post/optimizing-generative-ai-the-role-of-data-curation.

[52] See, e.g., Xiaoliang Dai, et al., Emu: Enhancing Image Generation Models Using Photogenic Needles in a Haystack, ArXiv (Sep. 27, 2023) at 1, https://ar5iv.labs.arxiv.org/html/2309.15807 (“[S]upervised fine-tuning with a set of surprisingly small but extremely visually appealing images can significantly improve the generation quality.”). See also Hu Xu, et al., Demystifying CLIP Data, ArXiv (Sep. 28, 2023), https://arxiv.org/abs/2309.16671.

[53] Lauren Leffer, New Training Method Helps AI Generalize like People Do, Sci. Am. (Oct. 26, 2023), https://www.scientificamerican.com/article/new-training-method-helps-ai-generalize-like-people-do/ (discussing Brendan M. Lake & Marco Baroni, Human-Like Systematic Generalization Through a Meta-Learning Neural Network, 623 Nature 115 (2023)).

[54] Timothy B. Lee, The Real Research Behind the Wild Rumors about OpenAI’s Q* Project, Ars Technica (Dec. 8, 2023), https://arstechnica.com/ai/2023/12/the-real-research-behind-the-wild-rumors-about-openais-q-project/.

[55] Id. See also GSM8K, Papers with Code (last visited Jan. 18, 2023), available at https://paperswithcode.com/dataset/gsm8k; MATH Dataset, GitHub (last visited Jan. 18, 2024), available at https://github.com/hendrycks/math.

[56] Lee, supra note 54.

[57] Geoffrey Manne & Ben Sperry, Debunking the Myth of a Data Barrier to Entry for Online Services, Truth on the Market (Mar. 26, 2015), https://truthonthemarket.com/2015/03/26/debunking-the-myth-of-a-data-barrier-to-entry-for-online-services/ (citing Andres V. Lerner, The Role of ‘Big Data’ in Online Platform Competition (Aug. 26, 2014), available at https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2482780).

[58] See, e.g., Lemley & Wansley, supra note 15, at 22 (“Incumbents have all that information. It would be difficult for a new entrant to acquire similar datasets independently….”).

[59] See Catherine Tucker, Digital Data as an Essential Facility: Control, CPI Antitrust Chron. (Feb. 2020) at 11 (“[U]ltimately the value of data is not the raw manifestation of the data itself, but the ability of a firm to use this data as an input to insight.”).

[60] Or, as John Yun puts it, data is only a small component of digital firms’ production function. See Yun, supra note 16, at 235 (“Second, while no one would seriously dispute that having more data is better than having less, the idea of a data-driven network effect is focused too narrowly on a single factor improving quality. As mentioned in supra Section I.A, there are a variety of factors that enter a firm’s production function to improve quality.”).

[61] Luxia Le, The Real Reason Windows Phone Failed Spectacularly, History–Computer (Aug. 8, 2023), https://history-computer.com/the-real-reason-windows-phone-failed-spectacularly/.

[62] Introducing the GPT Store, OpenAI (Jan. 10, 2024), https://openai.com/blog/introducing-the-gpt-store.

[63] See Michael Schade, How ChatGPT and Our Language Models are Developed, OpenAI, https://help.openai.com/en/articles/7842364-how-chatgpt-and-our-language-models-are-developed; Sreejani Bhattacharyya, Interesting innovations from OpenAI in 2021, AIM (Jan. 1, 2022), https://analyticsindiamag.com/interesting-innovations-from-openai-in-2021/; Danny Hernandez & Tom B. Brown, Measuring the Algorithmic Efficiency of Neural Networks, ArXiv (May 8, 2020), available at https://arxiv.org/abs/2005.04305.

[64] See Yun, supra note 16, at 235 (“Even if data is primarily responsible for a platform’s quality improvements, these improvements do not simply materialize with the presence of more data—which differentiates the idea of data-driven network effects from direct network effects. A firm needs to intentionally transform raw, collected data into something that provides analytical insights. This transformation involves costs including those associated with data storage, organization, and analytics, which moves the idea of collecting more data away from a strict network effect to more of a ‘data opportunity.’”).

[65] Lerner, supra note 57, at 4-5 (emphasis added).

[66] See Clayton M. Christensen, The Innovator’s Dilemma: When New Technologies Cause Great Firms to Fail (2013).

[67] See David J. Teece, Dynamic Capabilities and Strategic Management: Organizing for Innovation and Growth (2009).

[68] See Hagiu & Wright, supra note 20, at 4 (“We use our dynamic framework to explore how data sharing works: we find that it increases consumer surplus when one firm is sufficiently far ahead of the other by making the laggard more competitive, but it decreases consumer surplus when the firms are sufficiently evenly matched by making firms compete less aggressively, which in our model means subsidizing consumers less.”). See also Lerner, supra note 57.

[69] See, e.g., Hagiu & Wright, id. (“We also use our model to highlight an unintended consequence of privacy policies. If such policies reduce the rate at which firms can extract useful data from consumers, they will tend to increase the incumbent’s competitive advantage, reflecting that the entrant has more scope for new learning and so is affected more by such a policy.”); Jian Jia, Ginger Zhe Jin & Liad Wagman, The Short-Run Effects of the General Data Protection Regulation on Technology Venture Investment, 40 Marketing Sci. 593 (2021) (finding GDPR reduced investment in new and emerging technology firms, particularly in data-related ventures); James Campbell, Avi Goldfarb, & Catherine Tucker, Privacy Regulation and Market Structure, 24 J. Econ. & Mgmt. Strat. 47 (2015) (“Consequently, rather than increasing competition, the nature of transaction costs implied by privacy regulation suggests that privacy regulation may be anti-competitive.”).

Antitrust & Consumer Protection

Navigating the AI Frontier, Part I

TOTM

The European Union is on the verge of enacting the landmark Artificial Intelligence Act (AI Act), which will—for better or worse—usher in a suite of new obligations, and hidden pitfalls, for individuals and firms trying to navigate the development, distribution, and deployment of software.

Over the coming months, we will be delving into the nuances of the proposed text, aiming to illuminate the potential challenges and interpretive dilemmas that lie ahead. This series will serve as a guide to understanding and preparing for the AI Act’s impact, ensuring that stakeholders are well-informed and equipped to adapt to the regulatory challenges on the horizon.

Read the full piece here.

Innovation & the New Economy

Artificial Intelligence and IFCs

Popular Media

Artificial intelligence (AI) is transforming the financial services industry. For tax-neutral international financial centres (IFCs) such as Cayman, Bermuda, and Jersey, it has the potential to increase competitiveness and facilitate economic diversification. The benefits could be enormous, but to realise this potential, jurisdictions will have to be open to the new technology. Two factors underpin such openness: first, enabling businesses to access the skills to ensure that AI can be implemented successfully and appropriately; and second, avoiding excessively prescriptive and precautionary restrictions on the development and use of AI.

Read the full piece here.

Financial Regulation & Corporate Governance

Kristian Stout on Artificial Intelligence and Copyright

Presentations & Interviews

ICLE Director of Innovation Policy Kristian Stout joined fellow panelists Timothy B. Lee and Pamela Samuelson and moderator Brent Skorup to discuss the emerging legal issues surrounding artificial intelligence and its use of works protected under copyright law on a recent episode of the Federalist Society Regulatory Transparency Project’s Fourth Branch Podcast. The full episode is embedded below.

Intellectual Property & Licensing

The Biden Executive Order on AI: A Recipe for Anticompetitive Overregulation

TOTM

The Biden administration’s Oct. 30 “Executive Order on the Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence” proposes to “govern… the development and use of AI safely and responsibly” by “advancing a coordinated, Federal Government-wide approach to doing so.” (Emphasis added.)

This “all-of-government approach,” which echoes the all-of-government approach of the 2021 “Executive Order on Competition” (see here and here), establishes a blueprint for heightened regulation to deal with theorized problems stemming from the growing use of AI by economic actors. As was the case with the competition order, the AI order threatens to impose excessive regulatory costs that would harm the American economy and undermine competitive forces. As such, the order’s implementation warrants close scrutiny.

Read the full piece here.

Innovation & the New Economy

Biden’s AI Executive Order Sees Dangers Around Every Virtual Corner

TOTM

Here in New Jersey, where I live, the day before Halloween is commonly celebrated as “Mischief Night,” an evening of adolescent revelry and light vandalism that typically includes hurling copious quantities of eggs and toilet paper.

It is perhaps fitting, therefore, that President Joe Biden chose Oct. 30 to sign a sweeping executive order (EO) that could itself do quite a bit of mischief. And befitting the Halloween season, in proposing this broad oversight regime, the administration appears to be positively spooked by the development of artificial intelligence (AI).

The order, of course, embodies the emerging and now pervasive sense among policymakers that they should “do something” about AI; the EO goes so far as to declare that the administration feels “compelled” to act on AI. It largely directs various agencies to each determine how they should be involved in regulating AI, but some provisions go further than that. In particular, directives that set new reporting requirements—while ostensibly intended to forward the reasonable goal of transparency—could end up doing more harm than good.

Read the full piece here.

Innovation & the New Economy

ICLE Comments on Artificial Intelligence and Copyright

Regulatory Comments

Introduction

We thank you for the opportunity to comment on this important notice of inquiry (NOI)[1] on artificial intelligence (AI) and copyright. We appreciate the U.S. Copyright Office undertaking a comprehensive review of the policy and copyright-law issues raised by recent advances in generative AI systems. This NOI covers key areas that require attention, from legal questions regarding infringement and fair use, to questions about how policy choices could shape opportunities for creators and AI producers to engage in licensing.

At this early date, AI systems have already generated some incredible visual art and impressive written texts, as well as a good deal of controversy. Some artists have banded together as part of an anti-AI campaign;[2] lawsuits have been filed;[3] and policy experts have attempted to think through the various legal questions raised by these machine-learning systems.

The debates over the role of AI in creative industries have particular salience for intellectual-property rights. Copyright is notoriously difficult to protect online, and the emergence of AI may exacerbate that difficulty. AI systems also potentially pose an additional wrinkle: it is at least arguable that the outputs they produce can themselves be considered unique creations. There are, of course, other open questions whose answers are relevant here, not the least being whether it is fair to assert that only a human can be “creative” (at least, so far).[4]

But leaving these questions aside, we can say that at least some AI systems produce unique outputs and are not merely routinely duplicating other pieces of work in a digital equivalent of collage. That is, at some level, the machines are engaged in a rudimentary sort of “learning” about how humans arrange creative inputs when generating images, music, or written works. The machines appear to be able to reconstruct this process and produce new sets of words, sounds, or lines and colors that conform to the patterns found in human art, in at least a simulacrum of “creativity.”

But that conclusion isn’t the end of the story. Even if some of these AI outputs are unique and noninfringing, the way that AI systems learn—by ingesting massive quantities of existing creative work—raises a number of thorny copyright-law issues. Indeed, some argue that these systems inherently infringe copyright during the learning phase and that, as discussed below, such processes may not survive a “fair use” analysis.

But nor is that assertion the end of the analysis. Rather, it raises the question of whether applying existing doctrine in this novel technological context yields the best results for society. Moreover, it heightens the need for a comprehensive analytical framework to help parse these questions.

A. The Law & Economics of Copyright and AI

Nearly all would agree that it is crucial that law and public policy strike the appropriate balance between protecting creators’ existing rights and enabling society to enjoy the potentially significant benefits that could arise from the development of AI systems. Indeed, the subject is often cast as a dramatic conflict between creative professionals struggling to make ends meet and innovative firms working to provide cutting-edge AI technology. For the moment, however, it is likely more important to determine the right questions to ask and the proper analytical framework to employ than it is to identify any precise balancing point.

What is important to remember is that copyright policy is foremost economic in nature and “can be explained as a means for promoting efficient allocation of resources.”[5] That is to say, the reason that property rights in creative expression exist is to guarantee the continued production of such works.[6] The fundamental tradeoff in copyright policy is between the costs of limiting access to creative works, and the value obtained by encouraging production of such works.[7] The same applies in the context of AI: identifying the key tradeoffs and weighing the costs and benefits of restricting access to protected works by the producers (and users) of AI systems.[8]

This entails examining the costs and benefits of relatively stronger or weaker forms of copyright protection in terms of their effects on both incentives and access, and as they relate to both copyright holders and AI-system developers. It also requires considering where the transaction costs should be allocated for negotiating access to both copyright and, as discussed infra,[9] the use of name/image/likeness, as well as how those allocations are likely to shape outcomes.

At root, these questions center on how to think about the property rights that limit access to protected works and, possibly even more importantly, how to assign new property rights governing the ability to control the use of a name/image/likeness. As we know from the work of the late Nobel laureate Ronald Coase, the actual demarcation of rights affects parties’ abilities to negotiate superior solutions.[10] The development of nuisance law provides a good example of the problem at hand. When a legal regime provides either strict liability or no-liability rules around pollution, parties have little incentive to minimize harmful conduct:

The factory that has the absolute right to pollute will, if transaction costs are prohibitive, have no incentives to stop (or reduce) pollution even if the cost of stopping would be much less than the cost of pollution to the homeowners. Conversely, homeowners who have an absolute right to be free from pollution will, if transaction costs are prohibitive, have no incentive to take steps of their own to reduce the effects of pollution even if the cost to them of doing so (perhaps by moving away) is less than the cost to the factory of not polluting or of polluting less.[11]

As Coase observed, this class of problem is best regarded as reciprocal in nature, and the allocation of rights matters in obtaining an efficient outcome. This is necessarily so because, when fully considered, B’s ability to restrain A from the pollution-generating activity can itself be conceived of as another kind of harm that B can impose on A. Therefore, the problem requires a balancing of the relative harms generated by both A and B in exercising conflicting claims in a particular context.

When thinking about how to minimize harms—whether from pollution or other activity that generates social costs (which is to say, nearly every activity)—the aim is to decide whether “the gain from preventing the harm is greater than the loss which would be suffered elsewhere as a result of stopping the action which produces the harm.”[12] Theoretically, in a world without transaction costs, even assignments of no-liability or strict-liability rules could be bargained around. But we do not live in such a world.[13] Thus, “[i]n a world in which there are costs of rearranging the rights established by the legal system [common law and statutory assignments of liability] are, in effect, making a decision on the economic problem and determining how resources are to be employed.”[14]

While pollution rules, unlicensed uses of intellectual property, and a host of other activities subject to legal sanction are not typically framed as resource-allocation decisions, it is undeniable that they do have this character. This is true even where legislation attempts to correct deficiencies in the system. We experience a form of blindness when we focus on correcting what may be rightly perceived as problems in a liability regime. Such analysis tends to concentrate attention on particular deficiencies of the system and to nourish the belief that any measure that removes the deficiency is necessarily desirable. It diverts attention from other changes inevitably associated with the corrective measure—changes that may well produce more harm than the original deficiency.[15]

All of this is to say that one solution to the costs generated by the need for AI systems to process a massive corpus of expensive, copyright-protected material is neither to undermine property rights nor to make AI development impossible, but to consider how new property rights could make the system work. It may be that some entirely different form or allocation of property right would facilitate bargaining between rightsholders and AI creators, optimizing resource allocation in a way the existing doctrinal regime may not be able to.

A number of other questions flow from this insight into the allocative nature of copyright. How would the incentives for human creators change under different copyright rules for AI systems, or in the face of additional rights? And how would access to copyrighted works for AI training change with different rules, and what effects would that access have on AI innovation?

Above all, our goal today should be to properly frame the AI and copyright debate by identifying tradeoffs, quantifying effects (where possible), and asking what rules best serve the overall objectives of the copyright system and the social goal of encouraging AI innovation. The best chance of striking the right balance will come from a rigorous framing of the questions and from the use of economic analysis to try to answer them.

B.            Copyright Law and AI: Moving Forward

As the Copyright Office undertakes this inquiry, it is important to recognize that, regardless of how the immediate legal questions around AI and copyright are resolved, the growing capabilities and adoption of generative AI systems will likely necessitate some changes in the long term.

The complex questions surrounding the intersection of AI and copyright law admit reasonable arguments on both sides. But AI is here to stay, regardless, and if copyright law is applied in an unduly restrictive manner that substantially hinders socially beneficial AI innovation, it could provoke a broader public-policy backlash that does more to harm copyright’s ability to protect creative works than it does to stanch AI’s ability to undermine it. Copyright law risks being perceived as an obstruction to technological progress if it is used preemptively to kill AI in the cradle. Such an outcome could galvanize calls for recalibrating copyright’s scope and protections in the name of the public interest.

This illustrates the precarious balancing act that copyright law faces in the wake of rapidly evolving technologies like AI. Aggressive copyright restrictions that curtail AI development could instigate a public-policy counter-reaction before Congress and the courts that ultimately undermines copyright’s objectives. The judicious course is to adapt copyright law cautiously to enable AI’s responsible evolution, while resolutely preserving the incentives for human creativity.

In the remainder of this analysis, we offer our perspective on the likely outcomes of the AI-copyright issues raised in this NOI, given the current state of the law. These assessments reflect our perspective formed through the rigorous application of established copyright principles and precedent to the novel technological context of generative AI systems. Reasonable arguments rooted in existing doctrine could be made to support different conclusions. We submit these comments not as definitive predictions or normative preferences, but rather as informed appraisals of how courts may analyze AI under present copyright law, absent legislative intervention.

We appreciate the Copyright Office starting this process to modernize copyright law for the AI age. This inquiry is an important first step, but openness to further evolution will be key to promoting progress in both AI and the arts. We believe an open, evidence-based discussion of these issues will lead to balanced solutions that uphold copyright’s constitutionally mandated purpose, while allowing responsible AI innovation for the public benefit.

II.            The Training of AI Systems and the Applicability of Fair Use

In the NOI, the Copyright Office asks: “[u]nder what circumstances would the unauthorized use of copyrighted works to train AI models constitute fair use?”[16]

To answer this question, it would be useful to first briefly walk through a high-level example of how AI systems work, in order to address the most relevant points of contact between AI systems and copyright law.

A.            A Brief Technical Description of AI Training

AI-generated content is not a single “thing,” but a collection of differing processes, each with different implications for the law. For the purposes of this discussion, we will focus on image generation using “generative adversarial networks” (GANs) and diffusion models. Although different systems and different types of content generation will vary, the basic concepts discussed below are nonetheless useful at a general level.[17]

A GAN is a type of machine-learning model that consists of two parts: a generator and a discriminator.[18] The generator is trained to create new images that look like they come from a particular dataset, while the discriminator is trained to distinguish the generated images from real images in its original dataset.[19] The two parts are trained together in an adversarial manner, with the generator trying to produce images that can fool the discriminator and the discriminator trying to correctly identify the generated images.[20]
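The adversarial dynamic described above can be illustrated with a deliberately minimal numerical sketch. This is our own toy illustration, not drawn from any actual GAN implementation: the “images” are single numbers clustered near 3.0, the generator is a single learned shift applied to input noise, and the discriminator is a one-variable logistic regression.

```python
import numpy as np

rng = np.random.default_rng(0)

def real_batch(n):
    # Toy "real data": scalars clustered near 3.0 (standing in for images).
    return rng.normal(3.0, 0.1, size=n)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

g_shift = 0.0      # generator: learns a single shift applied to its input noise
w, b = 0.1, 0.0    # discriminator: p(real) = sigmoid(w*x + b)
lr = 0.05

for step in range(2000):
    # Discriminator update: raise p(real) on real data, lower it on fakes.
    xr = real_batch(32)
    xf = rng.normal(0.0, 0.1, size=32) + g_shift
    pr, pf = sigmoid(w * xr + b), sigmoid(w * xf + b)
    dw = np.mean((pr - 1.0) * xr) + np.mean(pf * xf)  # cross-entropy gradient
    db = np.mean(pr - 1.0) + np.mean(pf)
    w -= lr * dw
    b -= lr * db

    # Generator update: shift the fakes so the discriminator labels them real.
    xf = rng.normal(0.0, 0.1, size=32) + g_shift
    pf = sigmoid(w * xf + b)
    g_shift -= lr * np.mean((pf - 1.0) * w)
```

After training, `g_shift` has drifted from 0 toward the real data’s location: the generator has learned to produce outputs that resemble the training distribution, which is the essence of the adversarial process described above.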

A diffusion model, by contrast, analyzes the distribution of information in an image, as noise is progressively added to it.[21] This kind of algorithm analyzes characteristics of sample images, like the distribution of colors or lines, in order to understand what counts as an accurate representation of a subject (i.e., what makes a picture of a cat look like a cat, and not like a dog).[22]

For example, in the generation phase, diffusion-based systems start with randomly generated noise, and work backward in “denoising” steps to essentially “see” shapes:

The sampled noise is predicted so that if we subtract it from the image, we get an image that’s closer to the images the model was trained on (not the exact images themselves, but the distribution – the world of pixel arrangements where the sky is usually blue and above the ground, people have two eyes, cats look a certain way – pointy ears and clearly unimpressed).[23]

While it is possible that some implementations might be designed in a way that saves copies of the training images,[24] for at least some systems, once the network is trained using these techniques, it will not need to rely on saved copies of input work in order to produce outputs. The models that are produced during training are, in essence, instructions to a different piece of software about how to start with a prompt from a user, a palette of pure noise, and progressively “discover” signal in that image until some new image emerges.
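The noising-and-denoising arithmetic underlying diffusion models can likewise be sketched in a few lines. This is a simplified illustration of our own, using a toy 1-D signal in place of an image and an oracle noise predictor in place of a trained network; a real diffusion model learns to predict the noise rather than being given it.

```python
import numpy as np

rng = np.random.default_rng(0)

# A toy 1-D "image": a smooth signal standing in for pixel values.
x0 = np.sin(np.linspace(0, 2 * np.pi, 64))

# Forward (training-time) process: progressively blend in Gaussian noise.
T = 100
alphas = np.linspace(0.9999, 0.98, T)  # per-step retention of signal
alpha_bar = np.cumprod(alphas)         # cumulative signal retention

def add_noise(x0, t, eps):
    # Noised sample: x_t = sqrt(abar_t)*x0 + sqrt(1 - abar_t)*eps
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1 - alpha_bar[t]) * eps

# A trained model would *predict* eps from x_t; here we use the true noise
# as an oracle predictor to show the denoising arithmetic.
eps = rng.standard_normal(x0.shape)
t = T - 1
x_t = add_noise(x0, t, eps)

# Reverse (generation-time) step: subtract the predicted noise to recover
# an estimate of the clean signal.
x0_hat = (x_t - np.sqrt(1 - alpha_bar[t]) * eps) / np.sqrt(alpha_bar[t])

print(np.max(np.abs(x0_hat - x0)) < 1e-8)  # prints True
```

With a perfect noise predictor the original signal is recovered exactly; in an actual system, generation begins from pure noise and the learned predictor iteratively “discovers” a new sample from the training distribution rather than any stored input work.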

B.            Fair Use

The creator of some of the most popular AI tools, OpenAI, is not shy about its use of protected works in the training phase of its algorithms. In comments to the U.S. Patent and Trademark Office (PTO), OpenAI noted that:

Modern AI systems require large amounts of data. For certain tasks, that data is derived from existing publicly accessible “corpora”… of data that include copyrighted works. By analyzing large corpora (which necessarily involves first making copies of the data to be analyzed), AI systems can learn patterns inherent in human-generated data and then use those patterns to synthesize similar data which yield increasingly compelling novel media in modalities as diverse as text, image, and audio. (emphasis added).[25]

Thus, at the training stage, the most popular forms of AI systems require making copies of existing works. Where that material is neither in the public domain nor licensed, an infringement can occur unless the copy is non-infringing (say, because it is transient) or some affirmative defense excuses the infringement. Toward this end, OpenAI believes that this use should qualify as fair use,[26] as do most or all of the other major producers of generative AI systems.[27]

But as OpenAI has framed the fair-use analysis, it is not clear that these uses should qualify. There are two major questions in this respect: will the data used to train these systems count as “copies” under the Copyright Act, and, if so, is the use of these “copies” sufficiently “transformative” to qualify for the fair-use defense?

1.              Are AI systems being trained with ‘copies’ of protected works?

Section 106 of the Copyright Act grants the owner of a copyright the exclusive right “to reproduce… copyrighted work in copies” and to authorize others to do so.[28] If an AI system makes a copy of a file to a computer during training, this would likely constitute a prima facie violation of the copyright owner’s exclusive right of reproduction under Section 106. This is fairly straightforward.

But what if the “copy” is “transient” and/or only partial pieces of content are used in the training? For example, what if a training program merely streamed small bits of a protected work into temporary memory as part of its training, and retained no permanent copy?

As the Copyright Office has previously observed, even temporary reproductions of a work in a computer’s memory can constitute “copies” under the Copyright Act.[29] Critically, this includes even temporary reproductions made as part of a packet-switching network transmission, where a particular file is broken into individual packets, because the packets can be reassembled into substantial portions or even entire works.[30] On the topic of network-based transmission, the Copyright Office further observed that:

Digital networks permit a single disk copy of a work to meet the demands of many users by creating multiple RAM copies. These copies need exist only long enough to be perceived (e.g., displayed on the screen or played through speakers), reproduced or otherwise communicated (e.g., to a computer’s processing unit) in order for their economic value to be realized. If the network is sufficiently reliable, users have no need to retain copies of the material. Commercial exploitation in a network environment can be said to be based on selling a right to perceive temporary reproductions of works.[31]

This is a critical insight that translates well to the context of AI training. The “transience” of the copy matters with respect to the receiver’s ability to perceive the work in a way that yields commercial value. Under this reasoning, the relevant locus of analysis is the AI system’s ability to “perceive” a work for the purposes of being trained to “understand” the work. In this sense, copies even more fleeting than those necessary for human perception could theoretically implicate the reproduction right.

Even where courts have been skeptical of extending the definition of “copy” to “fleeting” copies in computer memory, this underlying logic is revealed. In Cartoon Network LP, LLLP v. CSC Holdings, Inc., 536 F.3d 121 (2d Cir. 2008), the 2nd U.S. Circuit Court of Appeals had to determine whether buffered media sent to a DVR device was too “transient” to count as a “copy”:

No bit of data remains in any buffer for more than a fleeting 1.2 seconds. And unlike the data in cases like MAI Systems, which remained embodied in the computer’s RAM memory until the user turned the computer off, each bit of data here is rapidly and automatically overwritten as soon as it is processed. While our inquiry is necessarily fact-specific, and other factors not present here may alter the duration analysis significantly, these facts strongly suggest that the works in this case are embodied in the buffer for only a “transitory” period, thus failing the duration requirement.[32]

In Cartoon Network, the court acknowledged both that the duration analysis was fact-bound, and also that the “fleeting” nature of the reproduction was important. “Fleeting” is a relative term, based on the receiver’s capacities. A ball flying through the air may look “fleeting” to a human observer, but may be far more cognizable to a creature with a faster reaction time, such as a housefly. So, too, with copies of a work in a computer’s memory and the ability to “perceive” what is fixed in a buffer: what may be much too quick for a human to perceive may very well be within an AI system’s perceptual capabilities.

Therefore, however the training copies are held, there is a strong possibility that a court will find them to be “copies” for the purposes of the reproduction right—even with respect to partial copies that exist for very small amounts of time.

2.              The purpose and character of using protected works to train AI systems

Fair use provides for an affirmative defense against infringement when the use is, among other things, “for purposes such as criticism, comment, news reporting, teaching…, scholarship, or research.”[33] When deciding whether a fair-use defense is applicable, a court must balance a number of factors:

  1. the purpose and character of the use, including whether such use is of a commercial nature or is for nonprofit educational purposes;
  2. the nature of the copyrighted work;
  3. the amount and substantiality of the portion used in relation to the copyrighted work as a whole; and
  4. the effect of the use upon the potential market for or value of the copyrighted work.[34]

The fair-use defense that AI creators have advanced is rooted in the first factor: the purpose and character of the use. Although a full analysis of all the factors is ultimately necessary, analysis of the first factor is sufficiently complicated to warrant full attention here. In particular, the complex issue at hand is whether uses of protected works to train AI systems are sufficiently “transformative.”[35]

Whether the use of a copyrighted work to train an AI is “transformative” is certainly a novel question, but it is one that will likely be answered in light of an observation the U.S. Supreme Court made in Campbell v. Acuff-Rose Music:

[W]hen a commercial use amounts to mere duplication of the entirety of an original, it clearly “supersede[s] the objects,”… of the original and serves as a market replacement for it, making it likely that cognizable market harm to the original will occur… But when, on the contrary, the second use is transformative, market substitution is at least less certain, and market harm may not be so readily inferred.[36]

Moreover, “[t]he word ‘transformative’ cannot be taken too literally as a sufficient key to understanding the elements of fair use. It is rather a suggestive symbol for a complex thought, and does not mean that any and all changes made to an author’s original text will necessarily support a finding of fair use.”[37] A key question, then, is whether training AI systems on copyrighted works amounts to a mere “duplication of the entirety of an original” or is sufficiently “transformative” to support a fair-use defense. As noted above, OpenAI believes that its use is transformative. According to its comments:

Training of AI systems is clearly highly transformative. Works in training corpora were meant primarily for human consumption for their standalone entertainment value. The “object of the original creation,” in other words, is direct human consumption of the author’s “expression.” Intermediate copying of works in training AI systems is, by contrast, “non-expressive”: the copying helps computer programs learn the patterns inherent in human-generated media. The aim of this process—creation of a useful generative AI system—is quite different than the original object of human consumption. The output is different too: nobody looking to read a specific webpage contained in the corpus used to train an AI system can do so by studying the AI system or its outputs. The new purpose and expression are thus both highly transformative.[38]

This framing, however, works against OpenAI’s interests. As noted above, and reinforced in the immediately preceding quote, generative AI systems are made of at least two distinct pieces. The first is a piece of software that ingests existing works and creates a file that can serve as instructions to the second piece of software. The second piece of software takes the output of the first and can produce independent results. Thus, there is a clear discontinuity in the process whereby the ultimate work created by the system is disconnected from the creative inputs used to train the software.

Therefore, the protected works are arguably ingested into the first part of the system “for their standalone entertainment value.” That is to say, the goal of copying and showing a protected work to an AI system is for the analog of “direct human consumption of the author’s expression” in order for the system to learn about that expression.

The software is learning what counts as “standalone entertainment value,” and the works must therefore be consumed on those terms. A computer, of course, is not sitting on a couch browsing for its own pleasure. But it is precisely for their “standalone entertainment value” that copyrighted works are shown to the first piece of software. Parody or “remixing” uses, by contrast, incorporate a work into some secondary expression that directly transforms the input; these AI systems instead learn what makes a piece entertaining and then discard the piece altogether. Moreover, this use of the art qua art most certainly interferes with the existing market, insofar as it substitutes for a licensing agreement with rightsholders.

A good analogy is art students and art textbooks. Art students view protected works in an art textbook in order to learn how to reproduce the styles contained therein. The students would not be forgiven for pirating the textbooks merely because they intend to go on to make new paintings. They would still be liable for copyright infringement if they used unlicensed protected works as part of their education.

The 2nd U.S. Circuit Court of Appeals dealt with a case involving a similar dynamic. In American Geophysical Union v. Texaco, 60 F.3d 913 (2d Cir. 1994), the court considered whether Texaco’s photocopying of scientific articles produced by the plaintiffs qualified for a fair-use defense. Texaco employed between 400 and 500 research scientists and, as part of supporting their work, maintained subscriptions to a number of scientific journals.[39]

It was common practice for Texaco’s scientists to photocopy entire articles and save them in a file.[40] The plaintiffs sued for copyright infringement.[41] Texaco asserted that photocopying by its scientists for the purposes of furthering scientific research—that is, to train the scientists on the content of the journal articles—should count as a fair use. The argument was, at least in part, that this was sufficiently “transformative” because the scientists were using that knowledge to invent new products.[42] The 2nd Circuit disagreed:

The “transformative use” concept is pertinent to a court’s investigation under the first factor because it assesses the value generated by the secondary use and the means by which such value is generated. To the extent that the secondary use involves merely an untransformed duplication, the value generated by the secondary use is little or nothing more than the value that inheres in the original. Rather than making some contribution of new intellectual value and thereby fostering the advancement of the arts and sciences, an untransformed copy is likely to be used simply for the same intrinsic purpose as the original, thereby providing limited justification for a finding of fair use….[43]

The 2nd Circuit thus observed that copies of the scientific articles were made solely to consume the material itself. AI developers often make an argument analogous to that made by Texaco: that training AI systems surely advances scientific research, and therefore fosters the “advancement of the arts and sciences.” But in American Geophysical Union, the initial copying of copyrighted content, even where it was ultimately used for the “advancement of the arts and sciences,” was not held to be sufficiently “transformative.”[44] The case thus stands for the proposition that one cannot merely identify a social goal that would be advanced at some future date in order to permit an exception to copyright protection. As the court put it:

[T]he dominant purpose of the use is a systematic institutional policy of multiplying the available number of copies of pertinent copyrighted articles by circulating the journals among employed scientists for them to make copies, thereby serving the same purpose for which additional subscriptions are normally sold, or… for which photocopying licenses may be obtained.[45]

The use itself must be transformative and different, and copying is not transformative merely because it may be used as an input into a later transformative use. By the same token, therefore, it seems likely that where an AI system ingests (copies) copyrighted works, that use is similarly not transformative, despite its ultimate use as an input in the creation of other original works.

The search-engine “snippets” and “thumbnails” cases provide a useful contrast to the American Geophysical Union analysis that is relevant to AI. In Kelly v. Arriba Soft Corp., 336 F.3d 811 (9th Cir. 2003), the 9th U.S. Circuit Court of Appeals ruled that a search engine’s creation of thumbnail images from original copies was a transformative fair use.[46] Arriba’s search-engine crawler made full-sized copies of Kelly’s images and stored them temporarily on Arriba’s server to generate thumbnail versions. After the thumbnails were created, the full-sized originals were deleted. The thumbnails were used to facilitate Arriba’s image-based search engine. In reaching its fair-use conclusion, the 9th Circuit opined that:

Arriba’s use of Kelly’s images promotes the goals of the Copyright Act and the fair use exception. The thumbnails do not stifle artistic creativity because they are not used for illustrative or artistic purposes and therefore do not supplant the need for the originals.[47]

Further, although “Arriba made exact replications of Kelly’s images, the thumbnails were much smaller, lower-resolution images that served an entirely different function than Kelly’s original images.”[48]

The court found it important that the search engine did not use the protected works for their intended “aesthetic experience,” but rather for the purpose of constructing a search index.[49] Indeed, the entire point of a search engine is not to “supersede” the original, but in many or most cases to provide users an efficient means to find that original online.[50]

The court discussed, but only briefly, the benefit to the public of Arriba’s transformative use,[51] noting that “[Arriba’s thumbnails] benefit the public by enhancing information-gathering techniques on the internet.”[52] Five years later, in Perfect 10 Inc. v. Amazon.com Inc., 487 F.3d 701 (9th Cir. 2007), the 9th Circuit expanded on this question somewhat.[53] There, in holding that the novelty of the use was of crucial importance to the analysis,[54] the court also stressed that the value of that use was a function of its newness:

[A] search engine provides social benefit by incorporating an original work into a new work, namely, an electronic reference tool. Indeed, a search engine may be more transformative than a parody [the use at issue in Campbell] because a search engine provides an entirely new use for the original work, while a parody typically has the same entertainment purpose as the original work.[55]

Indeed, even in light of the commercial nature of Google’s use of copyrighted content in its search engine, its significant public benefit carried the day: “We conclude that the significantly transformative nature of Google’s search engine, particularly in light of its public benefit, outweighs Google’s superseding and commercial uses of the thumbnails in this case.”[56] And, of particular relevance to these questions in the context of AI, the court in Perfect 10 went on to “note the importance of analyzing fair use flexibly in light of new circumstances.”[57]

Ultimately, the Perfect 10 decision tracked Kelly fairly closely on the rest of the “transformativeness” analysis in finding fair use, because “[a]lthough an image may have been created originally to serve an entertainment, aesthetic, or informative function, a search engine transforms the image into a pointer directing a user to a source of information.”[58]

The throughline in these cases is whether a piece of content is being used for its expressive content, weighed against the backdrop of whether the use is for some new (and, thus, presumptively valuable) purpose. In Perfect 10 and Kelly, the transformative use was the creation of a search index.

“Snippets” fair-use cases track a similar line of reasoning. For example, in Authors Guild v. Google Inc., 804 F.3d 202 (2d Cir. 2015), the 2nd Circuit ruled that Google’s use of “snippets” of copyrighted books in its Library Project and Google Books website was a “transformative” fair use.[59] Holding that the “snippet view” of books digitized as part of the Google Books project did not constitute an effectively competing substitute to the original works, the circuit court noted that copying for the purpose of “criticism” or—as in that case—copying for the purpose of “provision of information about” the protected work, “tends most clearly to satisfy Campbell’s notion of the ‘transformative’ purpose.”[60]

Importantly, the court emphasized the importance of the public-benefit aspect of transformative uses: “[T]ransformative uses tend to favor a fair use finding because a transformative use is one that communicates something new and different from the original or expands its utility, thus serving copyright’s overall objective of contributing to public knowledge.”[61]

Underscoring the idea that the “transformativeness” analysis weighs whether a use is merely for expressive content against the novelty/utility of the intended use, the court observed:

Google’s division of the page into tiny snippets is designed to show the searcher just enough context surrounding the searched term to help her evaluate whether the book falls within the scope of her interest (without revealing so much as to threaten the author’s copyright interests). Snippet view thus adds importantly to the highly transformative purpose of identifying books of interest to the searcher.[62]

Thus, the absence of use of the work’s expressive content, coupled with a fairly circumscribed (but highly novel) use was critical to the outcome.

The entwined questions of transformative use and the public benefit it confers are significantly more complicated in the AI context, however. Unlike the incidental copying involved in search-engine indexing or thumbnails, training generative AI systems directly leverages copyrighted works for their expressive value. In the Google Books and Kelly cases, the defendant systems extracted limited portions of works or down-sampled images solely to identify and catalog their location for search purposes. The copies enabled indexing and access, and they expanded public knowledge through a means unrelated to the works’ protected aesthetics.

But in training AI models on copyrighted data, the systems necessarily parse the intrinsic creative expression of those works. The AI engages with the protected aesthetic elements themselves, not just superficial markers (like title, length, location on the internet, etc.), in order to internalize stylistic and compositional principles. This appropriates the heart of the works’ copyright protection for expressive ends, unlike the more tenuous connections in search systems.

The AI is thus “learning” directly from the protected expression in a manner akin to a human student studying an art textbook, or like the scientists learning from the journals in American Geophysical Union. The subsequent AI generations are built from mastery of the copyrighted training materials’ creative expression. Thus, while search-engine copies only incidentally interact with protected expression to enable unrelated innovation, AI training is predicated on excavating the protected expression itself to fuel iterative creation. These meaningfully different purposes have significant fair-use implications.

This functional difference is, as noted, central to the analysis of a use’s “purpose and character.” Indeed, “even making an exact copy of a work may be transformative so long as the copy serves a different function than the original work.”[63] But the benefit to the public from the new use is important, as well, particularly with respect to the possible legislative response that a restrictive interpretation of existing doctrine may engender.

If existing fair-use principles prohibit the copying required for AI, absent costly item-by-item negotiation and licensing, the transaction costs could become prohibitive, thwarting the development of technologies that promise great public value.[64] Copyright law has faced similar dilemmas before, where the transaction costs of obtaining permission for socially beneficial uses could frustrate those uses entirely.[65] In such cases, we have developed mechanisms like compulsory licensing to facilitate the necessary copying, while still attempting to compensate rightsholders. An unduly narrow fair-use finding for AI training could spur calls for similar interventions in service of enabling AI progress.

In other words, regardless of the veracity of the above conclusion that AI’s use of copyrighted works may not, in fact, serve a different function than the original, courts and legislators may be reluctant to allow copyright doctrine to serve as an absolute bar against self-evidently valuable activity like AI development. Our aim should be to interpret or recalibrate copyright law to permit such progress while upholding critical incentives for creators.

C.            Opt-In vs. Opt-Out Use of Protected Works

The question at the heart of the prior discussion—and, indeed, at the heart of the economic analysis of copyright—is whether the transaction costs that accompany requiring express ex ante permission for the use of protected works are so high that they impede socially beneficial conduct whose value would outweigh the social cost of allowing permissionless and/or uncompensated use.[66] The NOI alludes to this question when it asks: “Should copyright owners have to affirmatively consent (opt in) to the use of their works for training materials, or should they be provided with the means to object (opt out)?”[67]

This is a complex problem. Given the foregoing thoughts on fair use, it seems quite possible that, at present, the law requires creators of AI systems either to seek licenses for protected content or to resort to public-domain works for training. Given the volume of copyrighted works that AI developers currently use to train these systems, such requirements may be broadly infeasible.

On one hand, requiring affirmative opt-in consent from copyright holders imposes significant transaction costs on AI-system developers to identify and negotiate licenses for the vast amounts of training data required. This could hamper innovation in socially beneficial AI systems. On the other hand, an opt-out approach shifts more of the transaction-cost burden to copyright holders, who must monitor and object to unwanted uses of their works. This raises concerns about uncompensated use.

Ultimately, the question is where the burden should lie: with AI-system developers to obtain express consent, or with copyright holders to monitor and object to uses? Requiring some form of consent may be necessary to respect copyright interests. Yet an opt-out approach may strike the right balance: developers would still bear the burden of honoring creators’ objections, while avoiding the infeasibly high transaction costs of mandatory opt-in consent. The optimal approach likely involves nuanced policymaking to balance these competing considerations. Moreover, as we discuss infra, the realistic outcome is most likely going to require rethinking the allocation of property rights in ways that provide for large-scale licensing. Ideally, this could be done through collective negotiation, but perhaps at a de minimis rate, while allowing creators to bargain for remuneration on the basis of other rights, like a right of publicity or other rights attached to the output of AI systems, rather than the inputs.[68]

1.              Creator consent

Relatedly, the Copyright Office asks: “If copyright owners’ consent is required to train generative AI models, how can or should licenses be obtained?”[69]

Licensing markets exist, and it is entirely possible that major AI developers and large groups of rightsholders can come to mutually beneficial terms that permit a sufficiently large body of protected works to be made available as training data. Something like a licensing agency for creators who choose to make their works available could arise, similar to the services that exist to provide licensed music and footage for video creators.[70] It is also possible for some to form collective-licensing organizations to negotiate blanket permissions covering many works.

It’s important to remember that our current thinking is constrained by our past experience. All we know today are AI models trained on vast amounts of unlicensed works. It is entirely possible that, if firms were required to seek licenses, unexpected business models would emerge to satisfy both sides of the equation.

For example, an AI firm could develop its own version of YouTube’s ContentID, which would allow creators to control when their work is used in AI training. For some well-known artists, this could be negotiated with an upfront licensing fee. On the user side, any artist who has opted in could then be selected as a “style” for the AI to emulate—triggering a royalty payment to the artist when a user generates an image or song in that style. Creators could also have the option of removing their influence from the system if they so desire.
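The opt-in royalty flow described above can be sketched in a few lines of code. This is purely illustrative: the registry class, the flat per-use rate, and the artist names are hypothetical assumptions, not a description of ContentID or any existing licensing system.

```python
from dataclasses import dataclass, field

# Hypothetical sketch of an opt-in "style licensing" registry.
# The per-use royalty rate and artist names are illustrative assumptions.

@dataclass
class StyleRegistry:
    royalty_per_use: float = 0.05                # assumed flat rate per generation
    opted_in: set = field(default_factory=set)
    earnings: dict = field(default_factory=dict)

    def opt_in(self, artist: str) -> None:
        """Artist consents to having their style offered to users."""
        self.opted_in.add(artist)
        self.earnings.setdefault(artist, 0.0)

    def opt_out(self, artist: str) -> None:
        """Artist withdraws their influence from the system."""
        self.opted_in.discard(artist)

    def generate_in_style(self, artist: str) -> bool:
        """A generation succeeds only for opted-in artists,
        crediting a royalty on each use."""
        if artist not in self.opted_in:
            return False
        self.earnings[artist] += self.royalty_per_use
        return True

registry = StyleRegistry()
registry.opt_in("artist_a")
registry.generate_in_style("artist_a")
registry.generate_in_style("artist_a")
print(registry.earnings["artist_a"])             # prints 0.1
registry.opt_out("artist_a")
print(registry.generate_in_style("artist_a"))    # prints False
```

The point of the sketch is the allocation of burdens: the producer pays only for actual uses of a style, while the creator retains a standing ability to withdraw consent.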

Undoubtedly, there are other ways to structure the relationship between creators and AI systems that would facilitate creators’ monetization of the use of their work in AI systems, including legal and commercial structures that create opportunities for both creators and AI firms to succeed.

III.          Generative AI Outputs: Protection of Outputs and Outputs that Infringe

The Copyright Office asks: “Under copyright law, are there circumstances when a human using a generative AI system should be considered the ‘author’ of material produced by the system?”[71]

Generally speaking, we see no reason why copyright law should be altered to afford protection to purely automatic creations generated by AI systems. That said, when a human makes a nontrivial contribution to generative AI output—such as editing, reframing, or embedding the AI-generated component within a larger work—the resulting work should qualify for copyright protection.

Copyright law centers on the concept of original human authorship.[72] The U.S. Constitution expressly limits copyright to “authors.”[73] As of this writing, however, generative AI’s capacities do not rise to the level of true independent authorship. AI systems remain tools that require human direction and judgment.[74] As such, when a person provides the initial prompt or framing, makes choices regarding the iterative development of the AI output, and decides that the result is satisfactory for inclusion in a final work, they are fundamentally engaging in creative decision making that constitutes authorship under copyright law.

As Joshua Gans has observed of recent Copyright Review Board decisions:

Trying to draw some line between AI and humans with the current technology opens up a massive can of worms. There is literally no piece of digital work these days that does not have some AI element to it, and some of these mix and blur the lines in terms of what is creative and what is not. Here are some examples:

A music artist uses AI to denoise a track or to add an instrument or beat to a track or to just get a composition started.

A photographer uses Photoshop or takes pictures with an iPhone that already uses AI to focus the image and to sort a burst of images into one that is appropriate.

A writer uses AI to prompt for some dialogue when stuck at some point or to suggest a frame for writing a story.[75]

Attempting to separate out an “AI portion” from the final work, as the Copyright Review Board proposed, fundamentally misunderstands the integrated nature of the human-AI collaborative process. The AI system cannot function without human input, and its output remains raw material requiring human creativity to incorporate meaningfully into a finished product.

Therefore, when a generative AI system is used as part of a process guided by human creative choices, the final work should be protected by copyright, just as a work created using any other artistic tool or collaborator would be. Attenuating copyrightability due to the use of AI would undermine basic copyright principles and fail to recognize the essentially human nature of the creative process.

A.            AI Outputs and Infringement

The NOI asks: “Is the substantial similarity test adequate to address claims of infringement based on outputs from a generative AI system, or is some other standard appropriate or necessary?” (Question 23)

The outputs of AI systems may or may not violate IP laws, but there is nothing inherent in the processes described above that dictates that they must. As noted, the most common AI systems do not save copies of existing works, but merely “instructions” (more or less) on how to create new work that conforms to patterns found by examining existing work. If we assume that a system isn’t violating copyright at the input stage, it’s entirely possible that it can produce completely new pieces of art that have never before existed and do not violate copyright.

They can, however, be made to violate copyrights. For example, these systems can be instructed to generate art, not just in the style of a particular artist, but art that very closely resembles existing pieces. In this sense, it would be making a copy that theoretically infringes. The fact of an AI’s involvement would not change the analysis: just as with a human-created work, if it is substantially similar to a copyrighted work, it may be found infringing.

There is, however, a common flaw in AI systems that leads to outputs more likely to violate copyright in this way. Known as “overfitting,” it occurs when, during training, a system is presented with samples that contain too many instances of a particular image.[76] The resulting model encodes so much information about that specific image that, when the AI generates a new image, it is constrained to producing something very close to the original. Similarly, there is evidence that some AI systems are “memorizing” parts of protected books.[77] This could lead to AI systems repeating copyright-protected written works.
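A toy numerical analogy, far simpler than any real generative model, can illustrate the memorization effect: give a model enough parameters relative to its training data, and its outputs collapse onto the training examples. The polynomial model below is purely an assumption for illustration, not how diffusion or language models are trained.

```python
import numpy as np

# Toy illustration of overfitting/"memorization": a model with as many
# parameters as training points reproduces those points exactly rather
# than learning a generalizable pattern.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 5)                       # five training samples
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.1, 5)  # noisy observations

# A degree-4 polynomial (five coefficients) fit to five points passes
# through every sample exactly -- the analogue of a generative model
# emitting a near-verbatim copy of an over-represented training image.
coeffs = np.polyfit(x, y, deg=4)
reconstructed = np.polyval(coeffs, x)

# Training error is effectively zero: the model has memorized the data.
print(float(np.max(np.abs(reconstructed - y))))
```

In copyright terms, the worry is exactly this regime: the “ideas” the model was supposed to abstract degenerate into a stored copy of a particular protected work.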

1.              The substantial-similarity test

The substantial-similarity test remains functionally the same when evaluating works generated using AI. To find “substantial similarity,” courts require evidence of copying, as well as an expression that is substantially similar to a protected work.[78] “It is now an axiom of copyright law that actionable copying can be inferred from the defendant’s access to the copyrighted work and substantial similarity between the copyrighted work and the alleged infringement.”[79] In many or most cases, an AI system will arguably have had access to a wide array of protected works posted online. Thus, there may not be a particularly high hurdle to determining that an AI system actually copied a protected work.

There is, however, one potential problem for the first prong of this analysis. Models produced during a system’s training process do not (usually) contain the original work, but rather the “ideas” that the AI system generated during training. Thus, where the provenance of works contained in a training corpus is difficult to source, it may not be straightforward to infer whether a model “saw” a particular work. Moreover, the “ideas” that the AI “learns” from its training corpus are unprotected under U.S. copyright law, as it is permissible to mimic unprotected elements of a copyrighted work (such as ideas).[80]

Imagine a generative AI system trained on horror fiction. It would be possible for this system to produce a new short story that is similar to one written by Stephen King, but the latent data in the model almost certainly would not violate any copyrights that King holds in his work. The model would contain “ideas” about horror stories, including those learned from an array of authors who were themselves influences on Stephen King, and potentially some of King’s own stories. What the AI system “learns” in this case is the relationship between words and other linguistic particularities that are commonly contained in horror fiction. That is, it has “ideas” about what goes into a horror story, not (theoretically) the text of the horror story itself.

Thus, when seeking indirect proof of copying of a Stephen King story, it may complicate matters that an AI system has ingested all of H.P. Lovecraft’s work, Lovecraft being a major influence on King. The “ideas” in the model may subsequently yield an output similar to a Stephen King work, yet that output may have been constructed largely or entirely from material by Lovecraft and other public-domain horror writers. The problem becomes only more complicated when one realizes that the system could also have been trained on public-domain fan fiction written in the style of Stephen King. Thus, for the purposes of the first prong of this analysis, courts may place a greater burden on plaintiffs in copyright actions against model producers to demonstrate more than that a work was merely available online.

Assuming that plaintiffs are able to satisfy the first prong, once an AI system “expresses” those ideas, that expression could violate copyright law under the second prong of the substantial-similarity test. The second prong inquires whether the final work appropriated the protected original expression.[81] Any similarities in unprotectable ideas, facts, or common tropes are disregarded.[82] So, in both traditional and AI contexts, the substantial-similarity test ultimately focuses on the protected components of creative expression, not surface similarity.

The key determination is whether the original work’s protected expression itself has been impermissibly copied, no matter the process that generated the copy. AI is properly viewed as simply another potential tool that could be used in certain acts of copying. It does not require revisiting settled principles of copyright law.

B.            Direct and Secondary Liability

The NOI asks: “If AI-generated material is found to infringe a copyrighted work, who should be directly or secondarily liable—the developer of a generative AI model, the developer of the system incorporating that model, end users of the system, or other parties?”[83]

Applying traditional copyright-infringement frameworks to AI-generated works poses unique challenges in determining direct versus secondary liability. In some cases, the AI system itself may create infringing content without any direct human causation.

1.              Direct liability

If the end user prompts an AI system in a way that intentionally targets copyrighted source material, they may meet the threshold for direct infringement by causing the AI to reproduce protected expression.[84] Though many AI prompts contain only unprotected ideas, users may sometimes input copyrightable material as the basis for the AI output. For example, a user could upload a copyrighted image and request the AI to make a new drawing based on the sample. In such cases, the user is intentionally targeting copyrighted works and directly “causing” the AI system to reproduce output that is similar. If sufficiently similar, that output could infringe on the protected input. This would be a question of first impression, but it is a plausible reading of available cases.

For example, in CoStar Grp. Inc. v. LoopNet Inc., 373 F.3d 544 (4th Cir. 2004), the 4th U.S. Circuit Court of Appeals had to consider whether an internet service provider (ISP) could be directly liable when third parties reposted copyrighted material owned by the plaintiff. In determining that merely owning the “machine” through which copies were made or transmitted was not enough to “cause” a direct infringement, the court held that:

[T]o establish direct liability under §§ 501 and 106 of the Act, something more must be shown than mere ownership of a machine used by others to make illegal copies. There must be actual infringing conduct with a nexus sufficiently close and causal to the illegal copying that one could conclude that the machine owner himself trespassed on the exclusive domain of the copyright owner. The Netcom court described this nexus as requiring some aspect of volition or causation… Indeed, counsel for both parties agreed at oral argument that a copy machine owner who makes the machine available to the public to use for copying is not, without more, strictly liable under § 106 for illegal copying by a customer. The ISP in this case is an analogue to the owner of a traditional copying machine whose customers pay a fixed amount per copy and operate the machine themselves to make copies. When a customer duplicates an infringing work, the owner of the copy machine is not considered a direct infringer. Similarly, an ISP who owns an electronic facility that responds automatically to users’ input is not a direct infringer.[85]

Implied in the 4th Circuit’s analogy is that, while the owner of a copying machine might not be a direct infringer, a user employing such a machine could be. Under this framing, a user prompting an AI system to create a “substantially similar” reproduction of a protected work could very well be a direct infringer. The analogy is inexact, however: a user feeds an original into a copying machine to make a more-or-less perfect copy, whereas an AI system generates something new but similar. The basic mechanism of using a machine to try to reproduce a protected work nonetheless remains essentially the same. Whether there is an infringement would be a question of “substantial similarity.”

2.              Secondary liability

As in the case of direct liability, the nature of generative AI makes the secondary-liability determination slightly more complicated, as well. That is, paradoxically, the basis for secondary liability could theoretically arise even where there was no direct infringement.[86]

The first piece of this analysis is the easier. If a user is directly liable for infringing a protected work, as noted above, the developer and provider of a generative AI system may face secondary copyright liability. If the AI developer or distributor knows the system can produce infringing outputs, and provides tools or material support that allows users to infringe, it may be liable for contributory infringement.[87] Critically, merely designing a system that is capable of infringing is not enough to find contributory liability.[88]

An AI producer or distributor may also have vicarious liability, insofar as it has the right and ability to supervise users’ activity and a direct financial interest in that activity.[89] AI producers have already demonstrated their ability to control users’ behavior to thwart unwanted uses of the service.[90] Thus, if there is a direct infringement by a user, a plausible claim for vicarious liability could be made so long as there is sufficient connection between the user’s behavior and the producer’s financial interests.

The question becomes more complicated when a user did not direct the AI system to infringe. When the AI generates infringing content without user direction, it’s not immediately clear who would be liable for the infringement.[91] Consider the case where, unprompted by either the user or the AI producer, an AI system creates an output that would infringe under the substantial-similarity test. Assuming that the model has not been directed by the producer to “memorize” the works it ingests, the model itself consists of statistical information about the relationship between different kinds of data. The infringer, in a literal sense, is the AI system itself, as it is the creator of the offending output. Technically, this may be a case of vicarious liability, even without an independent human agent causing the direct infringement.

We know that copyright protection can only be granted to humans. As the Copyright Review Board recently found in a case deciding whether AI-generated outputs can be copyrighted:

The Copyright Act protects, and the Office registers, “original works of authorship fixed in any tangible medium of expression.” 17 U.S.C. § 102(a). Courts have interpreted the statutory phrase “works of authorship” to require human creation of the work.[92]

But can an AI system directly violate copyright? In his Aereo dissent, Justice Antonin Scalia asserted that it was a longstanding feature of copyright law that violation of the performance right required volitional conduct.[93] The majority disagreed, however, holding that, by running a fully automated system of antennas intended to allow users to view video at home, Aereo gave rise to direct copyright liability.[94] Implied in the majority’s opinion, then, is the idea that direct copyright infringement does not require “volitional” conduct.

It is therefore plausible that a non-sentient, fully automated AI system could infringe copyright, even if, ultimately, there is no way to recover against the nonhuman agent. That does, however, provide an opportunity for claims of vicarious liability against the AI producer or distributor, at least where the producer has the power to control the AI system’s behavior and that behavior appears to align with the producer’s financial interests.

3.              Protecting the ‘style’ of human creators

The NOI asks: “Are there or should there be protections against an AI system generating outputs that imitate the artistic style of a human creator (such as an AI system producing visual works ‘in the style of’ a specific artist)?”[95]

At the federal level, one candidate for protection against AI imitating some aspects of a creator’s works can currently be found in trademark law. Trademark law, governed by the Lanham Act, protects names, symbols, and other source identifiers that distinguish goods and services in commerce.[96] Unfortunately, a photograph or likeness, on its own, typically does not qualify for trademark protection, unless it is consistently used on specific goods.[97] Even where there is a likeness (or similar “mark”) used consistently as part of branding a distinct product, many trademark-infringement claims would be difficult to establish in this context, because trademark law does little to protect many aspects of a creator’s work.

Moreover, the Supreme Court has been wary about creating a sort of “mutant copyright” in cases that invoke the Lanham Act as a means to enforce a sort of “right of attribution,” which would potentially give creators the ability to control the use of their name in broader contexts.[98] In this context, the Court has held that the relevant parts of the Lanham Act were not designed to “protect originality or creativity,”[99] but are focused solely on “actions like trademark infringement that deceive consumers and impair a producer’s goodwill.”[100]

In many ways, there is a parallel here to the trademark cases involving keyword bidding in online ads. At a high level, search engines and other digital-advertising services do not generally infringe trademark when they allow businesses to purchase ads triggered by a user’s search for competitor trademarks (i.e., rivals’ business names).[101] But in some contexts, this can be infringing—e.g., where the use of trademarked terms in combination with advertising text can mislead consumers about the origin of a good or service.[102]

Thus, the harm, when it arises, would not be in a user asking an AI system to generate something “in the style of” a known creator, but when that user subsequently seeks to release a new AI-generated work and falsely claims it originated from the creator, or leaves the matter ambiguous and misleading to consumers.

Alternative remedies for creators could be found in the “right of publicity” laws in various states. A state-level right of publicity “is not merely a legal right of the ‘celebrity,’ but is a right inherent to everyone to control the commercial use of identity and persona and recover in court damages and the commercial value of an unpermitted taking.”[103] Such rights are recognized under state common law and statutes, which vary considerably in scope across jurisdictions—frequently as part of other privacy statutes.[104] For example, some states only protect an individual’s name, likeness, or voice, while others also cover distinctive appearances, gestures, and mannerisms.[105] The protections afforded for right-of-publicity claims vary significantly based on the state where the unauthorized use occurs or the individual is domiciled.[106] This creates challenges for the application of uniform nationwide protection of creators’ interests in the various aspects that such laws protect.

In recent hearings before the U.S. Senate Judiciary Subcommittee on Intellectual Property, several witnesses advocated creating a federal version of the right of publicity.[107] The Copyright Office has also previously opined that it may be desirable for Congress to enact some form of a “right of publicity” law.[108] If Congress chose to enact a federal “right of publicity” statute, several key issues would need to be addressed regarding the scope of protection, the effect on state laws, constitutional authority, and First Amendment limitations.

Congress would have to delineate the contours of the federal right of publicity, including the aspects of identity covered and the types of uses prohibited. A broad right of publicity could protect names, images, likenesses, voices, gestures, distinctive appearances, and biographical information from any unauthorized commercial use. Or Congress could take a narrower approach, focused only on particular identity attributes, like name and likeness. Congress would also need to determine whether a federal right-of-publicity statute preempts state right-of-publicity laws or sets a floor that allows state protections to exceed the federal standards.

4.              Bargaining for the use of likenesses

A federal right of publicity could present an interesting way out of the current dispute between rightsholders and AI producers. Most of the foregoing comment attempts to pull apart different pieces of potential infringement actions, but such actions are only necessary, obviously, if a mutually beneficial agreement cannot be struck between creators and AI producers. The main issue at hand is that, given the vast amount of content necessary to train an AI system, it could be financially impractical for even the largest AI firms to license all the necessary content. Even if the comments above are correct, and fair use is not available, it could very well be the case that AI producers will not license very much content, possibly relying on public-domain material, and choosing to license only a very small selection.

Something like a “right of publicity,” or an equivalent agreement between creators and AI producers, could provide alternative licensing and monetization strategies that encourage cooperation between the parties. If creators had the opportunity to opt into the use of their likeness (or the relevant equivalent for the sort of AI system in question), the creators could generate revenue when the AI system actually uses the results of processing their content. Thus, the producers would not need to license content that contributes an unknown and possibly de minimis value to their systems, and would only need to pay for individual instances of use.

Indeed, in this respect, we are already beginning to see some experimentation with business models. The licensing of celebrity likenesses for Meta’s new AI chatbots highlights an emerging opportunity for creators to monetize their brand through contractual agreements that grant usage rights to tech companies that commercialize conversational AI.[109] As this technology matures, there will be more opportunities for collaborations between AI producers—who are eager to leverage reputable and recognizable personalities—and celebrities or influencers seeking new income streams.

As noted, much of the opportunity for creators and AI producers to reach these agreements will depend on how rights are assigned.[110] It may be the case that a “right of publicity” is not necessary to make this sort of bargaining happen, as creators could—at least theoretically—pursue litigation on a state-by-state basis. This disparate-litigation strategy could deter many creators, however, and it could also be the case that a single federal standard outlining a minimal property right in “publicity” could help to facilitate bargaining.

Conclusion

The advent of generative AI systems presents complex new public-policy challenges centered on the intersection of technology and copyright law. As the Copyright Office’s inquiry recognizes, there are open questions around the legal status of AI-training data, the attribution of AI outputs, and infringement liability, which all require thoughtful analysis.

Ultimately, maintaining incentives for human creativity, while also allowing AI systems to flourish, will require compromise and cooperation between stakeholders. Rather than an outright ban on the unauthorized use of copyrighted works for training data, a licensing market that enables access to a large corpus of works could emerge. Rightsholders may need to accept changes to how they typically license content. In exchange, AI producers will have to consider how they can share the benefit of their use of protected works with creators.

Copyright law retains flexibility to adapt to new technologies, as past reforms reacting to photography, sound recordings, software, and the internet all demonstrate. With careful balancing of interests, appropriate limitations, and respect for constitutional bounds, copyright can continue to promote the progress of science and the useful arts even in the age of artificial intelligence. This inquiry marks a constructive starting point, although ongoing reassessment will likely be needed as generative AI capabilities continue to advance rapidly.

[1] Artificial Intelligence and Copyright, Notice of Inquiry and Request for Comments, U.S. Copyright Office, Library of Congress (Aug. 30, 2023) [hereinafter “NOI”].

[2] Tim Sweeney (@TimSweeneyEpic), Twitter (Jan. 15, 2023, 3:35 AM), https://twitter.com/timsweeneyepic/status/1614541807064608768?s=46&t=0MH_nl5w4PJJl46J2ZT0Dw.

[3] Pulitzer Prize Winner and Other Authors Accuse OpenAI of Misusing Their Writing, Competition Policy International (Sep. 11, 2023), https://www.pymnts.com/cpi_posts/pulitzer-prize-winner-and-other-authors-accuse-openai-of-misusing-their-writing; Getty Images Statement, Getty Images (Jan. 17, 2023), https://newsroom.gettyimages.com/en/getty-images/getty-images-statement.

[4] See, e.g., Anton Oleinik, What Are Neural Networks Not Good At? On Artificial Creativity, 6 Big Data & Society (2019), available at https://journals.sagepub.com/doi/full/10.1177/2053951719839433#bibr75-2053951719839433.

[5] William M. Landes & Richard A. Posner, An Economic Analysis of Copyright Law, 18 J. Legal Stud. 325 (1989).

[6] Id. at 332.

[7] Id. at 326.

[8] Id.

[9] See infra, notes 102-103 and accompanying text.

[10] See generally R.H. Coase, The Problem of Social Cost, 3 J. L. & Econ. 1, 2 (1960).

[11] Richard A. Posner, Economic Analysis of Law 65, 79 (Aspen 5th ed. 1998).

[12] Coase, supra note 10, at 27.

[13] Id.

[14] Id. at 27.

[15] Id. at 42-43.

[16] U.S. Copyright Office, Library of Congress, supra note 1, at 14.

[17] For more detailed discussion of GANs and Stable Diffusion, see Ian Spektor, From DALL E to Stable Diffusion: How Do Text-to-image Generation Models Work?, Tryolabs Blog (Aug. 31, 2022), https://tryolabs.com/blog/2022/08/31/from-dalle-to-stable-diffusion.

[18] Id.

[19] Id.

[20] Id.

[21] Id.

[22] Id.

[23] Jay Alammar, The Illustrated Stable Diffusion, Blog (Oct. 4, 2022), https://jalammar.github.io/illustrated-stable-diffusion.

[24] Indeed, there is evidence that some models may be trained in a way that they “memorize” their training set, to at least some extent. See, e.g., Kent K. Chang, Mackenzie Cramer, Sandeep Soni, & David Bamman, Speak, Memory: An Archaeology of Books Known to ChatGPT/GPT-4, arXiv Preprint (Oct. 20, 2023), https://arxiv.org/abs/2305.00118; OpenAI LP, Comment Regarding Request for Comments on Intellectual Property Protection for Artificial Intelligence Innovation, Before the USPTO, Dep’t of Com. (2019), available at https://www.uspto.gov/sites/default/files/documents/OpenAI_RFC-84-FR-58141.pdf.

[25] OpenAI LP, supra note 24 (emphasis added).

[26] 17 U.S.C. § 107.

[27] See, e.g., Blake Brittain, Meta Tells Court AI Software Does Not Violate Author Copyrights, Reuters (Sep. 19, 2023), https://www.reuters.com/legal/litigation/meta-tells-court-ai-software-does-not-violate-author-copyrights-2023-09-19; Avram Piltch, Google Wants AI Scraping to be ‘Fair Use.’ Will That Fly in Court?, Tom’s Hardware (Aug. 11, 2023), https://www.tomshardware.com/news/google-ai-scraping-as-fair-use.

[28] 17 U.S.C. § 106.

[29] Register of Copyrights, DMCA Section 104 Report (U.S. Copyright Office, Aug. 2001), at 108-22, available at https://www.copyright.gov/reports/studies/dmca/sec-104-report-vol-1.pdf.

[30] Id. at 122-23.

[31] Id. at 112 (emphasis added).

[32] Id. at 129–30.

[33] 17 U.S.C. § 107.

[34] Id.; see also Campbell v. Acuff-Rose Music Inc., 510 U.S. 569 (1994).

[35] Critically, a fair use analysis is a multi-factor test, and even within the first factor, it’s not a mandatory requirement that a use be “transformative.” It is entirely possible that a court balancing all of the factors could indeed find that training AI systems is fair use, even if it does not hold that such uses are “transformative.”

[36] Campbell, supra note 34, at 591.

[37] Authors Guild v. Google, Inc., 804 F.3d 202, 214 (2d Cir. 2015).

[38] OpenAI submission, supra note 24, at 5.

[39] Id. at 915.

[40] Id.

[41] Id.

[42] Id. at 933-34.

[43] Id. at 923 (emphasis added).

[44] Id.

[45] Id. at 924.

[46] Kelly v. Arriba Soft Corp., 336 F.3d 811 (9th Cir. 2003).

[47] Id.

[48] Id. at 818.

[49] Id.

[50] Id. at 819 (“Arriba’s use of the images serves a different function than Kelly’s use—improving access to information on the internet versus artistic expression.”).

[51] The “public benefit” aspect of copyright law is reflected in the fair-use provision, 17 U.S.C. § 107. In Campbell v. Acuff-Rose Music, Inc., 510 U.S. 569, 579 (1994), the Supreme Court highlighted the “social benefit” that a use may provide under the first of the statute’s four fair-use factors, “the purpose and character of the use.”

[52] Kelly, supra note 46, at 820.

[53] Perfect 10 Inc. v. Amazon.com Inc., 487 F.3d 701 (9th Cir. 2007).

[54] Id. at 721 (“Although an image may have been created originally to serve an entertainment, aesthetic, or informative function, a search engine transforms the image into a pointer directing a user to a source of information.”).

[55] Id. at 721.

[56] Id. at 723 (emphasis added).

[57] Id. (emphasis added).

[58] Id.

[59] Authors Guild, supra note 37, at 218.

[60] Id. at 215-16.

[61] Id. at 214. See also id. (“The more the appropriator is using the copied material for new, transformative purposes, the more it serves copyright’s goal of enriching public knowledge and the less likely it is that the appropriation will serve as a substitute for the original or its plausible derivatives, shrinking the protected market opportunities of the copyrighted work.”).

[62] Id. at 218.

[63] Perfect 10, 487 F.3d at 721-22 (citing Kelly, 336 F.3d at 818-19). See also Campbell, 510 U.S. at 579 (“The central purpose of this investigation is to see, in Justice Story’s words, whether the new work merely ‘supersede[s] the objects’ of the original creation, or instead adds something new, with a further purpose or different character….”) (citations omitted).

[64] See supra, notes 9-14 and accompanying text.

[65] See, e.g., the development of the compulsory “mechanical” royalty, now embodied in 17 U.S.C. § 115, which was adopted in the early 20th century as a way to allow the manufacturers of player pianos to distribute piano rolls of copyrighted musical compositions playable by their instruments.

[66] See supra notes 9-14 and accompanying text.

[67] U.S. Copyright Office, Library of Congress, supra note 1, at 15.

[68] See infra, notes 102-103 and accompanying text.

[69] U.S. Copyright Office, Library of Congress, supra note 1, at 15.

[70] See, e.g., Copyright Free Music, Premium Beat By Shutterstock, https://www.premiumbeat.com/royalty-free/licensed-music; Royalty-free stock footage at your fingertips, Adobe Stock, https://stock.adobe.com/video.

[71] U.S. Copyright Office, Library of Congress, supra note 1, at 19.

[72] Id.

[73] U.S. Const. art. I, § 8, cl. 8.

[74] See Ajay Agrawal, Joshua S. Gans, & Avi Goldfarb, Exploring the Impact of Artificial Intelligence: Prediction Versus Judgment, 47 Info. Econ. & Pol’y 1, 1 (2019) (“We term this process of understanding payoffs, ‘judgment’. At the moment, it is uniquely human as no machine can form those payoffs.”).

[75] Joshua Gans, Can AI works get copyright protection? (Redux), Joshua Gans’ Newsletter (Sept. 7, 2023), https://joshuagans.substack.com/p/can-ai-works-get-copyright-protection.

[76] See Nicholas Carlini, et al., Extracting Training Data from Diffusion Models, Cornell Univ. (Jan. 30, 2023), available at https://arxiv.org/abs/2301.13188.

[77] See Chang, Cramer, Soni, & Bamman, supra note 24; see also Matthew Sag, Copyright Safety for Generative AI, Working Paper (May 4, 2023), available at https://ssrn.com/abstract=4438593.; Andrés Guadamuz, A Scanner Darkly: Copyright Liability and Exceptions in Artificial Intelligence Inputs and Outputs, 25-27 (Mar. 1, 2023), available at https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4371204.

[78] Laureyssens v. Idea Grp. Inc., 964 F.2d 131, 140 (2d Cir. 1992), as amended (June 24, 1992).

[79] Id. at 139.

[80] Harney v. Sony Pictures Television Inc., 704 F.3d 173, 178 (1st Cir. 2013). This assumes, for argument’s sake, that a given model is not “memorizing,” as noted above.

[81] Id. at 178-79.

[82] Id.

[83] U.S. Copyright Office, Library of Congress, supra note 1, at 25.

[84] Notably, the state of mind of the user would be irrelevant from the point of view of whether an infringement occurs. All that is required is that the plaintiff owns a valid copyright and that the defendant infringed it. 17 U.S.C. § 106. There are cases where the state of mind of the defendant will matter, however. For one, willful or recklessly indifferent infringement by a defendant will open the door to higher statutory damages. See, e.g., Island Software & Computer Serv., Inc. v. Microsoft Corp., 413 F.3d 257, 263 (2d Cir. 2005). For another, a case of criminal copyright infringement will require that a defendant have acted “willfully.” 17 U.S.C. § 506(a)(1) (2023); 18 U.S.C. § 2319 (2023).

[85] Id. at 550.

[86] Legally speaking, it would be incoherent to suggest that there can be secondary liability without primary liability. The way that AI systems work, however, could prompt Congress to modify the law in order to account for the identified situation.

[87] See, e.g., Metro-Goldwyn-Mayer Studios Inc. v. Grokster Ltd., 380 F.3d 1154, 1160 (9th Cir. 2004), vacated and remanded, 545 U.S. 913, 125 S. Ct. 2764, 162 L. Ed. 2d 781 (2005).

[88] See BMG Rts. Mgmt. (US) LLC v. Cox Commc’ns Inc., 881 F.3d 293, 306 (4th Cir. 2018); Sony Corp. of Am. v. Universal City Studios Inc., 464 U.S. 417, 442 (1984).

[89] A&M Recs. Inc. v. Napster Inc., 239 F.3d 1004, 1022 (9th Cir. 2001), as amended (Apr. 3, 2001), aff’d sub nom. A&M Recs. Inc. v. Napster Inc., 284 F.3d 1091 (9th Cir. 2002).

[90] See, e.g., Content Filtering, Microsoft Learn, available at https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/content-filter (last visited Oct. 27, 2023).

[91] Note that, if an AI producer can demonstrate that they used no protected works in the training phase, there may in fact be no liability for infringement at all. If a protected work is never made available to the AI system, even an output very similar to that protected work might not be “substantially similar” in a legal sense.

[92] Copyright Review Board, Second Request for Reconsideration for Refusal to Register Théâtre D’opéra Spatial (SR # 1-11743923581; Correspondence ID: 1-5T5320R), U.S. Copyright Office (Sep. 5, 2023), available at https://fingfx.thomsonreuters.com/gfx/legaldocs/byprrqkqxpe/AI%20COPYRIGHT%20REGISTRATION%20decision.pdf.

[93] Am. Broad. Companies Inc. v. Aereo Inc., 573 U.S. 431, 453 (2014) (Scalia, J., dissenting).

[94] Id. at 451.

[95] U.S. Copyright Office, Library of Congress, supra note 1, at 21.

[96] See 15 U.S.C. § 1051 et seq., at § 1127.

[97] See, e.g., ETW Corp. v. Jireh Pub. Inc., 332 F.3d 915, 923 (6th Cir. 2003).

[98] Dastar Corp. v. Twentieth Century Fox Film Corp., 539 U.S. 23, 34 (2003).

[99] Id. at 37.

[100] Id. at 32.

[101] See, e.g., Multi Time Mach. Inc. v. Amazon.com Inc., 804 F.3d 930, 938 (9th Cir. 2015); EarthCam Inc. v. OxBlue Corp., 49 F. Supp. 3d 1210, 1241 (N.D. Ga. 2014); Coll. Network Inc. v. Moore Educ. Publishers Inc., 378 F. App’x 403, 414 (5th Cir. 2010).

[102] Digby Adler Grp. LLC v. Image Rent a Car Inc., 79 F. Supp. 3d 1095, 1102 (N.D. Cal. 2015).

[103] 1 J. Thomas McCarthy, The Rights of Publicity and Privacy § 1:3, Introduction—Definition and History of the Right of Publicity—Simple Definition of the Right of Publicity (2d ed.).

[104] See id. at § 6:3.

[105] Compare Ind. Code § 32-36-1-7 (covering name, voice, signature, photograph, image, likeness, distinctive appearance, gesture, or mannerism), with Ky. Rev. Stat. Ann. § 391.170 (limited to name and likeness for “public figures”).

[106] See Restatement (Third) of Unfair Competition § 46 (1995).

[107] See, e.g., Jeff Harleston, Artificial Intelligence and Intellectual Property – Part II: Copyright, U.S. Senate Comm. on the Judiciary Subcomm. on Intellectual Property (Jul. 12, 2023), available at https://www.judiciary.senate.gov/imo/media/doc/2023-07-12_pm_-_testimony_-_harleston1.pdf; Karla Ortiz, “AI and Copyright”, U.S. Senate Comm. on the Judiciary Subcomm. on Intellectual Property (Jul. 7, 2023), available at https://www.judiciary.senate.gov/imo/media/doc/2023-07-12_pm_-_testimony_-_ortiz.pdf; Matthew Sag, “Artificial Intelligence and Intellectual Property – Part II: Copyright and Artificial Intelligence”, U.S. Senate Comm. on the Judiciary Subcomm. on Intellectual Property (Jul. 12, 2023), available at https://www.judiciary.senate.gov/imo/media/doc/2023-07-12_pm_-_testimony_-_sag.pdf.

[108] Authors, Attribution, and Integrity: Examining Moral Rights in the United States, U.S. Copyright Office (Apr. 2019) at 117-119, https://www.copyright.gov/policy/moralrights/full-report.pdf.

[109] Benj Edwards, Meta Launches Consumer AI Chatbots with Celebrity Avatars in its Social Apps, ArsTechnica (Sep. 28, 2023), https://arstechnica.com/information-technology/2023/09/meta-launches-consumer-ai-chatbots-with-celebrity-avatars-in-its-social-apps; Max Chafkin, Meta’s New AI Buddies Aren’t Great Conversationalists, Bloomberg (Oct. 17, 2023), https://www.bloomberg.com/news/newsletters/2023-10-17/meta-s-celebrity-ai-chatbots-on-facebook-instagram-are-surreal.

[110] See supra, notes 8-14 and accompanying text.

Continue reading
Intellectual Property & Licensing

Decoding the AI Act: A Critical Guide for Competition Experts

Scholarship Abstract The AI Act is poised to become a pillar of modern competition law. The present article seeks to provide competition practitioners with a practical . . .

Abstract

The AI Act is poised to become a pillar of modern competition law. The present article seeks to provide competition practitioners with a practical yet critical guide to its key provisions. It concludes with suggestions for making the AI Act more competition friendly.

Read at SSRN.

Continue reading
Innovation & the New Economy