
Here in Australia, the creeping Tech Industry spectre is trying to subvert copyright law so they can freely use copyrighted material to train AI models. It doesn’t just affect me, it affects you too.

The Productivity Commission undertakes independent research on economic, social and environmental issues affecting Australians, and has asked for feedback on reforming the Copyright Act 1968 (Cth) to include a fair dealing exception for text and data mining.

Which would essentially give them all of it for free.

I’m fairly sure you can imagine I have opinions about that.

And now that I’ve submitted my feedback, I’m including it here as well.


I’m not a lawyer, nor do I play one on TV, but I do run a business based on the use of copyrighted work.

To suggest reforming the Copyright Act 1968 (Cth) to allow the use of copyrighted materials to train AI models is to fundamentally misunderstand the legislation.

As soon as an Australian citizen or resident commits an idea to a form, the Act protects the original work and its adaptations for the creator’s life plus 70 years (after death). It guarantees the copyright owner exclusive rights to the work.

This exclusivity allows copyright owners to decide when, how, and who can use the work, for example, in election campaigns, advertising, or in support (or not) of white supremacist movements.

In practice, the number and type of copyrights are limited only by the rightsholder’s imagination, so anyone who wants access to copyrighted material can negotiate a licensing agreement. This regularly occurs within the publishing, entertainment, media, manufacturing, franchising, and, ironically, tech industries.

Licenses specify the exact use, in which territories, for how long, termination, dispute resolution, and of course, compensation.

An agreement to license copyrighted materials for training AI models could run along the lines of single use, for $50. A “simple” agreement like this could take as little as a few days to negotiate. A more complex agreement, for example, gaming or movies with many clauses, often takes years.

If a developer cannot or does not want to agree on terms with the rightsholder, they have several options:

  • Hire people to write under the company’s copyright. This regularly occurs across almost all industries (e.g., marketing, technical, and policy documents).
  • Use public domain works.
  • Or if they can’t locate a specific rightsholder, petition the courts for access.

The value of copyright

The value of copyright comprises the work and the exclusive right to control it.

However, unlike Minties (patented 1926), copyrighted works do not arrive in a nice, neat, standardised form, and do not have the same value.

In general, a published work is worth more than an unpublished one. An experienced writer more than inexperienced. As a series, more than stand alone. Adapted (e.g., into a movie or game) more than unadapted. More with related merchandise than without.

As an example, the literary estate of Edgar Rice Burroughs (1875-1950) contains over 80 books, including Tarzan, John Carter, Pellucidar, Venus, and Caspak; the current combined value of the work is conservatively estimated at US$5m.

ERB incorporated in 1923 to gain more control over his work. Seventy-five years after his death, the company actively manages the original works and over 100 trademarks, including the Tarzan yell, hiring authors to create derivatives and licensing new editions, games, TV shows, movies, toys, and clothes, to name a few.

As a private company, the value of Edgar Rice Burroughs Inc. is unknown, but for comparison, The Walt Disney Company (DIS) is worth about US$209.12b.

With US copyright law behind them, ERB and DIS have the resources to rigorously pursue infringement.

With this in mind, Australia is in the lucky position of having the capacity to take a world-leading approach to protecting its intellectual property. Just as it took the lead on plain cigarette packaging.

The value of reputation

Edgar Rice Burroughs wrote adventure stories, primarily in the fantasy and science fiction genres. His reputation for this style of story keeps readers buying his books today. And because readers are still buying, licensees are still clamouring for agreements.

Likewise, readers of Markus Zusak appreciate his historical young adult fiction. Readers of Liane Moriarty, her domestic psychological thrillers. And while Helen Garner doesn’t really stick to genres, you know her books are going to be about complex emotions and social issues.

Bestselling author David Baldacci has a reputation for fast-paced historical and legal thrillers. He testified to the US Senate Judiciary Committee hearing on “AI Industry’s Mass Ingestion of Copyrighted works,” that he worked hard over decades honing his craft, resulting in over 60 novels.

When his son asked ChatGPT to write a plot like a David Baldacci novel, within five seconds, it had produced three pages containing the plots, twists, names and narratives his reputation is built on. He said, “I truly felt like someone’d backed up a truck to my imagination and stolen everything I’d ever created.”

Now, anyone can create a book that reads like a David Baldacci novel, because, as he said, “it is my novel. It is my imagination.” And because his name is attached, I imagine anyone who likes his work will likely buy it.

Journalists and scientists have reputations as well.

Award-winning English journalist Carole Cadwalladr revealed the Facebook-Cambridge Analytica scandal in 2018. In her TED talk, “This Is What a Digital Coup Looks Like,” she revealed her work was stolen and used to train ChatGPT. When she asked it to write a TED talk in her style, the result was “creepily plausible,” aside from being factually incorrect.

Such an article, published in an unauthorised publication under her byline, could cause irreversible reputational harm, yet with an exception for text and data mining, there is no opt out for her, or others facing reputational harm.

Fair dealing exceptions exist within the Act

Legislative criteria to clarify fair dealing are not required as they exist within the Act.

Bearing in mind the Act exists to incentivise creators, and a copyrighted work cannot be monetised without using, selling or licensing, it’s clear the rightsholder controls the work.

And as the Act considers “fair” from the rightsholder’s point of view, it defaults to viewing commercial use of copyrighted work without permission as infringement.

Additionally, the Act makes liberal use of the phrase “unless the contrary intention appears,” and these together suggest commercial use does not warrant a fair dealing exception.

However, there are some personal, not-for-profit exceptions, the main ones being study, criticism, parody, and the news.

As mentioned earlier, copyright value is variable, so exceptions outside the four mentioned in the Act are determined on a case-by-case basis, on the implicit understanding that a rightsholder can pursue uses they do not consider fair.

The legislative criteria are:

  • nature and amount of the work used,
  • intended purpose of the result,
  • effect on the market, and
  • whether the work is available at a commercial price within a reasonable time.

In the instance of study, the Act suggests 10% or a chapter is a reasonable amount of a copyrighted work for fair dealing.

It’s hard to imagine an Australian court would agree 100% of a commercially available work, used by a commercial interest, to make a for-profit product would constitute fair dealing.

What is text and data mining?

Discussions about AI usually talk as though it’s like an earthworm; text and data flow through it and come out the other side somehow enriched.

The truth is, we don’t know what, or even if, the text and data come out the other side, because what developers do with copyrighted material is protected commercial-in-confidence.

But what we know is developers need “text and data.”

From recent US events, we know they need more than random words. They need words strung together in ways that give them meaning and context, as well as veracity and reliability.

Or to put it another way, the critical need is the expression the Act protects.

Form of expression is important.

For example, the terms “text” and “data,” like an AI “hallucination,” imply they’re trivial issues, but:

  • Text relates to original written words (i.e., copyrighted), as opposed to a paraphrase.
  • Data is a synonym for facts, which are processed into information, and then interpreted and contextualised into knowledge (also copyrighted).
  • Hallucinations are compelling senses of reality, and as AI is a machine, it does not have a sense of reality. AI hallucinations are, in fact, program errors. Some more dangerous than amusing.

Happily, for my argument, mining not only refers to the process of extracting valuable materials but also the process of laying explosives…

Impact on AI development of not permitting text and data mining

Regardless of whether an amendment makes it through Parliament, developers will continue to develop AI because that is what they do.

If it doesn’t, developers will have to negotiate licenses with rightsholders, so development will be slower and more expensive.

There are four primary reasons AI developers seek an exception for text and data mining:

  1. They need a lot of copyrighted material,
  2. it can be difficult to locate rightsholders,
  3. it takes time to negotiate access, and
  4. it costs money.

These reasons are not insurmountable. As mentioned earlier, licensing regularly occurs within the publishing, entertainment, media, manufacturing, franchising, and tech industries.

We all know the value of free is nothing, so negotiating and paying for copyrighted work will probably require more careful consideration of which material best suits the need.

Developers apparently don’t understand that many end users are concerned about the ethics and sustainability of the products they purchase. To have negotiated access to critical source material is a positive, potentially resulting in a high-value product. Like using solar energy and water-saving data centres.

In the US, this is considered a cost, but negotiating licences offers a multitude of value-adding unique selling propositions.

For example, restricting the input to Australian literature could result in a virtual assistant or customer support chatbot with a distinctly Australian sense of humour and language instead of generic American. Mystery thrillers might be useful for security; romance or women’s fiction for care; and erotic fiction… ahem, would suit other more intimate uses.

Additionally, the Act requires creator attribution (regardless of who owns the rights, and no matter the use), which offers the opportunity to market the product as using particular authors. For example, generated using Tim Winton, Kate Grenville or Richard Flanagan.

Text and data mining impact on rightsholders

While AI developers will manage, an exception for text and data mining could affect every single Australian. Not just flighty book authors, freaky conspiracy theorists with websites, and social media influencers, but anyone who’s ever made a home movie, snapped a photo or written an email or letter.

Granting a copyright exception for text and data mining gives developers access to all the text and data. Not just the works that exist now, but all that come into being in the future. All the text and data that will ever exist. Free. Forever. No matter the format, source, or location.

Not just the work published in easily accessible digital or physical formats, but unpublished works, such as research documents, working papers, and manuals. Including private and confidential records, like legal agreements, client records, or proprietary formulas and processes.

An AI exception permits no right of termination. Offers no dispute resolution. No compensation. No need to ask permission. No legal responsibility.

Plus, there would be no requirement to prove you’re a developer. Or that you’re using the work to train an AI model. Or that you’re even Australian.

And once the alleged AI developer has the work, they won’t be required to de-identify it, store it securely, and certainly won’t have a duty of care toward it.

They won’t have to stop employees excerpting pieces and circulating them widely as memes of people with their literal trousers down.

And when they run out of easily accessible work, it won’t be possible to refuse them access to the rest without a battery of lawyers on standby.

With flow-on effects to the Privacy Act 1988 (Cth) and the Cybercrime Act 2001 (Cth).

Impact on the value of copyright

As mentioned earlier, part of the value of copyright is the exclusive right to sell or license the work to others.

With a mining exception, all copyrighted works are effectively in the public domain, meaning anyone can use them. The rightsholder will not have sufficient leverage to negotiate reasonable compensation for further licenses.

Such an exception could potentially destroy livelihoods.

To be blunt, the situation of a creator denied control of their rights, coerced into supporting the development of for-profit AI models with no compensation is akin to slavery. To do so through legislative change is the first step towards fascism.

Render of a brain on circuit board by Rubidium Beach on Unsplash
