Congress told AI firms should pay for copyrighted content

ylai@lemmy.ml · 8 months ago


BlameThePeacock@lemmy.ca · 8 months ago

You don’t have to pay the rightsholder if your hired human reads various newspapers in order to learn how to write. Or at least no more than a single person’s subscription fee to said content.

So why the hell should you have to pay more to train an AI model on the same content?

It’s faster than a human? So what? Why does that entitle you to more money? There are fast and slow humans already, and we don’t charge them differently for access to copyright material.

The tool that’s being created is used by more than one human/organization? So what? Freelance journalists write for many publications after having learned on your material. You aren’t charging them a license fee for every org they write for.

That being said, this is one of those turning points in the world where it doesn’t matter what the results of these lawsuits are, this technology is going to use copyrighted material whether it’s licensed or not. Companies will just need to adapt to the new reality.

OpenAI and other large companies are the target right now, but much smaller open-source generative AI models are catching up fast, and there's no way to stop individuals from using copyright material to train or personalize their AI. Training is currently processing-intensive, but its cost has already dropped by orders of magnitude, and it's going to keep getting cheaper as computing hardware improves.

If all you see is the article written by Joe Guy, and it's a good article with useful information, most of the time you can't prove that Joe even used a tool, let alone that the tool was trained on a specific piece of copyrighted material, especially if everyone trains their AI a little differently. Unless it straight up plagiarizes, no court is going to convict Joe. Avoiding direct plagiarism is as easy as having a plagiarism checker compare the output against the original training material.

ZOSTED@sh.itjust.works · 8 months ago

Why not just use free-use content? There's plenty of it, more than ever before in human history.

BlameThePeacock@lemmy.ca · 8 months ago

I'm not sure your statement is accurate. Before copyright law was introduced, everything was free use.

These days, anything a human produces immediately becomes copyrighted. Every post you make, every podcast you record, every doodle on a napkin, every Instagram post, every speech you deliver…

You actually have to intentionally license it for free use, which almost nobody does.

ZOSTED@sh.itjust.works · 8 months ago

There is enough freely licensed content to make whatever you want. I have no trouble at all making websites and comic books and video games using freely licensed content.

BlameThePeacock@lemmy.ca · 8 months ago

You were trained on copyrighted materials. You've read copyrighted websites, comic books, and video games.

ZOSTED@sh.itjust.works · 8 months ago

Yeah, I paid for 'em too.

BlameThePeacock@lemmy.ca · 8 months ago

You paid your own money for every single copyright work you’ve ever seen in your life?

No you did not. Not even close.

You didn’t pay for it at school because a lot of it falls under educational fair dealing rules. You didn’t pay when you borrowed that video game from your friend, or when you read a graphic novel at the library.

And you definitely didn’t pay for every news article you’ve read online.

ZOSTED@sh.itjust.works · 7 months ago

> You paid your own money for every single copyright work you've ever seen in your life?

I never claimed that, and I don't think it's a meaningful point.

I'm saying that I pay for art. These companies don't, and more to the point, they seek to undermine their sources once they've extracted all the training data they need. I'd go so far as to say it's in poor taste to use free art, because it should be patently obvious that most artists putting out free art did not anticipate its use by tools that let you bypass artists entirely.

There's an alternate way this could all have gone down: after some internal testing, we could simply have asked artists to volunteer their work for training. There are enough people excited about the tech that this would have been plenty! It just wouldn't have let companies rush for market share and hope the business utility would gloss over any ethical qualms in the aftermath.