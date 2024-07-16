The use of YouTube videos to train artificial intelligence (AI) models is becoming an increasingly controversial topic. After criticism of OpenAI, Meta, and Google, now Apple has also found itself in the eye of the storm.

According to a recent Wired report, third parties downloaded the subtitle files of YouTube videos, which were then used to train large language models (LLMs). The investigation found that subtitles from more than 170,000 videos were used, including content from popular channels such as MKBHD, Jimmy Kimmel, PewDiePie, and MrBeast.

This raises important questions about consent and ethical practices in AI training. The companies involved used the content in their training processes even though extracting it this way violates YouTube’s rules, which prohibit automated access to videos and use of their content without permission.

A nonprofit called EleutherAI originally collected this data in a dataset called the Pile, intending it for educational purposes and for developers training models. However, EleutherAI has also been embroiled in controversy for using the material without the consent of the video creators.

Apple reportedly used the Pile to train its OpenELM model, which was released in April. The unauthorized use of copyrighted content to train AI models raises questions about the liability of the companies involved and the impact these practices may have on content creators.

The Wired article reads: “A Proof News investigation has found that some of the world’s richest AI companies have used material from thousands of YouTube videos to train AI. The companies did so despite YouTube’s rules prohibiting the extraction of material from the platform without permission. Our investigation found that subtitles from 173,536 YouTube videos, taken from more than 48,000 channels, were used by Silicon Valley giants, including Anthropic, Nvidia, Apple and Salesforce.”

At this time, Apple and the other companies involved have not issued an official statement regarding the concerns raised. It remains to be seen how the tech giants will respond to these accusations and what measures they will take to ensure ethical and responsible use of data in the field of artificial intelligence. And speaking of AI problems, Microsoft is in new trouble with the UK antitrust authority, which has its hiring for the Microsoft AI division in the crosshairs.

What do you think about this situation? Do you think tech companies should get consent from content creators before using their videos to train AI models? Let us know in the comments below.