Training artificial intelligence (AI) models requires vast amounts of data, and the quality of the results is directly linked to the quality of the data fed into the system. Obtaining that data comes at a cost, however, because much of it is protected by intellectual property rights.
AI companies see things differently. They often take for granted the knowledge amassed over generations by writers, rely on an expansive reading of fair use, and are reluctant to compensate the creators of the content that has shaped their models.
The theft of human knowledge is a pressing issue. It takes a great deal of hard work and dedication from writers, editors, researchers, and publishers to produce the content that appears in newspapers, magazines, books, online archives, and research papers. That valuable knowledge should not be exploited by companies without permission or compensation.
OpenAI, for example, has faced criticism over the content used to train its AI systems. While the company does license some data from third parties and relies on input from users and human trainers, it remains unclear whether it obtained permission from vendors before using copyrighted material for commercial purposes.
In the past, most written content, both online and offline, was created by humans. Even the low-quality material was at least shaped by people who understood human psychology and reasoning. Generative AI applications were built on this foundation of human knowledge.
AI companies now face a new challenge when training their models: machine-generated content, much of it poor in quality, has flooded the internet. This shrinks the pool of reliable training material and degrades model output. The phenomenon, often referred to as AI cannibalism or cloning, occurs when AI models feed on other AI-generated content.
To combat this issue, AI firms must limit their source material to credible sources, such as newspapers, magazines, and public forums that host a wealth of human-produced knowledge. Some of those sources, like Reddit, the popular online discussion forum, are even considering licensing their content to AI firms; Reddit has said it prefers business agreements but has not ruled out legal action if negotiations fail. Just as individuals are not allowed to use copyrighted soundtracks in their YouTube videos, AI companies should not be permitted to use copyrighted content for commercial purposes.
Copyright ownership poses a significant challenge, and AI firms frequently violate those rights. Furthermore, AI cannot gather news independently; it relies on human effort to collect and verify information from various sources before that information can be used by AI models. Failing to compensate the people who do this work is a form of exploitation.