AI Training Reshaped: From 'Hallucinations' to Expertise, Highly Paid Specialists Become the New Darlings of Model Training
The performance of AI models relies heavily on the quality of their training data. In the early days, companies such as OpenAI, the maker of ChatGPT, and its competitor Cohere relied on large, low-cost human teams for basic data annotation, such as distinguishing a car from a carrot in an image. As competition has intensified, however, model training has become a highly specialized task that requires a rapidly expanding network of expert trainers.
Today, specialists ranging from historians to scientists, some of them with PhDs, are joining the ranks of AI trainers. They contribute deep expertise that makes training data more accurate and thorough, which in turn significantly improves model performance.
Cohere co-founder Ivan Zhang states, "A year ago, we could hire undergraduates to roughly teach AI how to improve. Now, we have practicing physicians teaching the model how to work in healthcare settings, and we have financial analysts and accountants helping the model perform better in specific domains."
To enhance training capabilities, Cohere, valued at over $5 billion, partnered with a startup called InvisibleTech. InvisibleTech employs thousands of remote trainers and has emerged as a key partner in the AI sector, providing training services to multiple AI companies, including AI21 and Microsoft, helping to reduce "hallucinations" in AI models.
Invisible founder Francis Pedraza explains, "We have over 5,000 experts with PhDs, master's degrees, and deep domain expertise in over 100 countries globally." Depending on the complexity of the task and the trainer's location, Invisible's hourly rates can reach $40. Other companies like Outlier pay up to $50 per hour, while Labelbox offers as much as $200 per hour for "high-expertise" subjects like quantum physics, with basic tasks starting at $15 per hour.
Invisible, founded in 2015, initially focused on automating workflows for companies like DoorDash. However, a turning point occurred in the spring of 2022, before ChatGPT's public release, when OpenAI reached out to collaborate. Pedraza recalls, "OpenAI had a problem. Early versions of ChatGPT were prone to 'hallucinations' when answering questions, with unreliable answers. They needed an advanced training partner that could enable AI learning through human feedback."
Generative AI relies on past data used for training to generate new content. However, sometimes it struggles to distinguish between real and fake, creating so-called "hallucinations." One instance occurred in 2023 when Google's chatbot provided inaccurate information in a promotional video about which telescope first captured an exoplanet. AI companies recognize that "hallucinations" could impact the appeal of generative AI in business, so they're experimenting with various methods to reduce this phenomenon, including utilizing human trainers to teach AI to differentiate between fact and fiction.
Since collaborating with OpenAI, Invisible has rapidly become a preferred training partner for numerous generative AI companies, including Cohere, AI21, and Microsoft. While Microsoft has not officially confirmed the partnership, Cohere and AI21 have acknowledged being major clients of Invisible.
Pedraza points out, "Training costs are the second largest expense for companies in the AI industry, second only to compute. High-quality training is crucial for ensuring model accuracy and reliability."
"Human Data Teams": The New Engine of AI Training
Behind OpenAI's generative AI boom stands a group known as "Human Data Teams." Its members work with AI trainers to collect specialized data for training models like ChatGPT. People familiar with the company's processes say that OpenAI researchers design series of experiments aimed at problems such as reducing "hallucinations" and improving writing style, then work with AI trainers from providers like Invisible to collect and process the data each experiment needs.
These insiders say that at any given time there may be dozens of experimental projects under way, using either OpenAI's own tools or solutions provided by vendors. Invisible hires experts with relevant academic backgrounds according to each AI company's needs. Whether the expert is a scholar of Swedish history or a financial modeling specialist, each contributes to these AI projects, sparing AI companies the burden of managing large numbers of trainers themselves.
Pedraza observes, "OpenAI has some of the world's best computer scientists, but they may not be experts on questions in Swedish history, chemistry, or biology." He adds that OpenAI alone has over 1,000 contract workers providing data annotation services.
Cohere's Ivan Zhang has seen the capabilities of Invisible's trainers firsthand: they successfully taught Cohere's generative AI model how to extract relevant information from large datasets.
Intensified Competition: The Emergence of a Specialist Training Market
In the AI training dataset space, Scale AI is a major competitor to Invisible. This privately held startup, valued at $14 billion, not only provides dataset services but has also begun offering training services to AI companies, and it lists OpenAI as one of its clients. Scale AI has not responded to requests for comment.
In contrast, Invisible has been more conservative in its fundraising: it has been profitable since 2021 and has raised only $8 million in capital. Pedraza states, "We have 70% ownership by the team and 30% by investors." He also reveals that the company's recent valuation reached $500 million.
Early entrants into AI training mainly did data annotation work, which required relatively little skill and paid accordingly, sometimes as little as $2 per hour. It was carried out primarily by workers in Africa and Asia.
However, with the rapid development of AI technology, demand for specialized trainers has surged across dozens of languages and fields, creating a high-paying niche market. Today, experts from many disciplines, even those with no programming skills, have the opportunity to become AI trainers. Demand from AI companies is also giving rise to more businesses offering similar services.
Ivan Zhang mentions, "My inbox is flooded with new companies popping up all the time, entering the AI training service market. It's truly a new field where companies are hiring humans just to generate data for us AI labs."