Databricks, a company that helps tech giants build custom AI models, has developed a machine-learning trick that can boost the performance of a generative model without the need for clean, labeled data. For the past year, Jonathan Frankle, chief AI scientist at Databricks, has been talking with customers about the main challenges they face in getting artificial intelligence systems to work reliably. According to him, the problem is “dirty data.”
“Everyone has some data and some idea of what they want to do,” Frankle says. He adds that no one shows up with clean, clearly labeled data that can simply be dropped into a prompt or fed to a model through an application programming interface (API). The Databricks approach could let companies deploy AI agents for specific tasks without data quality being an obstacle.
A mix of techniques to get more out of data
The technique offers an unusual look at some of the key tricks engineers use to improve the capabilities of advanced artificial intelligence models, especially when good data is hard to come by. The method combines ideas drawn from advanced reasoning models with reinforcement learning, a way for models to improve through practice, and with “synthetic,” or AI-generated, training data.
The latest models from OpenAI, Google, and DeepSeek rely heavily on reinforcement learning and synthetic training data. WIRED revealed that Nvidia plans to acquire Gretel, a company that specializes in synthetic data. “We're all navigating this space,” Frankle says.
The Databricks method exploits the fact that, given enough attempts, even a weak model can score well on a given task or benchmark. Researchers call this way of boosting a model's performance “best-of-N.” Databricks trained a model to predict which best-of-N results human evaluators would prefer. The Databricks reward model, or DBRM, can then be used to improve the performance of other models without the need for more labeled data.
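To make the idea concrete, here is a minimal best-of-N sketch in Python. The `generate` and `reward_model_score` functions are hypothetical placeholders standing in for an LLM's sampling call and for a learned reward model such as DBRM; this illustrates the general technique, not Databricks' actual implementation.

```python
import random

def generate(prompt: str) -> str:
    # Placeholder: a real system would sample one completion from an LLM.
    return f"candidate-{random.randint(0, 9999)} for: {prompt}"

def reward_model_score(prompt: str, completion: str) -> float:
    # Placeholder: a real reward model (like DBRM) predicts which output
    # human evaluators would prefer and returns a scalar score.
    return random.random()

def best_of_n(prompt: str, n: int = 16) -> str:
    """Sample n candidates and keep the one the reward model scores highest."""
    candidates = [generate(prompt) for _ in range(n)]
    return max(candidates, key=lambda c: reward_model_score(prompt, c))

print(best_of_n("Summarize this quarterly report."))
```

With a large enough n, even a weak generator will occasionally produce a strong answer; the reward model's job is simply to recognize it.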
DBRM selects the best outputs from a given model. This creates synthetic training data that can be used to fine-tune the model so that it produces a better output on the first try. Databricks calls its new method Test-time Adaptive Optimization, or TAO. “This method uses relatively lightweight reinforcement learning to basically bake the benefits of best-of-N into the model itself,” Frankle explains.
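The pipeline the article describes can be sketched in the same spirit: use the reward model's best-of-N picks as synthetic training pairs, then fine-tune on them. The `fine_tune` function below is a hypothetical stand-in for whatever lightweight reinforcement-learning or supervised step is actually used, and `best_of_n` is stubbed so the snippet runs on its own.

```python
def best_of_n(prompt: str, n: int = 16) -> str:
    # Stub for the best-of-N selector from the previous sketch.
    return f"reward-model-preferred answer for: {prompt}"

def build_synthetic_dataset(prompts: list[str], n: int = 16) -> list[dict]:
    """Pair each unlabeled prompt with its reward-model-preferred output."""
    return [{"prompt": p, "completion": best_of_n(p, n)} for p in prompts]

def fine_tune(model_name: str, dataset: list[dict]) -> str:
    # Placeholder: in practice this would be a lightweight RL or supervised
    # fine-tuning pass over the synthetic pairs, baking best-of-N quality
    # into a single forward pass of the model.
    print(f"Fine-tuning {model_name} on {len(dataset)} synthetic examples")
    return model_name + "-tuned"

prompts = [
    "Analyze the key figures in this financial report.",
    "Flag anomalies in this medical record.",
]
tuned_model = fine_tune("base-llm", build_synthetic_dataset(prompts))
```

The appeal of this design is that the expensive part, sampling many candidates and scoring them, happens once during training, so the tuned model no longer needs N attempts at inference time.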
Research carried out by Databricks shows that the TAO method improves as it is scaled up to larger, more capable models. Reinforcement learning and synthetic data are already widely used, but combining them to improve large language models (LLMs) is a relatively new and technically demanding feat.
Databricks is unusually open about how it develops AI, because it wants to show customers that it has the skills needed to build powerful custom models for them. The company previously revealed to WIRED how it developed DBRX, an advanced open-source large language model (LLM), from scratch.
TAO’s future is bright
Without well-labeled, carefully curated data, it is difficult to fine-tune an LLM to perform specific tasks more effectively, such as analyzing financial reports or medical records to find patterns or identify problems. Many companies now hope to use LLMs to automate tasks with so-called AI agents.
For example, an agent in finance could analyze a company's key results, generate a report, and automatically send it to different analysts. Another setting where an agent could be useful is health insurance, guiding customers to information about a relevant drug or condition.