MosaicML, an artificial intelligence (AI) startup based in San Francisco, announced today the release of its groundbreaking language model, MPT-30B. The new model, trained at a fraction of the cost of its competitors, promises to revolutionize the field of artificial intelligence in enterprise applications.
Naveen Rao, the CEO and cofounder of MosaicML, said in an interview with VentureBeat that MPT-30B was trained at a cost of $700,000, far less than the tens of millions of dollars required to train GPT-3. The lower cost and smaller size of MPT-30B could make it more attractive to enterprises looking to deploy natural language processing (NLP) models in applications like dialog systems, code completion and text summarization.
“MPT-30B adds better capabilities for summarization and putting more data into the prompt and having [the model] reason over that data,” Rao said. “So if that’s a requirement for you, that you care less about the economics of serving, then maybe the 30B is a better fit [than our 7B model].”
Rao said that MosaicML used various techniques to optimize the model, such as ALiBi and FlashAttention, attention mechanisms that enable long context lengths and high utilization of GPU compute. He also said that MosaicML was one of the few labs with access to Nvidia H100 GPUs, which increased throughput per GPU by more than 2.4 times and resulted in a faster finish time.
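To make the first of those techniques concrete: ALiBi (Attention with Linear Biases) replaces learned positional embeddings with a fixed, distance-proportional penalty added to each attention score, which is what lets a model extrapolate to longer contexts than it was trained on. MosaicML has not published this particular code; the following is a minimal NumPy sketch of the standard ALiBi bias computation, with the head slopes set to the geometric sequence from the original ALiBi recipe.

```python
import numpy as np

def alibi_bias(num_heads: int, seq_len: int) -> np.ndarray:
    """Build the per-head linear position biases added to attention scores.

    Head h (1-indexed) gets slope m_h = 2^(-8h / num_heads); a query at
    position i attending to a key at position j <= i receives bias
    -m_h * (i - j), so farther-away tokens are penalized more.
    """
    slopes = 2.0 ** (-8.0 * np.arange(1, num_heads + 1) / num_heads)
    positions = np.arange(seq_len)
    # distance[i, j] = i - j; positive below the diagonal (past tokens)
    distance = positions[:, None] - positions[None, :]
    bias = -slopes[:, None, None] * distance[None, :, :]  # (heads, q, k)
    # Mask future positions so attention stays causal.
    return np.where(distance[None, :, :] >= 0, bias, -np.inf)

bias = alibi_bias(num_heads=8, seq_len=4)
print(bias.shape)  # (8, 4, 4)
```

The bias matrix is simply added to the query-key scores before the softmax; because it depends only on relative distance, no positional embedding table has to be retrained when the context window grows.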
“We want to get as many people on the technology as we can,” Rao said. “That’s our goal. It’s not to be exclusive. It’s not to be elitist. It’s to get more people using this.”
Enabling enterprises to build custom models for cheaper
MosaicML allows businesses to train models on their own data using the company’s model architectures and then deploy the models through its inference API. Rao said that while he couldn’t disclose many customer examples due to confidentiality, startups have used MosaicML’s models and tools to build natural language frontends and search systems.
MosaicML’s release of MPT-30B and its model deployment tools highlight the company’s goal of making advanced AI more accessible, according to Rao. “I think the big issue is really just empowering more people with the technology. And that’s been our goal from the start: being really transparent about costs and time and difficulty.”
The availability of MPT-30B as an open-source model and MosaicML’s model tuning and deployment services position the startup to challenge OpenAI for dominance in the market for large language model (LLM) technologies. With more advanced models and tools slated for release in the coming months according to Rao, the race is on for leadership in the next generation of AI.
The future of AI involves many custom LLMs
The company’s vision for the future of generative AI is to create a tool that can assist experts across various industries, accelerating their work without replacing them. “I think the future, at least for the next five years, is going to be about taking these techniques and making everyone who’s an expert already, even better,” Rao explained.
In addition to making AI technology more accessible, MosaicML is focusing on enhancing data quality for better model performance. It is developing tools to help users layer in domain-specific data during the pre-training process. This ensures a diverse and high-quality mix of data, which is essential for building effective AI models.
With the release of MPT-30B, MosaicML is poised to make significant advancements in the AI industry, offering a more affordable and powerful option for enterprises. Its dedication to open-source technology and empowering more people with AI tools has the potential to unlock a wealth of untapped innovations, making AI a valuable asset for businesses across the globe.
As enterprises continue to adopt and invest in AI technology, MosaicML’s MPT-30B could very well be the catalyst that drives a new era of more accessible and impactful AI solutions in the business world.