In the realm of artificial intelligence, ChatGPT has become synonymous with technological leadership and the future envisioned by many C-suite executives. However, this flagship project from OpenAI is just one among numerous large language models available today. In certain software projects or domains, ChatGPT may not even be the optimal choice. As new competitors emerge almost daily, the race to develop the next generation of AI tools intensifies, with promises of either Earth’s liberation or destruction, depending on who you ask.
But are some models superior to others? Perhaps. Every model possesses flaws, quirks, glitches, and weaknesses that become more apparent with prolonged usage. While generative AI appears awe-inspiring at first, its peculiar and unpredictable aspects gradually come to light.
Benchmarking LLMs
Measuring the quality of generative AI responses scientifically is difficult because of the scope of the models and how they’re used. A data scientist could feed in thousands or even millions of test questions and evaluate the responses, but the results will be limited if the test sets focus on only one type of question. Consulting a resource like Hugging Face’s Open LLM Leaderboard is interesting but not necessarily accurate.
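To make the point about narrow test sets concrete, here is a minimal sketch that scores a model per question category rather than with a single overall number. Everything in it is an assumption made for illustration: the tiny test set, the substring-match scoring rule, and `toy_model`, which stands in for a real LLM call.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

# Hypothetical test set: (category, prompt, expected answer).
TEST_SET: List[Tuple[str, str, str]] = [
    ("arithmetic", "What is 2 + 2?", "4"),
    ("arithmetic", "What is 3 * 3?", "9"),
    ("geography", "What is the capital of France?", "Paris"),
    ("geography", "What is the capital of Japan?", "Tokyo"),
]


def accuracy_by_category(
    model: Callable[[str], str],
    test_set: List[Tuple[str, str, str]],
) -> Dict[str, float]:
    """Return the fraction of correct answers for each category."""
    correct: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for category, prompt, expected in test_set:
        total[category] += 1
        # Crude scoring rule (an assumption): the expected answer
        # must appear somewhere in the response, ignoring case.
        if expected.lower() in model(prompt).lower():
            correct[category] += 1
    return {category: correct[category] / total[category] for category in total}


def toy_model(prompt: str) -> str:
    """Stand-in for a real LLM call; it only knows arithmetic."""
    answers = {
        "What is 2 + 2?": "The answer is 4.",
        "What is 3 * 3?": "The answer is 9.",
    }
    return answers.get(prompt, "I am not sure.")


scores = accuracy_by_category(toy_model, TEST_SET)
print(scores)
```

A test set containing only the arithmetic questions would report a perfect score for this toy model, which is exactly the blind spot a single-category benchmark creates.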
If finding a precise way to benchmark LLMs is tough, at least switching between them is getting easier. Some projects like OpenLLM or FastChat make it simpler to wire up various models despite their different APIs and interfaces. You can stitch together the layers and sometimes even run the models in parallel.
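The idea of hiding different models behind one interface and querying them in parallel can be sketched in a few lines. This is not the OpenLLM or FastChat API; it is a minimal illustration using only the Python standard library, and `model_a`, `model_b`, and `ask_all` are hypothetical names. In practice each function would wrap a call to a real model.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict


# Hypothetical stand-ins for two models with different backends.
def model_a(prompt: str) -> str:
    return f"A says: {prompt.upper()}"


def model_b(prompt: str) -> str:
    return f"B says: {prompt[::-1]}"


# One common shape for every model: a function from prompt to text.
MODELS: Dict[str, Callable[[str], str]] = {
    "model_a": model_a,
    "model_b": model_b,
}


def ask_all(prompt: str, models: Dict[str, Callable[[str], str]]) -> Dict[str, str]:
    """Send the same prompt to every model in parallel and collect the replies."""
    with ThreadPoolExecutor(max_workers=len(models)) as pool:
        futures = {name: pool.submit(fn, prompt) for name, fn in models.items()}
        return {name: future.result() for name, future in futures.items()}


answers = ask_all("hello", MODELS)
print(answers)
```

Because every model is reduced to the same prompt-in, text-out shape, swapping one for another means changing a dictionary entry rather than rewriting the calling code.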
A big question in the background is cost. While everyone is enjoying the explosion of interest and investment, building a large language model can take months or even years. Teams first assemble the training data, then they push the data through expensive hardware that sucks down electricity. Finally, they produce the model. The best way to monetize and sustain this work is an evolving question.
Some organizations are experimenting with open sourcing their results, while others happily rely on services with their own billing models. Open source LLMs can be a real gift—but only if you’re able to handle the work of deploying the model and keeping it running.
Here’s a look at 14 large language models that aren’t ChatGPT. They may or may not be just what your project needs. The only way to know is to send them your prompts and carefully evaluate the results.
Llama
Facebook (now Meta) created this foundational LLM and then released it as part of its stated “commitment to open science.” Anyone can download Llama and use it as a foundation for creating more finely tuned models for particular applications. (Alpaca and Vicuna were both built on top of Llama.) The model is also available in four different sizes. The smallest version, with only 7 billion parameters, is already being used in unlikely places. One developer even claims to have Llama running on a Raspberry Pi with just 4GB of RAM.
Alpaca
Several Stanford researchers took Meta’s Llama 7B and trained it on a set of prompts that mimic instruction-following models like ChatGPT. This bit of fine-tuning produced Alpaca 7B, an LLM that makes the knowledge encoded in Llama accessible to the average person through questions and instructions. Some estimates suggest that the lightweight LLM can run on less than $600 worth of hardware.
Alpaca 7B’s creators are distributing the training set and the code that built it. Anyone can duplicate the model or create something new from a different set.
Vicuna
Another descendant of Llama is Vicuna from LMSYS.org. The Vicuna team gathered a training set of 70,000 different conversations from ShareGPT and paid particular attention to creating multi-round interactions and instruction-following capabilities. Available as either Vicuna-13b or Vicuna-7b, this LLM is among the most price-competitive open solutions for basic interactive chat.
NodePad
Not everyone is enthralled with the way that LLMs generate “linguistically accurate” text. The creators of NodePad believe that the quality of the text tends to distract users from double-checking the underlying facts. LLMs with nice UIs “tend to unintentionally glorify the result making it more difficult for users to anticipate these problems.” NodePad is designed to nurture exploration and ideation without producing polished writing samples that users will barely skim. Results from this LLM appear as nodes and connections, like those in many “mind mapping tools,” and not as finished writing. Users can tap the model’s encyclopedic knowledge for great ideas without getting lost in presentation.
Orca
The first generation of large language models succeeded by size, growing larger and larger over time. Orca, from a team of researchers at Microsoft, reverses that trend. The model uses only 13 billion parameters, making it possible to run on average machines. Orca’s developers achieved this feat by enhancing the training algorithm to use “explanation traces,” “step-by-step thought processes,” and “instructions.” Instead of just asking the AI to learn from raw material, Orca was given a training set designed to teach. In other words, just like humans, AIs learn faster when they’re not thrown into the deep end. The initial results are promising and Microsoft’s team offered benchmarks that suggest that the model performs as well as much larger models.
Jasper
The creators of Jasper didn’t want to build a wise generalist; they wanted a focused machine for creating content. Instead of just an open-ended chat session, the system offers more than 50 templates designed for particular tasks like crafting a real estate listing or writing product features for a site like Amazon. The paid versions are specifically aimed at businesses that want to create marketing copy with a consistent tone.
Claude
Anthropic created Claude to be a helpful assistant that can handle many of a business’s text-based chores, from research to customer service. In goes a prompt and out comes an answer. Anthropic deliberately allows long prompts to encourage more complex instructions, giving users more control over the results. Anthropic currently offers two versions: the full model, called Claude-v1, and a simplified one, called Claude Instant, which is significantly less expensive. The first is for jobs that need more complex, structured reasoning, while the second is faster and better suited to simple tasks like classification and moderation.
Cerebras
When specialized hardware and a general model co-evolve, you can end up with a very fast and efficient solution. Cerebras offers its LLM on Hugging Face in a variety of sizes from small (111 million parameters) to larger (13 billion parameters) for those who want to run it locally. Many, though, will want to use the cloud services, which run on Cerebras’s own wafer-scale integrated processors optimized for plowing through large training sets.
Falcon
The full-sized Falcon-40b and the smaller Falcon-7b were built by the Technology Innovation Institute (TII) in the United Arab Emirates. The team trained the Falcon model on a large set of general examples from the RefinedWeb dataset, with a focus on improving inference. Then they turned around and released it under the Apache 2.0 license, making it one of the most open and unrestricted models available for experimentation.
ImageBind
Many think of Meta as a big company that dominates social media, but it’s also a powerful force in open source software development. Now that interest in AI is booming, it shouldn’t be a surprise that the company is starting to share many of its own innovations. ImageBind is a project that’s meant to show how AI can create many different types of data at once; in this case, text, audio, and video. In other words, generative AI can stitch together an entire imaginary world, if you let it.
Gorilla
You’ve probably been hearing a lot about using generative AI to write code. The results are often superficially impressive but deeply flawed on close examination. The syntax may be correct, but the API calls are all wrong, or they may even be directed at a function that doesn’t exist. Gorilla is an LLM that’s designed to do a better job with programming interfaces. Its creators started with Llama and then fine-tuned it with a focus on deeper programming details scraped directly from documentation. Gorilla’s team also offers its own API-centric set of benchmarks for testing success. That’s an important addition for programmers who are looking to rely on AIs for coding assistance.
Ora.ai
Ora is a system that allows users to create their own targeted chatbots that are optimized for a particular task. LibrarianGPT, for example, will try to answer any question with a direct passage from a book. Professor Carl Sagan is a bot that draws from all of Sagan’s writings so he can live on for billions and billions of years. You can create your own bot or use one of the hundreds already created by others.
AgentGPT
Another tool that stitches together all the code necessary for an application is AgentGPT. It’s designed to create agents that can be sent to tackle jobs like planning a vacation or writing the code for a type of game. The source code for much of the tech stack is available under GPL 3.0. There’s also a running version available as a service.
FrugalGPT
This isn’t a different model so much as a careful strategy for finding the cheapest possible model to answer a particular question. The researchers who developed FrugalGPT recognized that many questions don’t need the biggest, most expensive model. Their algorithm starts with the simplest model and moves up a list of LLMs in a cascade until it finds a good answer. The researchers’ experiments suggest that this careful approach may save up to 98% of the cost.
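The cascade idea can be sketched roughly as follows. This is not the FrugalGPT implementation: the three toy models, their integer costs, and the `good_enough` check are all hypothetical placeholders. In particular, how to decide that an answer is good enough is the hard part, and the non-empty check used here is only a stand-in for a real quality scorer.

```python
from typing import Callable, List, Tuple

# Each entry: (name, cost per call in made-up units, model function).
Model = Tuple[str, int, Callable[[str], str]]


def cheap_model(prompt: str) -> str:
    """Hypothetical small model: only answers one easy question."""
    return "4" if prompt == "What is 2 + 2?" else ""


def big_model(prompt: str) -> str:
    """Hypothetical large model: always produces some answer."""
    return "A detailed answer to: " + prompt


# The cascade is ordered from cheapest to most expensive.
CASCADE: List[Model] = [
    ("cheap", 1, cheap_model),
    ("big", 50, big_model),
]


def good_enough(answer: str) -> bool:
    """Placeholder acceptance test (an assumption): any non-empty answer passes."""
    return bool(answer.strip())


def cascade(
    prompt: str,
    models: List[Model],
    accept: Callable[[str], bool],
) -> Tuple[str, str, int]:
    """Try models in order; stop at the first acceptable answer.

    Returns (model name, answer, total cost spent). If no answer is
    accepted, the last model's answer is returned.
    """
    if not models:
        raise ValueError("the cascade needs at least one model")
    spent = 0
    name, answer = "", ""
    for name, cost, fn in models:
        spent += cost  # every model that is tried gets paid for
        answer = fn(prompt)
        if accept(answer):
            break
    return name, answer, spent


easy = cascade("What is 2 + 2?", CASCADE, good_enough)
hard = cascade("Explain entropy", CASCADE, good_enough)
print(easy)
print(hard)
```

In this toy run the easy question costs 1 unit because the cheap model handles it, while the hard question costs 51 because both models are called. The savings depend on how many questions stop at the early, cheap stages.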