
Don't Be Scared of Low-Cost AI Models

Written by Mark Barton

Another Day, Another AI Update

With the recent release of DeepSeek's R1 model, a flood of speculation and bans followed. Everyone was asking the same question: how can a company provide such a high-performing model at such a low cost?

As the days passed, the true scale of DeepSeek's operations became apparent. By using a "Mixture of Experts" architecture, R1 activates only a fraction of its parameters for any given prompt, dynamically scaling its computational overhead. It also helped that the project was designed to disrupt the existing AI market, leading to significant price changes among major competitors.
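To make the idea concrete, here is a toy sketch of Mixture-of-Experts routing in TypeScript. It is not DeepSeek's implementation, and the gate scores are assumed to be precomputed (in real models a small learned network produces and normalizes them); the point is that only the top-k experts ever execute, so compute scales with k rather than with the total number of experts.

```ts
// Toy Mixture-of-Experts routing: only the top-k scored experts run.
type Expert = (input: number[]) => number[];

function topKIndices(scores: number[], k: number): number[] {
  return scores
    .map((score, i) => ({ score, i }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k)
    .map((e) => e.i);
}

function mixtureOfExperts(
  input: number[],
  experts: Expert[],
  gateScores: number[], // assumed precomputed; learned and normalized in practice
  k: number
): number[] {
  const chosen = topKIndices(gateScores, k);
  const output = new Array<number>(input.length).fill(0);
  for (const i of chosen) {
    const expertOut = experts[i](input); // the unchosen experts never execute
    for (let d = 0; d < output.length; d++) {
      output[d] += gateScores[i] * expertOut[d]; // weighted sum of expert outputs
    }
  }
  return output;
}
```

With, say, 64 experts and k = 2, only a small slice of the expert compute runs per request, which is where the cost savings come from.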

OpenAI set the early standard for LLM pricing. As more models entered the market, the focus remained on capability rather than cost. Now, as OpenAI's dominance wanes, competitors are seizing the opportunity to establish themselves in a new sector: low-cost LLMs.

From Mistral to Anthropic, AI companies are racing to develop lower-cost models to stay competitive in an ever-changing market. LLMs designed to be more deterministic in their approach are on the horizon. Here’s why this is good for everyone—especially developers.

Lower Cost LLMs Improve Adoption of AI

The biggest barrier to entry is price. To run an AI-heavy application, companies have offered credit schemes to offset the costs of using OpenAI and Anthropic models. While this helps with initial adoption, it limits users to a fixed number of requests per day. As LLMs become cheaper and per-token prices fall, we will likely see a surge of new Agentic applications entering the market.

With more low-cost models available, developers can integrate AI tooling into their applications at a cost comparable to a Vercel Edge Function. This allows applications to offer more features without charging users extra for basic conveniences. It’s similar to the evolution of serverless architectures—no longer do we need to allocate static budgets for server resources. Instead, costs can scale up and down dynamically with minimal impact on our wallets.
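As a concrete illustration, here is a minimal sketch of what that integration can look like, assuming a provider that exposes an OpenAI-compatible chat completions endpoint. The base URL, the LLM_API_KEY environment variable, and the model id are all placeholders; substitute your provider's values.

```ts
// Minimal sketch: one cheap completion call behind an app feature, assuming
// an OpenAI-compatible API. BASE_URL and the model id are placeholders.
const BASE_URL = "https://api.example.com/v1";

export async function cheapCompletion(prompt: string): Promise<string> {
  const res = await fetch(`${BASE_URL}/chat/completions`, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${process.env.LLM_API_KEY}`, // your provider's key
    },
    body: JSON.stringify({
      model: "low-cost-model", // placeholder model id
      messages: [{ role: "user", content: prompt }],
      max_tokens: 256, // cap the spend per request
    }),
  });
  if (!res.ok) throw new Error(`LLM request failed: ${res.status}`);
  const data = (await res.json()) as {
    choices: { message: { content: string } }[];
  };
  return data.choices[0].message.content;
}
```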

Cheaper LLMs Drive Down Compute Costs

Vercel has recently announced a slew of price cuts, from Image Optimization to Fluid Compute, so developers can now build applications without worrying about spending hundreds of dollars just to host a handful of users. With cost-cutting measures now reaching LLMs, we are entering an era where hosting a web app with 100,000 users costs a fraction of what it did two years ago.

Agentic Systems Can Use Low-Cost Models

At the heart of every Agentic system is a Reasoning Agent, responsible for deciding which tool to execute based on the user's input. These models act as routers within the system, and routing does not require expensive, high-performance models.

Because these routing decisions are simple but high-volume, using a low-cost, deterministic LLM for reasoning shifts spend from inference to execution. Our ReAct agents can rely on a cheap model to decide which tool to call, while the budget goes to more specialized models for tasks such as image, voice, and video generation.
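Here is a sketch of that two-tier pattern, reusing the hypothetical cheapCompletion helper from the earlier snippet. The tool implementations are stubs standing in for the expensive specialized models; the only LLM call on the routing path is the cheap classification step.

```ts
// Two-tier agent sketch: a cheap model routes; pricier tools execute.
type ToolName = "generate_image" | "generate_voice" | "answer_text";

// Stubs standing in for specialized (and more expensive) models.
const tools: Record<ToolName, (input: string) => Promise<string>> = {
  generate_image: async (p) => `image for: ${p}`,
  generate_voice: async (p) => `audio for: ${p}`,
  answer_text: async (p) => `answer: ${p}`,
};

// One small classification call decides which tool runs.
async function routeWithCheapModel(userInput: string): Promise<ToolName> {
  const prompt =
    "Pick exactly one tool for this request: generate_image, " +
    `generate_voice, or answer_text.\nRequest: ${userInput}\nTool:`;
  const choice = (await cheapCompletion(prompt)).trim() as ToolName;
  return choice in tools ? choice : "answer_text"; // fall back on bad output
}

export async function handle(userInput: string): Promise<string> {
  const tool = await routeWithCheapModel(userInput); // cheap reasoning step
  return tools[tool](userInput); // cost lands on execution, not routing
}
```

The fallback to answer_text is a deliberate design choice: cheap routers occasionally return malformed tool names, and a default path is cheaper than a retry loop.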

Don't Panic.

Sometimes it feels like the AI space is moving a million miles an hour. It's easy to feel overwhelmed, unproductive, or late to the party. But take comfort in the fact that most developers aren’t building their own models. As the market grows more competitive, the focus will likely shift from individual model capabilities to cheaper, faster models. User experience will be at the heart of all Agentic systems moving forward. At OMNIUX, we are working at the forefront of Agentic Systems development, designing platforms that feel like magic for our users. See how OMNIUX can help your business grow!