In this segment of his Forbes Technology Council column, Joseph Ours discusses the trade-offs between small language models and large language models.
The AI industry is obsessed with scale — bigger models, more parameters, higher costs — the assumption being that more always equals better. Today, small language models (SLMs) are turning that assumption on its head, proving that when it comes to AI performance, size isn’t everything.
While organizations chase the latest large language model (LLM) with hundreds of billions of parameters, some are quietly deploying smaller, more specialized agents that deliver results at a fraction of the cost.
They may be on to something. We’ve seen that LLMs can, and do, deliver phenomenal results. However, using them for smaller tasks can be likened to using a Formula One race car for grocery shopping. Impressive, but inefficient and impractical for some real-world applications.
In fact, Gartner predicts that by 2027, small, task-specific AI models will be used three times more than general-purpose LLMs.
The combination of speed, cost-effectiveness and focused capability makes SLMs well-suited for specialized agentic systems, with AI agents designed to perform specific tasks autonomously within defined domains.
The Performance Trade-Off
LLMs are highly capable, revolutionary technology, but their performance challenges are both real and measurable. SLM-powered agents take a different approach: instead of broad general knowledge, they focus on task-specific expertise.
This represents a performance trade-off between versatility and efficiency:
- Capability And Versatility: LLMs excel at complex reasoning and sophisticated contextual understanding, handling diverse inputs across multiple domains. They’re ideal for strategic planning, creative content generation and sophisticated customer service requiring a nuanced understanding. However, their generalized training makes them capable at many things but often not exceptional at specialized, industry-specific tasks.
- Cost And Speed: LLMs require thousands of GPUs, consume enormous energy, and carry operational costs that can reach hundreds of thousands of dollars monthly. SLM agents deliver dramatically lower costs — often 10 times less — with faster response times and superior performance in latency and throughput. They can also run locally without internet connectivity on edge devices like phones, infotainment systems and airport kiosks, but risk brittleness when encountering tasks outside their specialized scope.
The key is understanding when specialization outweighs versatility for your specific use case.
Real-World Agents
As with LLMs, real-world SLM-powered agents are emerging. For example, Japan Airlines is using Microsoft’s Phi models to power AI agents that process passenger paperwork, reduce flight attendant workload and efficiently handle standardized passenger data and routine questions.
Potential exists in healthcare, where patient medication mix-ups happen frequently. SLM-powered agents could serve as specialized safety nets, with agents checking prescribed medications for potential interactions, dosage errors or prescription misinterpretations. Unlike comprehensive medical LLMs that might overstep boundaries, specialized agents could focus exclusively on medication information without venturing into diagnosis or treatment advice. Small model agents can be constrained to appropriate boundaries.
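One way such a boundary constraint might be enforced is a simple scope gate in front of the agent. The sketch below is illustrative only: the keyword lists and function names are assumptions, not a real clinical safety mechanism, and the SLM call is a stub.

```python
# Illustrative sketch: a scope gate that keeps a medication-safety agent
# inside its domain. Keyword lists and function names are hypothetical.

IN_SCOPE = {"interaction", "dosage", "dose", "prescription", "medication", "drug"}
OUT_OF_SCOPE = {"diagnose", "diagnosis", "treatment", "prognosis", "symptom", "symptoms"}

def in_agent_scope(query: str) -> bool:
    """Return True only if the query stays within medication safety checks."""
    words = set(query.lower().split())
    if words & OUT_OF_SCOPE:       # refuse anything touching diagnosis or treatment
        return False
    return bool(words & IN_SCOPE)  # require at least one in-domain term

def answer_with_slm(query: str) -> str:
    # Stand-in for the real small-model inference call.
    return f"Checked: {query}"

def handle(query: str) -> str:
    if not in_agent_scope(query):
        return "Out of scope: please consult a clinician."
    return answer_with_slm(query)
```

A production system would use a trained classifier or policy model rather than keyword matching, but the shape is the same: the gate, not the model’s judgment, defines the boundary.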
Outside of the more serious applications, gaming represents another emerging market. Instead of running expensive large language models to power non-player characters (NPCs) in games like GTA 6, studios could deploy specialized SLM-powered agents for NPC conversations, with each agent handling specific character types or conversation domains, dramatically improving customer experience while controlling costs.
Edge deployment enables these agentic applications to run locally on gaming devices without requiring constant cloud connectivity.
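A routing layer of that kind could be sketched as follows, with each character archetype mapped to its own agent. The agent callables here are stubs standing in for hypothetical local SLM inference; the names are assumptions for illustration.

```python
# Illustrative sketch: route NPC dialogue to per-archetype SLM agents.
# Each agent is a stub; a real system would wrap a local small-model call.

def shopkeeper_agent(player_line: str) -> str:
    return "Welcome! Looking to buy or sell?"

def guard_agent(player_line: str) -> str:
    return "Move along, citizen."

AGENTS = {
    "shopkeeper": shopkeeper_agent,
    "guard": guard_agent,
}

def npc_reply(archetype: str, player_line: str) -> str:
    agent = AGENTS.get(archetype)
    if agent is None:
        return "..."  # fallback for archetypes without a dedicated agent
    return agent(player_line)
```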
Implementation Hurdles Are Real
Just because they’re smaller doesn’t mean SLMs are automatically easier to implement.
At the consulting company where I work, we see organizations struggling to effectively govern and break down tasks for LLMs. If that’s the case, implementing specialized models becomes even more complex as they require precise task definition and strong governance frameworks. If they’re not well-managed, they’re more likely to go off track than LLMs.
Success for SLMs will require:
- Defined Performance Metrics: Clear measurement of response time, latency, tokens per second and accuracy in isolated domains where the variety of inputs is manageable.
- Domain Specificity: Applications like car infotainment systems that need to process natural language requests, healthcare charting, medication safety checks or other specialized agentic systems work best.
- Governance Maturity: Organizations that have yet to master LLM governance should focus on it first, as SLMs demand more precise oversight.
The Future Of Enterprise AI
The future of enterprise AI is about making intelligent choices that align AI capabilities with business requirements. While SLM agents will continue evolving for specialized tasks, LLMs remain essential for complex reasoning, creative work and scenarios requiring broad knowledge.
It turns out that in AI, big things really do come in small packages. As AI adoption picks up speed, organizations that understand that smaller can deliver better performance will gain competitive advantages.
This blog was originally published on Forbes.com.
Are you ready to explore how artificial intelligence can fit into your business but aren’t sure where to start? Our AI experts can guide you through the entire process, from planning to implementation. Talk to an expert