Which LLM is right for your business?
7 June 2026

The model names change quickly. The better question is what job the model needs to do, what risk it carries, and how the business will operate it.
Most businesses ask the wrong question first.
They ask: which LLM should we use?
It sounds sensible. The market is noisy. Every month there is a new benchmark, a new model family, a new context window, a new coding score, a new multimodal demo, a new pricing table, and a new person online declaring that everything else is now obsolete.
But for a business, the better first question is:
What job are we asking the model to do?
Until that is clear, model selection is theatre.
There is no best model
There are stronger models, cheaper models, faster models, safer models, more integrated models, and models that are better for specific tasks.
There is not one model that is simply "best" for every business.
A model that is excellent for long-form reasoning may be too slow or expensive for a high-volume support triage workflow. A model that is cheap enough for thousands of classifications may be too weak for complex contractual analysis. A model that performs well in a demo may not fit your data security, hosting, integration, audit, or procurement requirements.
The question is not "which model is cleverest?"
The question is "which model is good enough for this workflow, at this cost, with this risk profile, inside our operating constraints?"
That is a much more useful conversation.
Start with the workflow
I would split LLM use cases into a few broad categories.
Drafting and rewriting. Emails, proposals, web copy, product descriptions, internal comms, summaries, policies, sales follow-ups. This work needs fluency, tone control, and context. It rarely needs the most expensive reasoning model.
Retrieval and question answering. Asking questions of policies, product data, SOPs, tickets, ERP exports, CRM notes, or technical documents. The model matters, but retrieval quality matters more. If the source material is messy, stale, or badly chunked, the best model in the world will still give weak answers.
Classification and routing. Categorising enquiries, triaging tickets, tagging leads, detecting urgency, assigning work, scoring items for review. This usually rewards consistency, speed, low cost, and good evaluation more than raw intelligence.
Analysis and decision support. Comparing options, investigating anomalies, writing recommendations, reviewing risks, explaining financial or operational trade-offs. This needs stronger reasoning, clearer evidence, and a human owner.
Agentic workflows. Systems that use tools, call APIs, update records, draft outputs, check data, and move work through a process. Here the model is only one part of the system. The orchestration, permissions, logs, fallbacks, and review gates matter just as much.
If you cannot place your use case in one of those buckets, you probably do not understand it well enough to choose a model.
The practical model selection criteria
For most SMEs, I would score models against eight things.
Quality. Can it produce the level of output the workflow needs?
Reliability. Does it behave consistently across repeated runs, edge cases, and messy inputs?
Latency. Is it fast enough for the user experience?
Cost. Do the unit economics make sense at the volume you expect?
Context. Can it handle the documents, transcripts, records, or conversation history involved?
Tool use. Can it call the systems, functions, search indexes, or APIs required?
Data posture. Does it fit your confidentiality, residency, retention, and procurement requirements?
Integration. Can your team actually build, monitor, and support it?
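One way to make those eight criteria comparable is a simple weighted scorecard. The sketch below is illustrative only: the weights, the 1 to 5 scale, the candidate names (`model_a`, `model_b`), and every score are assumptions invented for the example, not measurements or recommendations.

```python
# Weighted scorecard over the eight criteria listed above.
# ASSUMPTION: these weights are placeholders; set them per workflow.
WEIGHTS = {
    "quality": 3,
    "reliability": 3,
    "latency": 1,
    "cost": 2,
    "context": 1,
    "tool_use": 1,
    "data_posture": 3,
    "integration": 2,
}


def weighted_score(scores: dict, weights: dict = WEIGHTS) -> float:
    """Return the weighted average of 1-5 scores; every criterion must be scored."""
    missing = set(weights) - set(scores)
    if missing:
        raise ValueError(f"unscored criteria: {sorted(missing)}")
    total = sum(weights[name] * scores[name] for name in weights)
    return total / sum(weights.values())


# ASSUMPTION: hypothetical candidates with made-up scores on a 1-5 scale.
candidates = {
    "model_a": {"quality": 5, "reliability": 4, "latency": 2, "cost": 2,
                "context": 5, "tool_use": 5, "data_posture": 4, "integration": 3},
    "model_b": {"quality": 3, "reliability": 4, "latency": 5, "cost": 5,
                "context": 3, "tool_use": 3, "data_posture": 4, "integration": 4},
}

# Rank candidates from best to worst under the chosen weights.
ranked = sorted(candidates, key=lambda name: weighted_score(candidates[name]), reverse=True)
for name in ranked:
    print(f"{name}: {weighted_score(candidates[name]):.2f}")
```

With these particular weights the weaker but cheaper and faster candidate comes out slightly ahead, which is the point: the ranking depends on what the workflow values, not on which model is cleverest.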
The right answer is often a blend, not a single model.
Use a cheaper model for simple extraction. Use a stronger model for final reasoning. Use embeddings or search for retrieval. Use rules where rules are better. Use humans where judgement matters.
Good AI architecture is rarely "send everything to the biggest model".
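The blend described above can be sketched as a small router that decides where each piece of work goes before any model is called. This is a minimal sketch under stated assumptions: the task type names, the `Route` values, and the `choose_route` function are hypothetical, and a real system would call actual models, rules engines, and review queues behind each route.

```python
from enum import Enum


class Route(Enum):
    RULES = "rules"                # deterministic logic, no model call
    CHEAP_MODEL = "cheap_model"    # fast, low-cost model
    STRONG_MODEL = "strong_model"  # slower, more capable model
    HUMAN = "human"                # a person makes the call


# ASSUMPTION: hypothetical task types mapped to a default route.
DEFAULT_ROUTES = {
    "extraction": Route.CHEAP_MODEL,
    "classification": Route.CHEAP_MODEL,
    "analysis": Route.STRONG_MODEL,
}


def choose_route(task_type: str, has_rule: bool = False,
                 needs_judgement: bool = False) -> Route:
    """Pick the cheapest route that is adequate for the task."""
    if needs_judgement:
        return Route.HUMAN   # humans where judgement matters
    if has_rule:
        return Route.RULES   # rules where rules are better
    # Unknown task types go to a person rather than to the biggest model.
    return DEFAULT_ROUTES.get(task_type, Route.HUMAN)


print(choose_route("extraction"))
print(choose_route("analysis"))
print(choose_route("classification", has_rule=True))
print(choose_route("analysis", needs_judgement=True))
```

Sending unknown task types to a person is a design choice in this sketch; defaulting to the stronger model is equally defensible if the workflow is low risk.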
Do not buy benchmarks blindly
Benchmarks are useful signals, but they are not your business.
Your business has specific documents, weird exceptions, old systems, product naming issues, customer edge cases, pricing rules, compliance needs, and team habits. A public benchmark does not know any of that.
I would rather run a small evaluation on twenty real examples from your business than argue about benchmark tables.
For example:
- ten real customer questions;
- five awkward support tickets;
- three messy product records;
- two decisions that require judgement.
Run the same examples through candidate models. Score the answers. Track factual accuracy, useful structure, tone, completeness, refusal behaviour, citation quality, latency, and cost.
You will learn more in an afternoon than from a week of model discourse.
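A small evaluation like this needs very little code. The sketch below assumes each candidate model is wrapped in a plain Python function that takes a prompt and returns an answer, and that scoring is supplied as a function too (in practice, often a human reviewer's marks). The `evaluate` function, the stub models, and the toy scorer are all hypothetical stand-ins; no real provider API is called.

```python
import time
from statistics import mean
from typing import Callable, Dict, List


def evaluate(models: Dict[str, Callable[[str], str]],
             examples: List[str],
             score: Callable[[str, str], float]) -> Dict[str, Dict[str, float]]:
    """Run every example through every model; record mean score and latency."""
    results = {}
    for name, ask in models.items():
        scores, latencies = [], []
        for prompt in examples:
            start = time.perf_counter()
            answer = ask(prompt)
            latencies.append(time.perf_counter() - start)
            scores.append(score(prompt, answer))
        results[name] = {
            "mean_score": mean(scores),
            "mean_latency_s": mean(latencies),
        }
    return results


# ASSUMPTION: stub "models" standing in for real API wrappers.
stub_models = {
    "candidate_a": lambda prompt: f"Answer to: {prompt}",
    "candidate_b": lambda prompt: "",
}

# ASSUMPTION: toy scorer; replace with reviewer marks or a proper rubric.
def non_empty(prompt: str, answer: str) -> float:
    return 1.0 if answer.strip() else 0.0


examples = ["What is the returns policy?", "Why was this order delayed?"]
report = evaluate(stub_models, examples, non_empty)
for name, row in report.items():
    print(name, row)
```

The same loop extends naturally to the other dimensions mentioned above, such as cost per example or citation quality, by returning more than one number from the scorer.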
The big enterprise question: platform or API?
There are two broad routes.
The first is to use an existing assistant product: ChatGPT Enterprise, Microsoft Copilot, Gemini for Workspace, Claude Team or Enterprise, or another managed tool. This is usually the fastest way to give people access to AI with some governance.
The second is to build a workflow using APIs. This is what you do when the model needs to sit inside a process: triaging leads, enriching data, drafting support responses, checking records, updating systems, or producing structured output.
Both have a place.
If the requirement is "help the team think, write, summarise, and analyse", start with a managed product.
If the requirement is "change how this workflow operates", build around an API.
Do not confuse the two. Giving everyone a chatbot is not the same as implementing AI in the business.
Where different models tend to fit
The exact model names will change, so I would avoid hard-coding a recommendation into the business case.
But the broad pattern is stable.
Some models are strongest for deep reasoning, complex analysis, and tool-heavy workflows. Some are better for fast, cheap, high-volume tasks. Some are particularly useful where the business is already committed to a productivity suite. Some are better for coding and technical work. Some are attractive when open-weight deployment, privacy, or control matters.
The point is not to develop religious loyalty.
The point is to match the model to the job.
For a customer-facing AI workflow, I care about reliability, grounding, escalation, and audit. For an internal drafting assistant, I care about tone, usability, and adoption. For a pricing anomaly workflow, I care about structured output, explainability, and false negatives. For a sales research workflow, I care about retrieval, deduplication, confidence scoring, and review.
Those are different buying decisions.
The risk question
Before choosing a model, ask what happens when it is wrong.
If it writes a clumsy first draft, the risk is low.
If it misclassifies a support ticket, the risk is moderate.
If it sends a customer an incorrect answer, changes a price, recommends legal action, exposes confidential data, or updates a system without review, the risk is high.
The higher the risk, the more you need:
- source grounding;
- human approval;
- logging;
- permission boundaries;
- confidence scoring;
- test sets;
- fallback paths;
- monitoring.
The model is not your control layer.
The system around the model is.
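To illustrate what a control layer around the model might look like, here is a minimal sketch of an approval gate with logging. It covers only two of the controls listed above (human approval and logging), and everything in it is an assumption for illustration: the `Risk` tiers, the `ProposedAction` type, the in-memory `audit_log`, and the `execute` function.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List, Optional


class Risk(Enum):
    LOW = "low"            # e.g. a clumsy first draft
    MODERATE = "moderate"  # e.g. a misclassified ticket
    HIGH = "high"          # e.g. changing a price or emailing a customer


@dataclass
class ProposedAction:
    description: str
    risk: Risk
    approved_by: Optional[str] = None  # name of the human approver, if any


# ASSUMPTION: an in-memory list stands in for a real audit log.
audit_log: List[str] = []


def execute(action: ProposedAction, run: Callable[[], None]) -> bool:
    """Run the action only if the controls for its risk tier are satisfied."""
    if action.risk is Risk.HIGH and action.approved_by is None:
        audit_log.append(f"BLOCKED (needs approval): {action.description}")
        return False
    run()
    audit_log.append(
        f"EXECUTED ({action.risk.value}, approver={action.approved_by}): "
        f"{action.description}"
    )
    return True


changes: List[str] = []
price_change = ProposedAction("Update price of item 123", Risk.HIGH)

execute(price_change, lambda: changes.append("price updated"))  # blocked
price_change.approved_by = "ops.manager"
execute(price_change, lambda: changes.append("price updated"))  # runs
print(audit_log)
```

The model proposes the action; the gate decides whether it runs. That separation is what lets you swap models later without rebuilding the controls.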
My default advice
For most SMEs, I would start with three tracks.
First, give leadership and knowledge workers a governed assistant so they can learn what AI is good and bad at.
Second, pick one workflow with measurable friction and build a narrow AI system around it.
Third, create a small model evaluation set using your own data, then test candidate models against the work you actually need done.
That avoids the two common mistakes: endless model shopping and reckless deployment.
You do not need to choose the perfect LLM for the whole business.
You need to choose the right model for the next workflow, prove it works, and keep enough flexibility to change when the model landscape moves.
Because it will move.
The businesses that handle AI well will not be the ones that guessed the winning model in advance. They will be the ones that built enough judgement, evaluation, and operating discipline to keep choosing well as the tools change.