What I learned shipping agentic systems in production
18 April 2026 · 2 min read

Notes from eighteen months building agentic AI for a multi-brand operating business — what worked, what didn't, and where the wall actually is.
Most agentic AI write-ups are demos. This isn't. The systems below have been live, in production, inside an operating business with real customers and real revenue, for over a year.
What we shipped
The useful systems were not the cinematic ones. They were narrow, unglamorous workflows with a clear owner and a measurable hand-off.
- A content production pipeline that turns a brief, source material, and product context into a first draft for review.
- A pricing and margin workflow that spots anomalies across a multi-brand catalogue before they become expensive habits.
- A sales-research agent that builds, enriches, deduplicates, and scores target lists before a human decides what to trust.
- Internal copilots that answer operational questions from approved documents and system data, with clear source links and escalation paths.

The pattern was consistent: let the model draft, classify, retrieve, compare, and explain. Keep humans in charge of approval, customer promises, price changes, and anything that touches money or reputation.
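One way to picture that split is an approval gate in front of anything the model proposes. This is a minimal sketch, not the production code: the `Action` names, the `REQUIRES_APPROVAL` policy set, and the `Proposal` shape are all hypothetical, chosen only to illustrate "model drafts, human approves".

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Action(Enum):
    # Illustrative action names, not a real system's vocabulary.
    DRAFT_CONTENT = "draft_content"
    CLASSIFY = "classify"
    CHANGE_PRICE = "change_price"
    REPLY_TO_CUSTOMER = "reply_to_customer"


# Assumed policy: anything touching money or reputation needs a named approver.
REQUIRES_APPROVAL = {Action.CHANGE_PRICE, Action.REPLY_TO_CUSTOMER}


@dataclass
class Proposal:
    action: Action
    payload: dict
    rationale: str                     # the model's explanation, shown to the reviewer
    approved_by: Optional[str] = None  # set only by a human reviewer


def can_execute(proposal: Proposal) -> bool:
    """Low-risk actions run directly; gated actions need a recorded approver."""
    if proposal.action not in REQUIRES_APPROVAL:
        return True
    return proposal.approved_by is not None
```

In this sketch a price change stays a proposal until a person signs it, while a first draft goes straight to the review queue. The useful property is that the gate is a plain data check, so it can be audited without reading prompts.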
What worked
The systems that stuck had small surfaces. One team, one workflow, one definition of success. They did not ask everyone in the business to change how they worked on day one.

The second thing that worked was treating retrieval and tools as product features, not plumbing. If the model could not show where an answer came from, or which system it had checked, the user did not trust it. The best interface was often not a chat box. Sometimes it was a scored queue, a draft response, a suggested price review, or a "check this before it goes out" panel.
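Treating sources as part of the product can be as simple as making them a required field of the answer. The sketch below assumes a hypothetical `Answer` shape and a fallback message of my own wording; the point is only that an answer with no source never reaches the user looking like a confident one.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Source:
    title: str
    url: str


@dataclass
class Answer:
    text: str
    sources: list = field(default_factory=list)          # Source objects the answer relied on
    systems_checked: list = field(default_factory=list)  # names of systems that were queried


def render(answer: Answer) -> str:
    """Format an answer for display, refusing to show unsourced text."""
    if not answer.sources:
        # Assumed behaviour: no source means escalate rather than guess.
        return "No approved source found. Escalating to a person."
    lines = [answer.text, "", "Sources:"]
    lines += [f"- {s.title} ({s.url})" for s in answer.sources]
    if answer.systems_checked:
        lines.append("Checked: " + ", ".join(answer.systems_checked))
    return "\n".join(lines)
```

The same `Answer` object can feed a chat reply, a draft panel, or a row in a review queue, which is part of why the interface did not have to be a chat box.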
The third thing was confidence scoring. A busy team does not need AI to sound clever. It needs to know which items can be skimmed, which need careful review, and which should be routed to a person immediately. That changed adoption more than model choice.
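A minimal sketch of that routing, assuming the model (or a separate scorer) emits a confidence between 0 and 1. The two thresholds are illustrative defaults, not values from our systems; in practice they would be tuned per workflow against review outcomes.

```python
from enum import Enum


class Route(Enum):
    SKIM = "skim"          # safe to glance at and pass through
    REVIEW = "review"      # needs careful human review
    ESCALATE = "escalate"  # goes to a person immediately


def route(confidence: float, skim_at: float = 0.9, review_at: float = 0.6) -> Route:
    """Map a confidence score in [0, 1] to one of three review lanes."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError(f"confidence must be in [0, 1], got {confidence}")
    if confidence >= skim_at:
        return Route.SKIM
    if confidence >= review_at:
        return Route.REVIEW
    return Route.ESCALATE
```

Three lanes are deliberately coarse. A reviewer can act on "skim, review, escalate" without interpreting a raw number.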
What didn't
The broad assistant idea did not work well. "Ask anything about the business" sounds attractive, but in practice it creates fuzzy expectations and too many unsafe edge cases. The more useful move was to pick one painful workflow and make the model excellent inside that fence.

Autonomy also hit a ceiling quickly. For customer-facing and commercial work, the right posture was draft, recommend, and explain. Fully autonomous action looked efficient in a demo and fragile in production. The first time a system sends the wrong answer to a customer, the time saved across the previous week stops mattering.
Data quality was the other constraint. The model could handle messy language. It could not magically fix missing ownership, stale product records, undocumented exceptions, or three teams using the same field differently. Agentic AI exposes the state of the business underneath it.
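Some of these problems can be surfaced before an agent ever touches the data. The sketch below checks two of them, missing ownership and stale records, over a hypothetical `ProductRecord` shape; the 180-day staleness window is an assumption for illustration. Undocumented exceptions and conflicting field usage need people, not a script.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass(frozen=True)
class ProductRecord:
    sku: str
    owner: Optional[str]  # team or person accountable for the record
    last_updated: date


def audit(records, today: date, max_age_days: int = 180) -> dict:
    """Return {sku: [problems]} for records with no owner or no recent update."""
    cutoff = today - timedelta(days=max_age_days)
    issues = {}
    for record in records:
        problems = []
        if not record.owner:
            problems.append("missing owner")
        if record.last_updated < cutoff:
            problems.append("stale record")
        if problems:
            issues[record.sku] = problems
    return issues
```

Running a check like this first gives the agent's failures a name: a wrong answer traced to a record flagged here is a data problem, not a model problem.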
The wall
The hard part isn't the agent. The hard part is the data, the observability, and the change management around the agent. If you don't have those three, the agent is a demo, not a system.

More on each of those — and how to build them without a six-month detour — in upcoming posts.