Mar 27, 2026 · AI

The AI Demo Looked Great. Then We Tried to Put It in Production.


I have spent the last few months building an AI document intelligence platform for a customer in a regulated industry. Not a pilot. Not a proof of concept. A production system with audit trails, tenant isolation, and data that absolutely cannot leak between clients.

The first problem I hit was one that anyone who has used ChatGPT or similar tools will recognise: the outputs varied significantly between runs. Same query, different answers. I assumed the model was the issue. It was not.

One Parameter Changed Everything

The root cause turned out to be a single API setting: the temperature parameter. It controls how much randomness the model introduces into its responses. Left at the default, the model behaves creatively. It explores different phrasings, offers varied perspectives, and generally acts like it is having a conversation. That is fine for a chatbot. It is not fine when a compliance team needs the same document analysed the same way every time.

Lowering the temperature produced output that was close to deterministic. Consistent. Predictable. Much closer to what you would expect from a human reviewing the same document twice.
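A minimal sketch of the change, assuming an OpenAI-style chat completion API. The helper name, model id, and system prompt are illustrative, not the production code; and pinning temperature to 0 does not guarantee bit-identical outputs, it just removes most of the sampling randomness:

```python
# Hypothetical helper: builds the request payload sent for document
# analysis. Everything that affects output variability is pinned here,
# in one place, rather than left at provider defaults.
def build_analysis_request(document_text: str) -> dict:
    return {
        "model": "gpt-4o",   # illustrative model id; use whatever you deploy
        "temperature": 0,    # minimise sampling randomness
        "top_p": 1,          # leave nucleus sampling wide; temperature does the work
        "messages": [
            {"role": "system",
             "content": "Analyse the document. Be consistent across runs."},
            {"role": "user", "content": document_text},
        ],
    }

request = build_analysis_request("...contract text...")
```

Some providers also accept a seed parameter for best-effort reproducibility; worth checking what your API supports.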

It was not a complicated fix. But that is precisely the point.

The Iceberg Below the Demo

Most organisations experience AI through demos and slides. The model is impressive. The possibilities feel endless. The board gets excited. But between a compelling demo and a system that can run in production, there is a gap that is rarely discussed honestly.

That gap is not about the AI model. It is about everything around it: the infrastructure, the security model, tenant isolation, audit trails, and the configuration that never appears on a slide.

Why This Matters for Leaders

If you are a CEO or board member evaluating AI investments, this is worth understanding. The model is typically the easiest part of the project. The vendors will tell you it is plug and play. It is not.

The real work, and the real cost, sits in making the system production-grade. That means tenant isolation, audit logging, tuned inference parameters, version-controlled configuration, and load testing under realistic conditions.

I have seen organisations spend months on model selection and days on production readiness. The ratio should be inverted.

A Practical Checklist

If you are running AI in production or planning to, here are the questions worth asking:

  1. Have you reviewed and tuned the temperature and other inference parameters for your specific use case?
  2. Is there complete tenant isolation if you are serving multiple clients?
  3. Do you have audit logs that a regulator could review?
  4. What happens when the model returns an incorrect or nonsensical response?
  5. Who owns the production configuration and how is it version-controlled?
  6. Have you load-tested the system under realistic conditions?

If any of these are unanswered, your AI system is a demo running in production. That is a risk, not a feature.
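On question 5, here is one sketch of what version-controlled configuration can look like. The file name, keys, and defaults are illustrative assumptions, not a prescription; the point is that inference parameters live in the repository, so every change to temperature or model goes through code review and has a history:

```python
import json
from pathlib import Path

# Illustrative defaults, checked into the repo next to the code.
DEFAULTS = {"model": "gpt-4o", "temperature": 0, "max_tokens": 2048}

def load_inference_config(path: str = "inference_config.json") -> dict:
    """Merge the versioned config file (if present) over the defaults."""
    config = dict(DEFAULTS)
    config_file = Path(path)
    if config_file.exists():
        config.update(json.loads(config_file.read_text()))
    return config
```

When the compliance team asks why last month's analysis differed from this month's, the answer is a diff in the repository, not a guess.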

The Work That Matters

I am spending more of my time on this kind of work now, making AI systems production-ready rather than leaving them as demos and slides, and I find it genuinely rewarding. It is less glamorous than model selection and prompt engineering, but it is where the real value is created and where the real risks are managed.

The AI model is the tip of the iceberg. The infrastructure, security, and configuration beneath the surface are what determine whether the system earns trust or erodes it.
