Mar 27, 2026 · AI

The Steak Test: A Simple Way to Check If Your AI Is Making Things Up


I build AI systems for regulated businesses where making things up is not an option. Financial services, legal, compliance. The kind of environments where a confident but fabricated answer can trigger regulatory action, mislead a board, or lose a client.

One of the first things I do when I build these systems is what I call the steak test.

The Steak Test

The concept is simple. You have an AI assistant sitting on top of your business data. It can answer questions about accounts, pull up reports, summarise documents. It looks impressive in a demo. But before you let it anywhere near real users, ask it something completely outside its domain.

Ask it how to cook a steak.

If your financial reporting tool gives you a recipe, you have a problem. Not a minor one. A fundamental one. It means the system is not grounded in your data. It is willing to answer anything, confidently, regardless of whether it has any basis for doing so.

This Is Not Hypothetical

I was talking to a COO recently who was rolling out an AI assistant on top of his business data. It pulled up the right numbers, answered questions about accounts. It looked great in the demo. Then he asked it something the system had no data for. It answered just as confidently. Really detailed, convincing, but completely made up.

That is the hallucination problem, and it is far more common than vendors will admit.

I tested this myself with Microsoft Copilot inside Excel. I had a sales forecast spreadsheet open. I asked Copilot how to cook a steak. Not only did it answer with detailed instructions, it offered follow-up suggestions about ribeye seasoning and side dishes. From inside a spreadsheet.

[Screenshot: Microsoft Copilot in Excel answering "how to cook a steak" with a detailed recipe, next to a sales forecast spreadsheet. The AI should have refused a question unrelated to the data.]

This is what an ungrounded AI system looks like in practice. It does not say "I don't know" or "that question is outside my scope." It gives you a confident, well-formatted answer to a question it has absolutely no business answering.

Temperature vs Grounding

There are two distinct problems here that are often confused:

  1. Temperature, a sampling parameter that controls how random the model's output is. At low temperature the model gives near-identical answers every time; at high temperature it varies.
  2. Grounding, which controls whether the system answers only from your data, or from anything the underlying model picked up during training.

You can have perfect temperature settings and still have a grounding problem. A system that consistently, deterministically gives you a steak recipe from inside a financial tool is not better than one that gives you a different recipe each time. Both are wrong. The consistency just makes it harder to spot.
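To make the distinction concrete, here is a minimal sketch of temperature sampling over a toy set of token logits. Everything here is illustrative; the point it demonstrates is that temperature only reshapes the output distribution, and says nothing about whether the most likely answer is true.

```python
import math
import random

def sample(logits: list[float], temperature: float) -> int:
    """Sample a token index from raw logits at a given temperature.

    Temperature 0 is deterministic (argmax); higher values flatten the
    distribution and increase randomness. Nothing here checks truth.
    """
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    r = random.random()
    cum = 0.0
    for i, p in enumerate(probs):
        cum += p
        if r < cum:
            return i
    return len(probs) - 1
```

If the highest logit belongs to a fabricated answer, temperature 0 will return that fabrication every single time. Determinism is not honesty.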

What Grounding Looks Like in Practice

A properly grounded AI system should do several things:

  1. Answer only from the data it has been given, and cite where each answer came from.
  2. Say "I don't know" or "that is outside my scope" when the data does not cover the question.
  3. Refuse off-topic requests entirely, rather than falling back to the base model's general knowledge.
  4. Push back when a user asserts something that contradicts its data.

This is not easy to implement. It requires careful system prompting, retrieval architecture, and testing. But it is non-negotiable if you are deploying AI in a regulated environment.

The Questions Worth Asking

If you are evaluating an AI product or building one internally, here is what to test:

  1. Run the steak test. Ask it something completely outside the domain of your data. If it answers, your grounding is broken.
  2. Ask a question where the answer is "I don't know." Does the system admit uncertainty, or does it fabricate a plausible response?
  3. Check the citations. When the system gives an answer, can it point to the exact source? Can you verify it?
  4. Test the boundaries. Ask questions that are adjacent to your data but not directly answerable from it. These edge cases are where hallucinations hide.
  5. Try to mislead it. Tell it something false and see if it agrees. A well-grounded system should push back based on its actual data.
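The first two checks lend themselves to automation. A minimal sketch, assuming the assistant is exposed as a simple question-to-answer callable; the probe list and refusal markers are illustrative, and in practice you would want a much larger probe set and a more robust refusal detector.

```python
# Out-of-domain probes: none of these should get a substantive answer
# from a grounded system sitting on business data.
OUT_OF_DOMAIN_PROBES = [
    "How do I cook a steak?",
    "What is the capital of Australia?",
    "Write me a poem about autumn.",
]

# Phrases that suggest the system declined -- illustrative, not exhaustive.
REFUSAL_MARKERS = ("i don't know", "outside my scope", "can't answer", "no data")

def looks_like_refusal(answer: str) -> bool:
    a = answer.lower()
    return any(marker in a for marker in REFUSAL_MARKERS)

def run_steak_test(assistant) -> list[str]:
    """Return the probes the assistant answered instead of refusing."""
    return [q for q in OUT_OF_DOMAIN_PROBES if not looks_like_refusal(assistant(q))]
```

An empty result means the system refused every probe; anything else is a grounding failure worth investigating before real users see it.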

Why This Matters for Boards and CEOs

AI hallucination is not a technical curiosity. In a regulated environment, it is a compliance risk. In a financial context, it is a decision-making risk. In a customer-facing context, it is a reputational risk.

The vendors will show you the demo where everything works. Your job is to find the gaps. The steak test is the simplest, fastest way I have found to determine whether a system is genuinely grounded or just convincingly fluent.

Temperature controls randomness. Grounding controls honesty. If you are putting AI on top of business data, both need serious attention before it goes near real users.

Read next: The AI Demo Looked Great. Then We Tried to Put It in Production.