The Steak Test: A Simple Way to Check If Your AI Is Making Things Up
I build AI systems for regulated businesses where making things up is not an option. Financial services, legal, compliance. The kind of environments where a confident but fabricated answer can trigger regulatory action, mislead a board, or lose a client.
One of the first things I do when I build these systems is what I call the steak test.
The Steak Test
The concept is simple. You have an AI assistant sitting on top of your business data. It can answer questions about accounts, pull up reports, summarise documents. It looks impressive in a demo. But before you let it anywhere near real users, ask it something completely outside its domain.
Ask it how to cook a steak.
If your financial reporting tool gives you a recipe, you have a problem. Not a minor one. A fundamental one. It means the system is not grounded in your data. It is willing to answer anything, confidently, regardless of whether it has any basis for doing so.
This Is Not Hypothetical
I was talking to a COO recently who was rolling out an AI assistant on top of his business data. It pulled up the right numbers and answered questions about accounts. It looked great in the demo. Then he asked it something the system had no data for. It answered just as confidently: detailed, convincing, and completely made up.
That is the hallucination problem, and it is far more common than vendors will admit.
I tested this myself with Microsoft Copilot inside Excel. I had a sales forecast spreadsheet open. I asked Copilot how to cook a steak. Not only did it answer with detailed instructions, it offered follow-up suggestions about ribeye seasoning and side dishes. From inside a spreadsheet.
This is what an ungrounded AI system looks like in practice. It does not say "I don't know" or "that question is outside my scope." It gives you a confident, well-formatted answer to a question it has absolutely no business answering.
Temperature vs Grounding
There are two distinct problems here that are often confused:
- Temperature controls randomness. It determines how much variation the model introduces into its responses. High temperature means more creative, varied outputs. Low temperature means more consistent, deterministic outputs. I covered this in detail in my previous post about production AI.
- Grounding controls honesty. It determines whether the model restricts its answers to information it actually has access to, or whether it is willing to generate plausible-sounding responses about anything.
You can have perfect temperature settings and still have a grounding problem. A system that consistently, deterministically gives you a steak recipe from inside a financial tool is not better than one that gives you a different recipe each time. Both are wrong. The consistency just makes it harder to spot.
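To make the distinction concrete, here is a minimal sketch using the OpenAI Python SDK. The model name, prompts, and context are illustrative assumptions rather than a recommendation for any particular vendor; the point is that the temperature parameter and the grounding instructions live in entirely different places.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Temperature 0 makes the output repeatable. It does nothing to stop the
# model answering questions it has no data for.
ungrounded = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model choice
    temperature=0,
    messages=[
        {"role": "system", "content": "You are a financial reporting assistant."},
        {"role": "user", "content": "How do I cook a steak?"},
    ],
)
print(ungrounded.choices[0].message.content)  # typically a confident recipe

# Grounding is a separate instruction: restrict answers to the supplied
# context and refuse anything outside it.
grounded = client.chat.completions.create(
    model="gpt-4o-mini",
    temperature=0,
    messages=[
        {
            "role": "system",
            "content": (
                "Answer only from the context supplied in the user message. "
                "If the question cannot be answered from that context, reply: "
                "'That question is outside the scope of this system.'"
            ),
        },
        {
            "role": "user",
            "content": "Context: Q3 revenue was 4.2m GBP.\n\nQuestion: How do I cook a steak?",
        },
    ],
)
print(grounded.choices[0].message.content)  # should refuse
```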
What Grounding Looks Like in Practice
A properly grounded AI system should do several things:
- Restrict its knowledge scope: It should only answer questions that can be answered from the data and documents it has been given access to.
- Refuse gracefully: When asked something outside its scope, it should say so clearly. Not apologetically, not with a workaround. Just a clear statement that the question is outside its domain.
- Cite its sources: When it does answer, it should point to the specific documents, records, or data that support the response.
- Flag uncertainty: If the available data is ambiguous or incomplete, the system should say so rather than filling in the gaps with generated content.
This is not easy to implement. It requires careful system prompting, retrieval architecture, and testing. But it is non-negotiable if you are deploying AI in a regulated environment.
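As a rough illustration of those four behaviours, here is a small retrieval-grounded sketch. The document store, the keyword scoring, and the prompt wording are all simplified assumptions; a real deployment would use a proper retrieval index, tested prompts, and far stricter evaluation.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# A toy document store standing in for your real business data.
DOCUMENTS = {
    "board-pack-2024-q3.pdf": "Q3 revenue was 4.2m GBP, up 8% on Q2.",
    "aged-debtors-report.xlsx": "Three invoices are over 90 days old, totalling 118k GBP.",
}

def retrieve(question: str, top_k: int = 2) -> dict[str, str]:
    """Toy keyword scoring. A real system would use a vector or BM25 index."""
    words = [w.strip("?!.,").lower() for w in question.split() if len(w.strip("?!.,")) > 3]
    scored = {
        name: sum(w in text.lower() for w in words)
        for name, text in DOCUMENTS.items()
    }
    hits = sorted(scored, key=scored.get, reverse=True)[:top_k]
    return {name: DOCUMENTS[name] for name in hits if scored[name] > 0}

def answer(question: str) -> str:
    sources = retrieve(question)
    if not sources:
        # Refuse before the model is even called: there is nothing to ground on.
        return "That question is outside the scope of this system."
    context = "\n".join(f"[{name}] {text}" for name, text in sources.items())
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        temperature=0,
        messages=[
            {
                "role": "system",
                "content": (
                    "Answer only from the context provided. Cite the source in "
                    "square brackets after each claim. If the context is ambiguous "
                    "or incomplete, say so explicitly. If the question cannot be "
                    "answered from the context, say it is outside the scope of this system."
                ),
            },
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return response.choices[0].message.content

print(answer("What was Q3 revenue?"))    # answered, with a citation
print(answer("How do I cook a steak?"))  # refused: no relevant sources
```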
The Questions Worth Asking
If you are evaluating an AI product or building one internally, here is what to test:
- Run the steak test. Ask it something completely outside the domain of your data. If it answers, your grounding is broken. A small harness for automating this is sketched after the list.
- Ask a question where the answer is "I don't know." Does the system admit uncertainty, or does it fabricate a plausible response?
- Check the citations. When the system gives an answer, can it point to the exact source? Can you verify it?
- Test the boundaries. Ask questions that are adjacent to your data but not directly answerable from it. These edge cases are where hallucinations hide.
- Try to mislead it. Tell it something false and see if it agrees. A well-grounded system should push back based on its actual data.
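If you want the steak test to be part of your release process rather than a one-off demo question, a harness along these lines will do it. The probe questions, refusal phrases, and the stand-in assistant are illustrative assumptions; wire in however you actually call your own system.

```python
# Out-of-domain probes: none of these should get a substantive answer from
# a system that is supposed to sit on top of business data.
OUT_OF_DOMAIN_PROBES = [
    "How do I cook a ribeye steak?",
    "What is the capital of Peru?",
    "Write me a short poem about autumn.",
]

# Phrases that count as a refusal. Adjust these to match your system's wording.
REFUSAL_MARKERS = [
    "outside the scope",
    "outside its scope",
    "cannot answer",
    "do not have information",
    "don't have information",
]

def looks_like_refusal(reply: str) -> bool:
    reply = reply.lower()
    return any(marker in reply for marker in REFUSAL_MARKERS)

def run_steak_test(ask_assistant) -> bool:
    """Returns True only if every out-of-domain probe is refused."""
    passed = True
    for probe in OUT_OF_DOMAIN_PROBES:
        reply = ask_assistant(probe)
        if not looks_like_refusal(reply):
            print(f"FAIL: answered an out-of-domain question: {probe!r}")
            passed = False
    return passed

if __name__ == "__main__":
    # Wire in however you call your own assistant. The stub below always
    # refuses, so the harness itself can be run end to end.
    def stub_assistant(question: str) -> str:
        return "That question is outside the scope of this system."

    print("Steak test passed" if run_steak_test(stub_assistant) else "Steak test FAILED")
```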
Why This Matters for Boards and CEOs
AI hallucination is not a technical curiosity. In a regulated environment, it is a compliance risk. In a financial context, it is a decision-making risk. In a customer-facing context, it is a reputational risk.
The vendors will show you the demo where everything works. Your job is to find the gaps. The steak test is the simplest, fastest way I have found to determine whether a system is genuinely grounded or just convincingly fluent.
Temperature controls randomness. Grounding controls honesty. If you are putting AI on top of business data, both need serious attention before it goes near real users.