The problem

IBM's AI Adoption Index found that 39% of AI-powered customer service projects had to be reworked or rolled back in 2024. The reason was simple: the AI was making up answers that weren't true.

The pattern goes wider than chatbots. Tests of top AI tools on translation and language tasks showed they invent false information somewhere between 10% and 18% of the time. Roughly one answer in seven from a top-tier AI tool is wrong in some way.

For a small business deciding whether to put AI in front of customers, those numbers matter. A 1-in-7 wrong answer rate is the difference between a useful tool and an embarrassing mess.

Four questions to ask before trusting an AI

Before letting AI loose on real customers, or relying on it anywhere in your business, four checks help:

  • Is it right? Does it give the correct answer to a straightforward question?
  • Is it consistent? Ask the same question ten times. Do you get the same answer each time?
  • Does it work in your situation? Specialist words, longer documents, or unusual file types: does the AI still cope?
  • Can you explain its answer? When something goes wrong, can you show why the AI said what it said?

Most off-the-shelf AI tools handle the first question well. Where they tend to fall down is the other three.
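The consistency check is easy to automate. Here is a minimal sketch in Python; `ask` is a hypothetical placeholder for whatever call reaches your AI tool, not a real API.

```python
from collections import Counter
from typing import Callable

def consistency_check(ask: Callable[[str], str], question: str, runs: int = 10) -> Counter:
    """Ask the same question `runs` times and tally the distinct answers.

    A dependable tool returns one answer `runs` times; a spread of
    different answers is a warning sign before any customer sees it.
    """
    # Normalise lightly so "Paris" and "paris " count as the same answer.
    return Counter(ask(question).strip().lower() for _ in range(runs))
```

In practice you would wire `ask` to your tool's API and eyeball the tally: anything other than one answer appearing ten times deserves a closer look.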

Asking more than one AI gives much better answers

Researchers have found a useful trick. Instead of trusting a single AI tool, you can ask 22 different ones the same question and go with the majority answer.

On the same quality scale, scores jumped from 93-94 for the best individual tool to 98.5 for the group, and the made-up-answer rate fell from 10-18% to under 2%.

What changed was the architecture around the AI. The tools stayed the same.
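The voting step itself is only a few lines. A minimal sketch, assuming you have already collected a list of answers from several independent tools; the agreement threshold is an illustrative choice, not the researchers' exact method.

```python
from collections import Counter
from typing import Optional

def majority_answer(answers: list[str], min_agreement: float = 0.5) -> Optional[str]:
    """Return the most common answer if enough of the tools agree.

    Returning None ("no confident answer") is usually safer than passing
    on a minority answer that may well be made up.
    """
    if not answers:
        return None
    # Normalise lightly, then take the single most common answer.
    top, votes = Counter(a.strip().lower() for a in answers).most_common(1)[0]
    return top if votes / len(answers) > min_agreement else None
```

The design choice worth noting is the None branch: a system that can say "I don't know" is exactly the error-catching architecture the paragraph above describes.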

For a small business wanting to use AI, the choice comes down to this: pay more for a single, more capable AI and accept that it will still get things wrong sometimes, or have engineers design a system around it that catches some of those mistakes.

Where bespoke software earns its keep

General AI tools like ChatGPT work fine for things like writing first drafts or summarising notes. Mistakes made on that kind of task are easy to catch and fix.

The story changes when you need:

  • Consistent behaviour
  • A clear record of why the AI did what it did (for compliance, complaints, or to learn what went wrong)
  • Automation that runs without human oversight

Bespoke software is fundamentally different from AI: given the same input, it does the same thing every time, 24/7. That consistency is why businesses have run for decades on the same software without issue.

Any company planning to embed unreliable AI deep within its business is in for a shock, unless there are mechanisms in place to catch those errors.

At Synthetic Bytes we use AI to do the heavy lifting, helping us plan, design and write bespoke software far faster than any human could alone. But the key design decisions, and the QA, are completed by an experienced engineer.

Sources