
Testing AI with AI: Navigating the Challenges of QA

Robert Salesas
Leapwork

AI sure grew fast in popularity, but are AI apps any good?

Well, there are some snags. We ran some research recently that showed 85% of companies have integrated AI apps into their tech stack in the last year. Pretty impressive number, but we also learned that many of those companies are running head-first into some issues: 68% have already experienced some significant problems related to the performance, accuracy, and reliability of those AI apps.

If companies are going to keep integrating AI applications into their tech stack at the rate they are, then they need to be aware of AI's limitations. More importantly, they need to evolve their testing regimen.

The Wild Wild West of AI Applications

That AI apps are buggy isn't necessarily a condemnation of AI as a concept. It simply draws attention to the reality that AI apps are being managed within complex, interconnected systems. Many of these AI apps are integrated into sprawling tech stacks, and most AI tools in their current form don't work perfectly out of the box. AI applications require continuous evaluation, validation, and fine-tuning to deliver on expectations.

Without that validation process, you risk stifling the effectiveness of AI apps with bugs and security vulnerabilities (security risks were one of the most commonly flagged issues for AI applications). Ultimately, that means the company doing the integration just becomes exposed to system failures, decreased customer satisfaction, and reputational damage. And considering how reliant the world will likely soon be on AI, that's something every business should aim to avoid.

Fixing AI … with AI?

Ironically, the answer many companies seem to have settled on for fixing their testing inefficiencies is AI-augmented testing. We found that 79% of companies have already adopted AI-augmented testing tools, and 64% of C-Suites trust their results (technical teams trust them even more, at 72%).

Is that not a bit paradoxical? Why fix AI with more AI?

In the right context, AI-augmented testing tools can be that second set of eyes (long live the four-eyes principle) to vet the shortcomings of AI systems with rigorous, unbiased reviews of performance. The reason you would use AI-augmented testing is to gauge how well generative AI deals with specific tasks or responds to user-defined prompts. These tools can compare AI-generated answers against predefined, human-crafted expectations. That matters when AI models so often hallucinate nonsensical information.

You can imagine the many linguistic permutations for asking an AI chatbot, "Do you offer international shipping?" A response needs to be factually right regardless of how the question was asked, and that's where AI-augmented testing tools shine: they automate validation across all of those variations.
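The paraphrase problem can be pictured with a small test harness. What follows is only a minimal sketch under stated assumptions: `fake_chatbot`, the paraphrase list, and the keyword-based `check_answer` rule are all hypothetical stand-ins, and a real AI-augmented testing tool would likely compare meaning rather than match phrases.

```python
from typing import Callable, Iterable, List

# Hypothetical paraphrases of a single customer question.
PARAPHRASES = [
    "Do you offer international shipping?",
    "Can you ship to addresses outside the country?",
    "Is delivery abroad available?",
]

# Human-crafted expectation (an assumption for this sketch): a correct answer
# mentions at least one required phrase and none of the forbidden ones.
REQUIRED_ANY = ("ship internationally", "international shipping")
FORBIDDEN = ("do not ship", "don't ship")


def fake_chatbot(question: str) -> str:
    """Stand-in for the system under test; a real test would call the chatbot."""
    return "Yes, we offer international shipping."


def check_answer(answer: str) -> bool:
    """Compare one answer against the human-crafted expectation."""
    text = answer.lower()
    has_required = any(phrase in text for phrase in REQUIRED_ANY)
    has_forbidden = any(phrase in text for phrase in FORBIDDEN)
    return has_required and not has_forbidden


def failing_paraphrases(ask: Callable[[str], str], questions: Iterable[str]) -> List[str]:
    """Return every phrasing whose answer does not meet the expectation."""
    return [q for q in questions if not check_answer(ask(q))]


if __name__ == "__main__":
    print(failing_paraphrases(fake_chatbot, PARAPHRASES))
```

The point of the design is that the expectation is written once by a person, while the list of phrasings can grow as large as needed without changing what counts as a correct answer.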

Do We Need Human QA Testers?

There's just one outstanding question: What happens to the human QA testers if everyone starts using AI-augmented testing?

The short answer to this question? They'll still be around, don't you worry: over two-thirds (68%) of the C-Suite executives we've spoken to said they believe human validation will remain essential for ensuring quality across complex systems. In fact, 53% of C-Suite executives told us they saw an increase in new positions requiring AI expertise. Fancy that ...

There's a good reason why humans won't disappear from QA teams. AI isn't perfect, and that extends to testing. Some testing tools can do things like self-healing scripts where the AI adjusts a test in line with minor app changes, but they can't handle the complexity of most real-world applications without any human supervision. We have AI agents, but they don't have agency. Autonomous testing agents can't just suddenly decide independently to test your delivery app to check whether your pizza orders are going through.
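One simple way to picture a self-healing script is a locator that falls back to other attributes when an element's identifier changes. The sketch below is an illustration only: the `Page` structure, `find_element`, and the element names are hypothetical, and actual testing tools presumably rely on much richer signals than an exact attribute match.

```python
from typing import Dict, Optional, Tuple

# A toy "page": element id -> attributes. Hypothetical stand-in for a real UI.
Page = Dict[str, Dict[str, str]]


def find_element(
    page: Page, element_id: str, fallback: Dict[str, str]
) -> Tuple[Optional[str], bool]:
    """Return (matched element id, whether the locator had to "heal").

    First try the id recorded in the test. If it is gone, look for an
    element whose attributes match the fallback description.
    """
    if element_id in page:
        return element_id, False
    for candidate_id, attrs in page.items():
        if all(attrs.get(key) == value for key, value in fallback.items()):
            return candidate_id, True
    return None, False


if __name__ == "__main__":
    # The app renamed the button id between releases (a minor app change).
    new_page = {"order-submit": {"text": "Place order", "role": "button"}}
    fallback = {"text": "Place order", "role": "button"}
    print(find_element(new_page, "submit-btn", fallback))
```

Returning the `healed` flag alongside the match is deliberate: a healed locator is a guess, so it is the kind of result a human tester would want to review before trusting it.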

All of which is to say that some degree of human validation will be needed for the foreseeable future to ensure accuracy and relevance. Humans need to be there to decide what to automate, what not to automate, and how to create good testing procedures. The future of QA isn't about replacing humans but evolving their roles. Human testers will increasingly focus on overseeing and fine-tuning AI tools, interpreting complex data, and bringing critical thinking to the testing process.

AI holds huge promise, but rapid adoption must be paired with a vigilant approach to quality assurance. By combining the efficiency of AI tools with human creativity and critical thinking, businesses can ensure higher-quality outcomes and maintain trust in their increasingly complex systems.

Robert Salesas is CTO of Leapwork

