
Testing AI with AI: Navigating the Challenges of QA

Robert Salesas
Leapwork

AI sure grew fast in popularity, but are AI apps any good?

Well, there are some snags. Our recent research showed that 85% of companies have integrated AI apps into their tech stack in the last year. That's a pretty impressive number, but we also learned that many of those companies are running headfirst into problems: 68% have already experienced significant issues with the performance, accuracy, and reliability of those AI apps.

If companies are going to keep integrating AI applications into their tech stack at this pace, they need to be aware of AI's limitations. More importantly, they need to evolve their testing regimen.

The Wild Wild West of AI Applications

That AI apps are buggy isn't necessarily an indictment of AI as a concept. It simply draws attention to the reality that AI apps are being managed within complex, interconnected systems. Many of these AI apps are integrated into sprawling tech stack ecosystems, and most AI tools in their current form don't exactly work perfectly out of the box. AI applications require continuous evaluation, validation, and fine-tuning to meet expectations.

Without that validation process, bugs and security vulnerabilities can undermine the effectiveness of AI apps (security risks were among the most commonly flagged issues for AI applications). Ultimately, that leaves the integrating company exposed to system failures, decreased customer satisfaction, and reputational damage. And given how reliant the world is likely to become on AI, that's something every business should aim to avoid.

Fixing AI … with AI?

Ironically, the answer many companies seem to have settled on for fixing their testing inefficiencies is AI-augmented testing. We found that 79% of companies have already adopted AI-augmented testing tools, and 64% of C-suite executives trust their results (technical teams trust them even more, at 72%).

Is that not a bit paradoxical? Why fix AI with more AI?

In the right context, AI-augmented testing tools can be that second set of eyes (long live the four-eyes principle), vetting the shortcomings of AI systems with rigorous, unbiased reviews of performance. The point of AI-augmented testing is to gauge how well generative AI handles specific tasks or responds to user-defined prompts. These tools can compare AI-generated answers against predefined, human-crafted expectations, which matters when AI models so often hallucinate nonsensical information.

You can imagine the many linguistic permutations of asking an AI chatbot, "Do you offer international shipping?" A response needs to be factually correct regardless of how the question is asked, and that's where AI-augmented testing tools shine: automating validation across all of those variations.
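To make that comparison concrete, here is a minimal Python sketch of a paraphrase test suite: the same question is asked several ways, and every answer is checked against one human-crafted expectation. Everything here is an assumption for illustration. `Expectation`, `run_paraphrase_suite`, and `fake_chatbot` are hypothetical names, the stub chatbot stands in for the real app under test, and simple substring matching stands in for the semantic or model-based comparison an AI-augmented testing tool would actually perform.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Expectation:
    """Human-crafted ground truth for one intent (hypothetical structure)."""
    must_mention: list[str]
    must_not_mention: list[str] = field(default_factory=list)


@dataclass
class Result:
    prompt: str
    answer: str
    passed: bool
    problems: list[str]


def check_answer(answer: str, exp: Expectation) -> list[str]:
    """Return the problems found in one answer; an empty list means it passed.

    Case-insensitive substring matching is a placeholder for the semantic or
    model-based comparison a real AI-augmented testing tool would use.
    """
    text = answer.lower()
    problems = [f"missing: {fact}" for fact in exp.must_mention if fact.lower() not in text]
    problems += [f"contradicts: {claim}" for claim in exp.must_not_mention if claim.lower() in text]
    return problems


def run_paraphrase_suite(
    chatbot: Callable[[str], str], prompts: list[str], exp: Expectation
) -> list[Result]:
    """Ask the same question in several phrasings and validate every answer."""
    results = []
    for prompt in prompts:
        answer = chatbot(prompt)
        problems = check_answer(answer, exp)
        results.append(Result(prompt, answer, not problems, problems))
    return results


def fake_chatbot(prompt: str) -> str:
    """Hypothetical stub for the chatbot under test, with a deliberate inconsistency."""
    if "abroad" in prompt.lower():
        return "Sorry, we only ship domestically."
    return "Yes, we ship internationally."


PARAPHRASES = [
    "Do you offer international shipping?",
    "Can I get my order delivered abroad?",
    "do u ship outside the country",
]
SHIPPING = Expectation(
    must_mention=["ship internationally"],
    must_not_mention=["only ship domestically"],
)

if __name__ == "__main__":
    for r in run_paraphrase_suite(fake_chatbot, PARAPHRASES, SHIPPING):
        print("PASS" if r.passed else "FAIL", repr(r.prompt), r.problems)
```

With this stub, the first and third phrasings pass, while "Can I get my order delivered abroad?" fails both checks, which is exactly the kind of phrasing-dependent inconsistency such a suite is meant to surface. Keeping the expectation separate from the prompts means a human writes the ground truth once and the automation scales it across variations.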

Do We Need Human QA Testers?

There's just one outstanding question: What happens to the human QA testers if everyone starts using AI-augmented testing?

The short answer? They'll still be around, don't you worry. Over two-thirds (68%) of the C-suite executives we've spoken to said they believe human validation will remain essential for ensuring quality across complex systems. In fact, 53% of C-suite executives told us they've seen an increase in new positions requiring AI expertise. Fancy that ...

There's a good reason why humans won't disappear from QA teams: AI isn't perfect, and that extends to testing. Some testing tools offer self-healing scripts, where the AI adjusts a test to accommodate minor app changes, but they can't handle the complexity of most real-world applications without human supervision. We have AI agents, but they don't have agency. An autonomous testing agent won't suddenly decide on its own to test your delivery app to check whether your pizza orders are going through.
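To illustrate the self-healing idea and where human supervision fits in, here is a deliberately simple Python sketch. It is not how any particular product implements self-healing: a rule-based fallback stands in for the AI, and `Locator`, `find_element`, and the dictionary-based page model are hypothetical. When an element's id changes, the step heals only if exactly one element still matches the attributes recorded when the test was written, and it logs the change for a human to confirm. Otherwise it stops and escalates.

```python
from dataclasses import dataclass


@dataclass
class Locator:
    """How a test step finds an element (hypothetical structure)."""
    element_id: str
    text: str  # attributes recorded when the test was authored,
    role: str  # used as a fallback if the id changes


def find_element(page: list[dict], loc: Locator, review_log: list[str]) -> dict:
    """Find an element, healing the locator only when the fallback is unambiguous."""
    by_id = [el for el in page if el.get("id") == loc.element_id]
    if len(by_id) == 1:
        return by_id[0]

    # Primary locator failed: look for elements matching the recorded attributes.
    candidates = [
        el for el in page
        if el.get("text") == loc.text and el.get("role") == loc.role
    ]
    if len(candidates) == 1:
        healed = candidates[0]
        # Log the heal so a human tester can confirm it was the right element.
        review_log.append(f"healed {loc.element_id!r} -> {healed.get('id')!r}")
        return healed

    # Zero or several matches: refuse to guess and escalate to a human.
    raise LookupError(
        f"cannot resolve {loc.element_id!r}: {len(candidates)} fallback matches"
    )


if __name__ == "__main__":
    # After a release, the checkout button's id changed from "btn-checkout".
    page_v2 = [
        {"id": "btn-checkout-v2", "text": "Place order", "role": "button"},
        {"id": "nav-home", "text": "Home", "role": "link"},
    ]
    log: list[str] = []
    step = Locator("btn-checkout", "Place order", "button")
    print(find_element(page_v2, step, log)["id"])  # btn-checkout-v2
    print(log)  # ["healed 'btn-checkout' -> 'btn-checkout-v2'"]
```

Refusing to guess when several candidates match is the design choice that matters most here: a heal that silently clicks the wrong "Place order" button is worse than a failed test, and the review log keeps a human in the loop even for successful heals.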

All of which is to say that some degree of human validation will be needed for the foreseeable future to ensure accuracy and relevance. Humans need to be there to decide what to automate, what not to automate, and how to create good testing procedures. The future of QA isn't about replacing humans but evolving their roles. Human testers will increasingly focus on overseeing and fine-tuning AI tools, interpreting complex data, and bringing critical thinking to the testing process.

AI offers a huge amount of promise, but as adoption grows, that promise must be paired with a vigilant approach to quality assurance. By combining the efficiency of AI tools with human creativity and critical thinking, businesses can ensure higher-quality outcomes and maintain trust in their increasingly complex systems.

Robert Salesas is CTO of Leapwork
