Vibe Testing: The New AI Testing Paradigm

If you work in QA or software development, you've probably noticed: the way we write, maintain, and think about tests is changing fast. The term "vibe coding" — using AI to generate production code from natural language prompts — became mainstream in early 2025. But the testing side of that equation has been largely ignored.

That's the gap Vibe Testing fills. It's a framework I developed and documented in my book to formalize how QA engineers can leverage LLMs without losing control over test quality, coverage strategy, or engineering standards.

This article explains the core concepts, shows real implementation patterns with Playwright, and addresses the risks head-on.

Defining Vibe Testing

Vibe Testing is the disciplined practice of using AI assistants to generate, refine, and maintain automated tests — guided by human-defined quality criteria, risk models, and architectural constraints.

The key word is disciplined. This is not "ask ChatGPT to write your tests and commit them." That approach produces fragile, context-free tests that become maintenance debt within weeks.

Vibe Testing works when the engineer owns the strategy and the AI handles the scaffolding. Reverse that relationship, and you get noise instead of coverage.

In practice, this means the QA engineer defines what to test and why, while the AI accelerates the how: writing boilerplate, suggesting edge cases from API contracts, generating data-driven test variations, and proposing locator strategies.

The Three Technical Pillars

1. Context-Aware Test Generation

The quality of AI-generated tests is directly proportional to the quality of the context you provide. A prompt without architectural context produces generic tests. A prompt with your Page Object structure, naming conventions, and fixture patterns produces tests that look like your team wrote them.

Here's a concrete example. Consider a Playwright test for a telehealth appointment booking flow. The traditional approach:

// Traditional approach: manually authored end-to-end test
import { test, expect } from '@playwright/test';
import { AppointmentPage } from '../pages/AppointmentPage';
import { DashboardPage } from '../pages/DashboardPage';

test.describe('Appointment Booking', () => {
  test('patient completes booking and provider sees it', async ({ page }) => {
    const appointment = new AppointmentPage(page);
    const dashboard = new DashboardPage(page);

    await appointment.navigate();
    await appointment.fillPatientInfo({
      name: 'John Doe',
      state: 'California',
      insuranceId: 'BC-2024-11892'
    });
    await appointment.selectProvider('Dr. Sarah Chen');
    await appointment.selectTimeSlot('next-available');
    await appointment.submit();

    await expect(appointment.confirmationBanner).toBeVisible();
    await expect(appointment.confirmationId).not.toBeEmpty();

    // Verify provider-side visibility
    const confirmId = (await appointment.confirmationId.textContent()) ?? '';
    await dashboard.navigateAsProvider('dr-chen');
    await expect(dashboard.findAppointment(confirmId)).toBeVisible();
  });
});

With Vibe Testing, you provide context to the AI through a structured prompt:

## Context
- Framework: Playwright + TypeScript
- Pattern: Page Object Model (see /pages/*.ts for conventions)
- Fixtures: use test.extend for authenticated sessions
- Naming: describe blocks use feature name, tests use user action

## Generate tests for: Appointment Booking
- Happy path: patient books, provider confirms visibility
- Edge case: expired insurance triggers validation
- Edge case: no available slots shows waitlist option
- Boundary: booking at 11:59 PM crosses to next day
- Negative: duplicate booking within 24h is rejected

The AI generates five test scaffolds that follow your exact patterns. You review, adjust assertions, and commit. Time saved: ~60% on initial scaffolding. But more importantly, the edge cases the AI suggests — like the midnight boundary — are scenarios teams often miss under delivery pressure.
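
The midnight boundary deserves a closer look, because the bug it catches lives in pure date arithmetic rather than in the UI. As a minimal sketch (the helper name and slot model are illustrative, not from any real API), the logic such a generated test would assert looks like this:

```typescript
// Hypothetical helper behind the midnight-boundary scaffold: compute the
// calendar date a slot ends on. A 30-minute slot starting at 11:59 PM
// rolls over to the next day — exactly the rollover naive formatting drops.
function slotEndDate(startIso: string, durationMin: number): string {
  const end = new Date(new Date(startIso).getTime() + durationMin * 60_000);
  return end.toISOString().slice(0, 10); // YYYY-MM-DD
}

console.log(slotEndDate("2025-03-01T23:59:00Z", 30)); // → 2025-03-02
console.log(slotEndDate("2025-03-01T14:00:00Z", 30)); // → 2025-03-01
```

If the booking confirmation displays the start date while the provider dashboard indexes by end date, this is precisely where the two drift apart.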

2. Risk-Based Test Prioritization

Not all tests have equal value in every execution cycle. A git diff that touches the payment module should trigger payment-related tests first, not the entire regression suite.

AI models can analyze change sets against historical defect data to produce a risk-weighted execution order:

# risk-config.yml — AI-driven test prioritization
prioritization:
  strategy: risk-weighted
  inputs:
    - source: git-diff
      weight: 0.4
    - source: defect-history
      weight: 0.3
      lookback: 90d
    - source: code-complexity
      weight: 0.2
      metric: cyclomatic
    - source: last-failure
      weight: 0.1

  thresholds:
    critical:  0.8   # always run
    high:      0.6   # run on PR
    medium:    0.4   # run nightly
    low:       0.2   # run weekly

The practical result: regression feedback in 12 minutes instead of 45, because the suite runs the highest-risk tests first. If those pass, confidence is high enough to merge. The full suite still runs nightly for complete coverage.
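
The scoring behind that config is a plain weighted sum. A minimal sketch, assuming each input signal has already been normalized to the range [0, 1] (the signal names and tiers mirror the YAML above; everything else is illustrative):

```typescript
// Weighted risk score per test, mirroring the weights in risk-config.yml.
// Each signal is assumed pre-normalized to [0, 1].
type Signals = {
  gitDiff: number;       // overlap between the change set and the test's code paths
  defectHistory: number; // defect density in the touched area, last 90 days
  complexity: number;    // cyclomatic complexity, scaled
  lastFailure: number;   // recency of the test's last failure
};

const WEIGHTS = { gitDiff: 0.4, defectHistory: 0.3, complexity: 0.2, lastFailure: 0.1 };

function riskScore(s: Signals): number {
  return s.gitDiff * WEIGHTS.gitDiff
    + s.defectHistory * WEIGHTS.defectHistory
    + s.complexity * WEIGHTS.complexity
    + s.lastFailure * WEIGHTS.lastFailure;
}

function tier(score: number): string {
  if (score >= 0.8) return "critical"; // always run
  if (score >= 0.6) return "high";     // run on PR
  if (score >= 0.4) return "medium";   // run nightly
  return "low";                        // run weekly
}

// A payment test directly touched by the diff, in a historically buggy area:
const score = riskScore({ gitDiff: 1, defectHistory: 0.8, complexity: 0.5, lastFailure: 0 });
console.log(score.toFixed(2), tier(score)); // → 0.74 high
```

The hard part in practice is not the sum but the normalization: how you map a git diff or a defect count onto [0, 1] determines whether the ordering is meaningful.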

3. Automated Test Maintenance

According to the ISTQB Test Automation Engineering syllabus, test maintenance accounts for 40-60% of total automation cost over a project's lifetime. Most of that cost comes from locator breakage when the UI changes.

AI-assisted maintenance works like this:

  1. Detection: CI pipeline fails, AI analyzes the error trace and the recent UI diff
  2. Classification: Is this a real bug or a locator change? AI compares the DOM before/after
  3. Proposal: If it's a locator change, AI generates a fix PR with updated selectors
  4. Validation: The engineer reviews the PR. One approval, zero manual debugging

This doesn't eliminate the engineer from the loop — it eliminates the tedious part of the loop.
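
The classification step (2) is the one worth automating first. A minimal sketch of the idea, assuming elements carry a stable test id that survives UI refactors (the data shapes here are simplified assumptions, not a real Playwright or CI API):

```typescript
// Classify a CI failure: locator break or possible real bug?
// Heuristic: if the element the old selector matched still exists in the
// new DOM under the same stable test id, only the selector changed.
type DomNode = { testId: string; selector: string };

type Verdict =
  | { kind: "locator-change"; fixedSelector: string }
  | { kind: "possible-bug" };

function classifyFailure(failedSelector: string, before: DomNode[], after: DomNode[]): Verdict {
  // Which node did the old selector use to match?
  const old = before.find(n => n.selector === failedSelector);
  if (!old) return { kind: "possible-bug" };

  // Same test id, new selector: propose the fix automatically.
  const moved = after.find(n => n.testId === old.testId);
  if (moved && moved.selector !== failedSelector) {
    return { kind: "locator-change", fixedSelector: moved.selector };
  }
  // Element vanished entirely: escalate to a human as a possible regression.
  return { kind: "possible-bug" };
}

const before = [{ testId: "submit-btn", selector: "#submit" }];
const after = [{ testId: "submit-btn", selector: "[data-testid='submit-btn']" }];
console.log(classifyFailure("#submit", before, after));
// → { kind: 'locator-change', fixedSelector: "[data-testid='submit-btn']" }
```

In the "locator-change" branch, the fixed selector is what lands in the AI-generated PR from step (3); the "possible-bug" branch is where the engineer's judgment remains irreplaceable.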

Implementation at Scale

In high-stakes environments — healthcare, fintech, regulated industries — a test failure in production can have serious consequences for end users. That constraint shapes how Vibe Testing should be adopted in practice:

  • Structured prompt library: Maintain a shared repository of prompt templates aligned with your Playwright architecture. Every engineer uses the same patterns, so AI output is consistent across the team.
  • Human review gate: No AI-generated test reaches main without code review. The AI proposes, the engineer validates business logic, boundary conditions, and assertion quality.
  • Feedback loop: Tests that catch real defects in production should be tagged as high-value. This data trains your risk model, improving prioritization accuracy over time.
  • Guardrails: Explicitly prohibit AI-generated tests for compliance-sensitive flows without senior engineer review. Regulatory-critical paths require manual authorship.
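
A structured prompt library can be as simple as a shared module that assembles prompts from one canonical context block. A minimal sketch, reusing the prompt format shown earlier (function and field names are illustrative):

```typescript
// Shared prompt-template module: one source of truth for the architectural
// context, so every engineer's AI output follows the same conventions.
interface PromptSpec {
  feature: string;
  scenarios: string[];
}

const ARCHITECTURE_CONTEXT = [
  "- Framework: Playwright + TypeScript",
  "- Pattern: Page Object Model (see /pages/*.ts for conventions)",
  "- Fixtures: use test.extend for authenticated sessions",
  "- Naming: describe blocks use feature name, tests use user action",
].join("\n");

function buildPrompt(spec: PromptSpec): string {
  return [
    "## Context",
    ARCHITECTURE_CONTEXT,
    "",
    `## Generate tests for: ${spec.feature}`,
    ...spec.scenarios.map(s => `- ${s}`),
  ].join("\n");
}

console.log(buildPrompt({
  feature: "Appointment Booking",
  scenarios: ["Happy path: patient books, provider confirms visibility"],
}));
```

Version this module like any other code: when the team changes a fixture pattern, the prompt library changes in the same PR, and AI output stays aligned with the architecture.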

Common Misconceptions

After presenting this framework at conferences and in my QA courses at UPC, these are the most frequent objections I hear — and the honest answers:

"AI will replace QA engineers."
No. AI replaces the mechanical part of test writing. The strategic part — deciding what risks to cover, what level of testing is appropriate, how to structure a quality gate — requires human judgment, domain knowledge, and organizational context that LLMs don't have.

"AI-generated tests are unreliable."
Unreliable without review, yes. That's why Vibe Testing mandates a human validation step. The AI output is a first draft, not a final artifact. Treat it like you'd treat a junior engineer's PR: review it thoroughly.

"This only works with simple CRUD apps."
I've applied it in environments with complex state machines, multi-step workflows, and regulatory constraints. The key is providing sufficient context in your prompts. Simple prompts produce simple tests.

Getting Started: A Practical Checklist

If you want to adopt Vibe Testing incrementally, start here:

  1. Document your test architecture — Page Object conventions, fixture patterns, naming standards. This becomes your AI context.
  2. Pick one stable feature — Choose a well-understood module with existing test coverage. Generate AI tests and compare them against your manual ones.
  3. Establish the review gate — Define explicit criteria: what makes an AI-generated test acceptable? Codify this in your PR template.
  4. Measure the delta — Track: time to write, defects caught, false positive rate, maintenance cost. Compare AI-assisted vs. manually authored tests over 2-3 sprints.
  5. Expand with guardrails — As confidence grows, extend to more modules. Keep compliance-critical and security-sensitive areas under manual authorship.
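
For step 4, the bookkeeping does not need a dashboard to start with. A minimal sketch of the comparison, with invented figures purely for illustration:

```typescript
// Step-4 bookkeeping: track the same metrics for both workflows, compare
// averages after a few sprints. All numbers below are invented examples.
type TestMetrics = { minutesToWrite: number; defectsCaught: number; falsePositives: number };

function average(runs: TestMetrics[], pick: (m: TestMetrics) => number): number {
  return runs.reduce((sum, m) => sum + pick(m), 0) / runs.length;
}

const aiAssisted: TestMetrics[] = [
  { minutesToWrite: 18, defectsCaught: 2, falsePositives: 1 },
  { minutesToWrite: 22, defectsCaught: 1, falsePositives: 0 },
];
const manual: TestMetrics[] = [
  { minutesToWrite: 45, defectsCaught: 2, falsePositives: 0 },
  { minutesToWrite: 55, defectsCaught: 3, falsePositives: 1 },
];

// Positive delta = minutes saved per test by the AI-assisted workflow.
const saved = average(manual, m => m.minutesToWrite) - average(aiAssisted, m => m.minutesToWrite);
console.log(`avg minutes saved per test: ${saved}`); // → 30
```

Whatever the numbers turn out to be on your team, the point is to make the comparison explicit instead of anecdotal before expanding in step 5.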

The complete framework — including ready-to-use prompt templates for Playwright, Cypress, and API testing — is documented in detail in my book Quality Assurance: De Fundamentos a Automatización con IA (Amazon, Kindle & Paperback).


Vibe Testing is not about replacing engineering discipline with AI convenience. It's about applying AI where it genuinely reduces toil — test scaffolding, locator maintenance, edge case discovery — while keeping strategic decisions where they belong: with the engineer. The teams that get this balance right will ship faster without shipping recklessly.

References

  • Tao, C., & Gao, J. (2024). AI-Assisted Testing: Emerging Practices and Challenges. IEEE Software, 41(2), 34-42.
  • ISTQB. (2024). ISTQB Certified Tester AI Testing (CT-AI) Syllabus. https://www.istqb.org/
  • Ministry of Testing. (2024). The Future of Software Testing with AI. https://www.ministryoftesting.com/
