If you work in QA or software development, you've probably noticed: the way we write, maintain, and think about tests is changing fast. The term "vibe coding" — using AI to generate production code from natural language prompts — became mainstream in early 2025. But the testing side of that equation has been largely ignored.
That's the gap Vibe Testing fills. It's a framework I developed and documented in my book to formalize how QA engineers can leverage LLMs without losing control over test quality, coverage strategy, or engineering standards.
This article explains the core concepts, shows real implementation patterns with Playwright, and addresses the risks head-on.
Defining Vibe Testing
Vibe Testing is the disciplined practice of using AI assistants to generate, refine, and maintain automated tests — guided by human-defined quality criteria, risk models, and architectural constraints.
The key word is disciplined. This is not "ask ChatGPT to write your tests and commit them." That approach produces fragile, context-free tests that become maintenance debt within weeks.
Vibe Testing works when the engineer owns the strategy and the AI handles the scaffolding. Reverse that relationship, and you get noise instead of coverage.
In practice, this means the QA engineer defines what to test and why, while the AI accelerates the how: writing boilerplate, suggesting edge cases from API contracts, generating data-driven test variations, and proposing locator strategies.
The Three Technical Pillars
1. Context-Aware Test Generation
The quality of AI-generated tests is directly proportional to the quality of the context you provide. A prompt without architectural context produces generic tests. A prompt with your Page Object structure, naming conventions, and fixture patterns produces tests that look like your team wrote them.
Here's a concrete example. Consider a Playwright test for a telehealth appointment booking flow. The traditional approach:
// Traditional approach: manually authored end-to-end test
import { test, expect } from '@playwright/test';
import { AppointmentPage } from '../pages/AppointmentPage';
import { DashboardPage } from '../pages/DashboardPage';

test.describe('Appointment Booking', () => {
  test('patient completes booking and provider sees it', async ({ page }) => {
    const appointment = new AppointmentPage(page);
    const dashboard = new DashboardPage(page);

    await appointment.navigate();
    await appointment.fillPatientInfo({
      name: 'John Doe',
      state: 'California',
      insuranceId: 'BC-2024-11892'
    });
    await appointment.selectProvider('Dr. Sarah Chen');
    await appointment.selectTimeSlot('next-available');
    await appointment.submit();

    await expect(appointment.confirmationBanner).toBeVisible();
    await expect(appointment.confirmationId).not.toBeEmpty();

    // Verify provider-side visibility
    const confirmId = await appointment.confirmationId.textContent();
    await dashboard.navigateAsProvider('dr-chen');
    await expect(dashboard.findAppointment(confirmId)).toBeVisible();
  });
});
With Vibe Testing, you provide context to the AI through a structured prompt:
## Context
- Framework: Playwright + TypeScript
- Pattern: Page Object Model (see /pages/*.ts for conventions)
- Fixtures: use test.extend for authenticated sessions
- Naming: describe blocks use feature name, tests use user action
## Generate tests for: Appointment Booking
- Happy path: patient books, provider confirms visibility
- Edge case: expired insurance triggers validation
- Edge case: no available slots shows waitlist option
- Boundary: booking at 11:59 PM crosses to next day
- Negative: duplicate booking within 24h is rejected
The AI generates five test scaffolds that follow your exact patterns. You review, adjust assertions, and commit. Time saved: ~60% on initial scaffolding. But more importantly, the edge cases the AI suggests — like the midnight boundary — are scenarios teams often miss under delivery pressure.
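To see why the midnight boundary is easy to miss, consider the date arithmetic such a test has to assert. A minimal TypeScript sketch of a helper a generated boundary test might exercise — the function name and signature are illustrative, not part of any real codebase:

```typescript
// Hypothetical helper: does an appointment slot cross into the next day?
// A booking at 11:59 PM with a 30-minute duration ends on a different
// calendar date than it starts — the edge case the AI flagged above.
function crossesMidnight(startIso: string, durationMin: number): boolean {
  const start = new Date(startIso);
  const end = new Date(start.getTime() + durationMin * 60_000);
  return end.getUTCDate() !== start.getUTCDate();
}

console.log(crossesMidnight("2025-03-01T23:59:00Z", 30)); // true
console.log(crossesMidnight("2025-03-01T10:00:00Z", 30)); // false
```

A human test author working under deadline pressure rarely thinks to parameterize on this boundary; an AI enumerating edge cases from the time-slot contract surfaces it mechanically.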
2. Risk-Based Test Prioritization
Not all tests have equal value in every execution cycle. A git diff that touches the payment module should trigger payment-related tests first, not the entire regression suite.
AI models can analyze change sets against historical defect data to produce a risk-weighted execution order:
# risk-config.yml — AI-driven test prioritization
prioritization:
  strategy: risk-weighted
  inputs:
    - source: git-diff
      weight: 0.4
    - source: defect-history
      weight: 0.3
      lookback: 90d
    - source: code-complexity
      weight: 0.2
      metric: cyclomatic
    - source: last-failure
      weight: 0.1
  thresholds:
    critical: 0.8   # always run
    high: 0.6       # run on PR
    medium: 0.4     # run nightly
    low: 0.2        # run weekly
The practical result: regression feedback in 12 minutes instead of 45, because the suite runs the highest-risk tests first. If those pass, confidence is high enough to merge. The full suite still runs nightly for complete coverage.
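The config is declarative; the scoring it implies fits in a few lines. A hedged TypeScript sketch of the weighted-sum logic — the signal names and normalization are my assumptions, not the API of any real prioritization tool:

```typescript
// Each signal is normalized to [0, 1] before weighting, mirroring
// the weights and thresholds in risk-config.yml.
interface RiskSignals {
  gitDiffOverlap: number;  // fraction of changed files this test touches
  defectHistory: number;   // normalized defect density over the lookback window
  codeComplexity: number;  // normalized cyclomatic complexity of covered code
  lastFailure: number;     // 1 if the test failed recently, else 0
}

function riskScore(s: RiskSignals): number {
  return 0.4 * s.gitDiffOverlap
       + 0.3 * s.defectHistory
       + 0.2 * s.codeComplexity
       + 0.1 * s.lastFailure;
}

function tier(score: number): string {
  if (score >= 0.8) return "critical"; // always run
  if (score >= 0.6) return "high";     // run on PR
  if (score >= 0.4) return "medium";   // run nightly
  return "low";                        // run weekly
}

// A test covering the payment module touched by the current diff:
const paymentTest: RiskSignals = {
  gitDiffOverlap: 1, defectHistory: 0.7, codeComplexity: 0.6, lastFailure: 1
};
console.log(tier(riskScore(paymentTest))); // "critical"
```

The point of the sketch is the shape, not the numbers: the weights are a starting hypothesis that the feedback loop (below, under implementation) should recalibrate against real defect data.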
3. Automated Test Maintenance
According to the ISTQB Test Automation Engineering syllabus, test maintenance accounts for 40-60% of total automation cost over a project's lifetime. Most of that cost comes from locator breakage when the UI changes.
AI-assisted maintenance works like this:
- Detection: CI pipeline fails, AI analyzes the error trace and the recent UI diff
- Classification: Is this a real bug or a locator change? AI compares the DOM before/after
- Proposal: If it's a locator change, AI generates a fix PR with updated selectors
- Validation: The engineer reviews the PR. One approval, zero manual debugging
This doesn't eliminate the engineer from the loop — it eliminates the tedious part of the loop.
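The classification step is the crux of that loop. A minimal sketch of the heuristic, assuming the AI has access to the error trace and a post-change DOM snapshot — all names here are illustrative:

```typescript
// Hedged sketch: distinguish locator breakage from a functional regression.
// A timeout on a selector that no longer appears in the new DOM is the
// classic signature of a UI refactor; anything else is escalated as a bug.
type Verdict = "locator-change" | "possible-bug";

function classifyFailure(
  errorMessage: string,
  newDomSnapshot: string,
  selector: string
): Verdict {
  const selectorTimedOut = /timeout|waiting for (locator|selector)/i.test(errorMessage);
  const selectorGone = !newDomSnapshot.includes(selector);
  return selectorTimedOut && selectorGone ? "locator-change" : "possible-bug";
}

console.log(classifyFailure(
  'Timeout 30000ms exceeded waiting for locator("#confirm-btn")',
  '<button data-testid="confirm-booking">Confirm</button>',
  "#confirm-btn"
)); // "locator-change"
```

In a real pipeline this heuristic would be one input among several (DOM diff, screenshot comparison, assertion type), but it illustrates why the AI can safely auto-propose selector fixes while routing everything ambiguous to a human.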
Implementation at Scale
In high-stakes environments — healthcare, fintech, regulated industries — a test failure in production can have serious consequences for end users. That constraint shapes how Vibe Testing should be adopted in practice:
- Structured prompt library: Maintain a shared repository of prompt templates aligned with your Playwright architecture. Every engineer uses the same patterns, so AI output is consistent across the team.
- Human review gate: No AI-generated test reaches main without code review. The AI proposes, the engineer validates business logic, boundary conditions, and assertion quality.
- Feedback loop: Tests that catch real defects in production should be tagged as high-value. This data trains your risk model, improving prioritization accuracy over time.
- Guardrails: Explicitly prohibit AI-generated tests for compliance-sensitive flows without senior engineer review. Regulatory-critical paths require manual authorship.
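The first bullet — the structured prompt library — is the easiest to start with, because a template is just string assembly over your documented conventions. A hedged TypeScript sketch (the field names and shape are my assumptions, not a published schema):

```typescript
// Illustrative prompt-template builder: every engineer assembles AI context
// from the same project conventions, so generated tests stay consistent.
interface PromptContext {
  framework: string;
  pattern: string;
  fixtures: string;
  naming: string;
}

function buildTestPrompt(
  ctx: PromptContext,
  feature: string,
  scenarios: string[]
): string {
  return [
    "## Context",
    `- Framework: ${ctx.framework}`,
    `- Pattern: ${ctx.pattern}`,
    `- Fixtures: ${ctx.fixtures}`,
    `- Naming: ${ctx.naming}`,
    `## Generate tests for: ${feature}`,
    ...scenarios.map(s => `- ${s}`),
  ].join("\n");
}

const prompt = buildTestPrompt(
  {
    framework: "Playwright + TypeScript",
    pattern: "Page Object Model (see /pages/*.ts for conventions)",
    fixtures: "use test.extend for authenticated sessions",
    naming: "describe blocks use feature name, tests use user action",
  },
  "Appointment Booking",
  ["Happy path: patient books, provider confirms visibility"]
);
console.log(prompt.startsWith("## Context")); // true
```

Versioning these templates in the repo next to the test code means prompt changes go through the same review gate as the tests they produce.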
Common Misconceptions
After presenting this framework at conferences and in my QA courses at UPC, these are the most frequent objections I hear — and the honest answers:
"AI will replace QA engineers."
No. AI replaces the mechanical part of test writing. The strategic part — deciding what risks to cover, what level of testing is appropriate, how to structure a quality gate — requires human judgment, domain knowledge, and organizational context that LLMs don't have.
"AI-generated tests are unreliable."
Unreliable without review, yes. That's why Vibe Testing mandates a human validation step. The AI output is a first draft, not a final artifact. Treat it like you'd treat a junior engineer's PR: review it thoroughly.
"This only works with simple CRUD apps."
I've applied it in environments with complex state machines, multi-step workflows, and regulatory constraints. The key is providing sufficient context in your prompts. Simple prompts produce simple tests.
Getting Started: A Practical Checklist
If you want to adopt Vibe Testing incrementally, start here:
- Document your test architecture — Page Object conventions, fixture patterns, naming standards. This becomes your AI context.
- Pick one stable feature — Choose a well-understood module with existing test coverage. Generate AI tests and compare them against your manual ones.
- Establish the review gate — Define explicit criteria: what makes an AI-generated test acceptable? Codify this in your PR template.
- Measure the delta — Track: time to write, defects caught, false positive rate, maintenance cost. Compare AI-assisted vs. manually-authored tests over 2-3 sprints.
- Expand with guardrails — As confidence grows, extend to more modules. Keep compliance-critical and security-sensitive areas under manual authorship.
The complete framework — including ready-to-use prompt templates for Playwright, Cypress, and API testing — is documented in detail in my book Quality Assurance: De Fundamentos a Automatización con IA (Amazon, Kindle & Paperback).
Vibe Testing is not about replacing engineering discipline with AI convenience. It's about applying AI where it genuinely reduces toil — test scaffolding, locator maintenance, edge case discovery — while keeping strategic decisions where they belong: with the engineer. The teams that get this balance right will ship faster without shipping recklessly.