NeuralSkills
Code Review

Test Quality Review

Review test code for meaningful coverage, assertion quality, mocking strategy, and maintainable test names.

Level: Intermediate · Free · Published: April 15, 2026
Compatible Tools: claude-code, chatgpt, gemini, copilot, cursor, windsurf, universal

The Problem

High test coverage numbers create false confidence. Teams chase 80% coverage by writing tests that call functions without asserting meaningful outcomes. Mocks replace so much of the system that tests only verify the mock setup. Test names like “should work” tell you nothing when they fail. Flaky tests get retried instead of fixed. The test suite runs green while bugs ship to production because the tests verify nothing real.

The Prompt

Review the quality of the following test code. Act as a test engineering lead evaluating whether these tests actually catch bugs.

TEST FRAMEWORK: [e.g., Jest, Vitest, Pytest, Go testing]
CODE UNDER TEST: [brief description of what is being tested]

TEST CODE:
[paste your test files here]

SOURCE CODE (optional):
[paste the implementation being tested]

Evaluate across these test quality dimensions:

1. **Assertion Quality**
   - Do tests assert specific outcomes, not just "no error thrown"?
   - Are assertions testing behavior, not implementation details?
   - Would a subtle bug in the source code actually fail any of these tests?
   - Are negative cases tested (invalid input, error conditions)?

2. **Coverage Gaps**
   - What edge cases are missing (empty input, null, boundary values, large inputs)?
   - Are error paths tested (network failure, invalid data, permission denied)?
   - Is the happy path really the riskiest path, or do the untested edge cases carry more risk?

3. **Test Independence**
   - Can each test run in isolation (no dependency on test execution order)?
   - Do tests clean up after themselves (no shared mutable state)?
   - Are there hidden dependencies between tests via global state?

4. **Mock Strategy**
   - Are mocks necessary, or could integration tests provide more confidence?
   - Do mocks accurately reflect the real dependency behavior?
   - Are too many things mocked (testing the mock, not the code)?
   - Are mock return values realistic or oversimplified?

5. **Readability & Naming**
   - Do test names describe the scenario AND expected outcome?
   - Is the Arrange/Act/Assert (AAA) pattern followed consistently?
   - Can you understand what failed from the test name alone?
   - Are test helpers extracting shared setup without hiding important context?

6. **Maintainability**
   - Are tests brittle (break on refactoring without behavior change)?
   - Is test data hardcoded or generated with factories/fixtures?
   - Are snapshot tests used appropriately (not for large objects)?

For each issue, provide:
- **Test**: Which test case
- **Problem**: Why this test provides false confidence
- **Severity**: false-positive (passes but should fail) / missing-coverage / fragile / style
- **Fix**: Improved test code

Example Output

## Test Quality Review: 4 issues found

### False Positive: Test Asserts Nothing Meaningful
Test: "should create a user"
Code:
  test('should create a user', async () => {
    const result = await createUser({ name: 'Test' });
    expect(result).toBeTruthy();  // Passes even if result is { error: true }
  });
Fix:
  test('creates user with correct name and generated ID', async () => {
    const result = await createUser({ name: 'Test' });
    expect(result.id).toMatch(/^usr_/);
    expect(result.name).toBe('Test');
    expect(result.createdAt).toBeInstanceOf(Date);
  });

### Missing Coverage: No Error Path Tests
Source: createUser() throws on duplicate email — no test verifies this.
Fix:
  test('throws DuplicateError when email already exists', async () => {
    await createUser({ email: 'a@b.com' });
    await expect(createUser({ email: 'a@b.com' }))
      .rejects.toThrow(DuplicateEmailError);
  });

### Fragile: Test Depends on Implementation Details
Test: Asserts mock was called with exact internal method sequence.
Fix: Assert on output behavior, not internal call order.

When to Use

Run this when reviewing test PRs, evaluating inherited test suites, or when test coverage is high but bugs still ship. Essential when preparing for a major refactor — fragile tests that break on non-behavioral changes will make refactoring miserable.

Pro Tips

  • Include the source code — test quality can only be evaluated against the implementation. Provide both for the most useful review.
  • Ask the mutation question — request “If I introduced a subtle off-by-one bug in line 15 of the source, which test would catch it?” to expose coverage gaps.
  • Review test-to-code ratio — if the test file is 3x longer than the source, the tests are likely over-specifying implementation details.