# How to Write Tests That Don't Suck, and How to Teach AI to Do the Same


The Problem: You have 11 tests. They all pass. Your coverage is at 95%. You feel accomplished. Then a bug slips through in production—one your tests should have caught. Sound familiar?

I recently spent an afternoon teaching an AI assistant how to write better tests for our React application. What started as a simple task—“add tests for this component”—turned into a masterclass in test quality over quantity. The kicker? By the end, we had fewer tests (3 instead of 11), better coverage, and a reusable AI agent that writes high-quality tests consistently.

Here’s what I learned about writing good tests, and more importantly, how to encode that knowledge so AI assistants actually help instead of just inflating your test count.

## The “11 Tests Problem”

Let me show you what bad tests look like. The AI initially wrote this:

```jsx
it('renders correct icon for AIRPORT type', () => {
  render(<Component type="AIRPORT" />);
  expect(screen.getByTestId('icon-plane')).toBeInTheDocument();
});

it('renders correct icon for SEAPORT type', () => {
  render(<Component type="SEAPORT" />);
  expect(screen.getByTestId('icon-ship')).toBeInTheDocument();
});

it('renders correct icon for TRAIN_STATION type', () => {
  render(<Component type="TRAIN_STATION" />);
  expect(screen.getByTestId('icon-tramfront')).toBeInTheDocument();
});

// ... 8 more nearly identical tests
```

Eleven tests. All green. But here’s the problem: they’re testing the same behavior with different inputs. This is test pollution—it makes your test suite slower, harder to maintain, and doesn’t actually improve quality.

## The Five Principles of Good Tests

Through iterative feedback with the AI, we distilled test quality down to five core principles:

### 1. Group by Concern, Not by Input

Bad:

```js
it('renders plane icon for AIRPORT', () => { ... });
it('renders ship icon for SEAPORT', () => { ... });
it('renders train icon for TRAIN_STATION', () => { ... });
```

Good:

```jsx
it('renders correct icons for all location types', () => {
  const cases = [
    ['AIRPORT', 'icon-plane'],
    ['SEAPORT', 'icon-ship'],
    ['TRAIN_STATION', 'icon-tramfront'],
  ];

  cases.forEach(([type, expectedIcon]) => {
    const { unmount } = render(<Component type={type} />);
    expect(screen.getByTestId(expectedIcon)).toBeInTheDocument();
    unmount();
  });
});
```

This is data-driven testing. One test, multiple inputs. If the behavior is the same, the test should be the same.
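For the record, Vitest and Jest can also express such tables natively with `it.each`. And when the mapping logic is pure, the same table can drive a plain unit test with no rendering at all. A minimal sketch, assuming a hypothetical `iconForType` helper extracted from the component (all names here are illustrative, not from the real codebase):

```javascript
// Hypothetical pure mapping extracted from the component (illustrative names).
const TYPE_ICONS = {
  AIRPORT: 'icon-plane',
  SEAPORT: 'icon-ship',
  TRAIN_STATION: 'icon-tramfront',
};

function iconForType(type) {
  const icon = TYPE_ICONS[type];
  if (!icon) throw new Error(`Unknown location type: ${type}`);
  return icon;
}

// One data-driven loop instead of one test per type.
const cases = [
  ['AIRPORT', 'icon-plane'],
  ['SEAPORT', 'icon-ship'],
  ['TRAIN_STATION', 'icon-tramfront'],
];

for (const [type, expected] of cases) {
  console.assert(iconForType(type) === expected, `${type} should map to ${expected}`);
}
```

Adding a new location type is then a one-line change to the table, not a new test.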

### 2. Combine Related Assertions

The AI initially created three separate tests:

- “should render LocationBlock components with correct props”
- “should render separator between origin and destination”
- “should apply correct layout classes”

These are all testing the same concern: visual structure. They belong together:

```jsx
it('should render correct layout and styling', () => {
  const { container } = render(<Component />);

  // Check container
  const mainContainer = container.querySelector('.flex.flex-col');
  expect(mainContainer).toBeInTheDocument();

  // Check separator
  const separator = container.querySelector('hr');
  expect(separator).toHaveClass('text-gray-300');

  // Check location blocks
  const blocks = container.querySelectorAll('.location-block');
  expect(blocks).toHaveLength(2);
});
```

### 3. Apply Domain Knowledge

Here’s where it got interesting. The AI was dutifully testing every possible combination, including an edge case where both origin and destination were of type “ADDRESS”.

I told it: “We’ll never have ADDRESS-to-ADDRESS in our system.”

Immediately, the tests got simpler. No more special handling for impossible scenarios. No more conditional logic in tests.

The lesson: Don’t test mathematically possible combinations. Test business-realistic scenarios.
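That business rule can even live in the test helpers, so impossible scenarios are excluded by construction. A minimal sketch, assuming hypothetical names (`LOCATION_TYPES`, `realisticRoutePairs`):

```javascript
// Types and helper names are illustrative, not from the real codebase.
const LOCATION_TYPES = ['AIRPORT', 'SEAPORT', 'TRAIN_STATION', 'ADDRESS'];

// Encodes the business rule: ADDRESS-to-ADDRESS routes never occur.
function realisticRoutePairs(types) {
  const pairs = [];
  for (const origin of types) {
    for (const destination of types) {
      if (origin === 'ADDRESS' && destination === 'ADDRESS') continue;
      pairs.push([origin, destination]);
    }
  }
  return pairs;
}

const pairs = realisticRoutePairs(LOCATION_TYPES);
console.log(pairs.length); // 15 of the 16 mathematically possible combinations
```

Feeding this table into the data-driven pattern from principle 1 means nobody has to remember the rule each time they write a test.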

### 4. Test Behavior, Not Implementation

Bad:

```js
expect(element).toHaveClass('flex', 'items-center', 'justify-between', 'px-4');
```

This breaks when you refactor CSS. You’re testing Tailwind classes, not actual behavior.

Good:

```js
expect(screen.getByText('Delete')).toBeInTheDocument();
expect(button).toBeDisabled();
```

Test what users see and experience, not how you built it.

### 5. One Test = One Behavior

Each test should verify one logical behavior, but can have multiple assertions supporting that behavior.

```jsx
it('renders origin and destination addresses', () => {
  render(<Component origin={mockOrigin} destination={mockDestination} />);

  // Both assertions support the same behavior: "displays addresses"
  expect(screen.getByText('123 Main St')).toBeInTheDocument();
  expect(screen.getByText('456 Park Ave')).toBeInTheDocument();
});
```

Don’t confuse “one behavior” with “one assertion.” It’s about logical cohesion.

## The Real Magic: Teaching AI to Remember

Here’s the breakthrough: I didn’t just fix the tests. I created a specialized AI agent that encodes these principles.

In Claude Code (and similar AI coding tools), you can create specialized agents—markdown files that provide context-specific instructions. Here’s the structure I used:

```markdown
# Test Writer Agent

## Purpose
Write high-quality, maintainable tests following project conventions.

## Core Principles
1. Group by concern, not by input variation
2. Combine related assertions
3. Apply domain knowledge...

## Examples
[Real examples from the codebase]

## Anti-Patterns to Avoid
[Specific bad patterns with explanations]
```

Now when I (or anyone on the team) needs to write tests, we invoke the Test Writer agent. It knows:

- Our testing framework (Vitest + React Testing Library)
- Our global mocks (lucide-react icons are automatically mocked)
- Our data patterns (ADDRESS-to-ADDRESS never happens)
- Our quality standards (quality over quantity)

## The Results

**Before:**

- 11 tests for one component
- Repetitive code
- Testing impossible scenarios
- Hard to maintain

**After:**

- 3 tests for the same component
- Data-driven approach
- Only realistic scenarios
- Easy to understand and maintain

Same coverage. Better quality. Faster execution. More maintainable.

## How to Build Your Own Test Writer Agent

### Step 1: Audit Your Existing Tests

Look for patterns:

- Are you repeating the same test with different inputs?
- Do you have separate tests for related concerns?
- Are you testing implementation details?
- Do any tests check impossible scenarios?
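The first question can even be checked mechanically. A rough sketch (assuming you can collect test titles from your runner's output; `findRepetitiveTitles` is a hypothetical helper) that flags titles differing only in a type-like token:

```javascript
// Rough heuristic sketch: assumes test titles are collected from your
// runner's output; the helper name is hypothetical.
function findRepetitiveTitles(titles) {
  const groups = new Map();
  for (const title of titles) {
    // Collapse SCREAMING_CASE tokens like AIRPORT / TRAIN_STATION into a placeholder.
    const key = title.replace(/\b[A-Z][A-Z_]+\b/g, '<TYPE>');
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(title);
  }
  // Keep only groups where more than one title collapsed to the same shape.
  return [...groups.values()].filter((g) => g.length > 1);
}

const repeated = findRepetitiveTitles([
  'renders correct icon for AIRPORT type',
  'renders correct icon for SEAPORT type',
  'renders plane image',
]);
console.log(repeated); // one group containing the two near-duplicate icon tests
```

Groups it reports are candidates for collapsing into a single data-driven test.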

### Step 2: Extract Principles

From your audit, write down:

- What makes a good test in your codebase?
- What makes a bad test?
- What’s unique about your project? (framework, mocks, domain rules)

### Step 3: Create Examples

Include real before/after examples from your refactoring. Show:

- Bad test → Why it’s bad → Good version
- Use actual code from your project

### Step 4: Document Anti-Patterns

List specific things to avoid:

```markdown
## Anti-Patterns to Avoid

1. **Testing Implementation Details**
   ❌ Don't: `expect(element).toHaveClass('flex', 'items-center')`
   ✅ Do: `expect(button).toBeDisabled()`

2. **Over-Mocking**
   ❌ Don't mock the component you're testing
   ✅ Only mock external dependencies
```

### Step 5: Add Domain-Specific Context

Include your project’s quirks:

- “All icons are globally mocked with test IDs”
- “We never have ADDRESS-to-ADDRESS combinations”
- “Translations come from Storyblok, not locally”

This domain knowledge is what transforms generic testing advice into actionable, project-specific guidance.

## The Template

Here’s a minimal template to get started:

````markdown
# Test Writer Agent

## Tech Stack
- Framework: [Your framework]
- Testing Library: [Your testing lib]
- Mocking Strategy: [How you mock]

## Principles
1. [Your principle 1]
2. [Your principle 2]
...

## Patterns

### [Common Test Type 1]
```
// Example
```

### [Common Test Type 2]
```
// Example
```

## Anti-Patterns
1. [Anti-pattern 1]
   - Why it's bad
   - What to do instead

## Domain Knowledge
- [Unique constraint 1]
- [Unique constraint 2]

## Checklist
- [ ] Tests use data-driven approach for variations
- [ ] Related assertions grouped together
- [ ] Only realistic scenarios tested
- [ ] Behavior over implementation
````
## The Feedback Loop

The most important part? **Iterate with the AI**. When I reviewed the AI's initial tests, I didn't just fix them—I explained *why* they needed fixing:

- "Can you collapse these icon tests into a single smart test case?"
- "These layout checks can be combined—they're testing the same concern"
- "We'll never have ADDRESS-to-ADDRESS. Remove that case."

Each piece of feedback improved not just the tests, but my understanding of what to put in the agent documentation.

## The Real Win

Three months from now, when someone (human or AI) needs to write tests for a new component, they won't have to reinvent these principles. They won't go through the same "11 bad tests → 3 good tests" journey I did.

They'll have a guide. A specialized agent. A shared understanding of quality.

And that's worth more than any single test suite.

## Key Takeaways

1. **Quality > Quantity**: 3 well-structured tests beat 11 repetitive ones
2. **Data-driven testing**: Use arrays for input variations, not separate tests
3. **Domain knowledge matters**: Don't test impossible scenarios
4. **Test behavior, not implementation**: Focus on what users experience
5. **Encode your knowledge**: Create specialized AI agents with your principles
6. **Include real examples**: Before/after from actual refactoring
7. **Document anti-patterns**: Show what NOT to do with explanations
8. **Iterate and refine**: Each feedback session improves your agent

## Your Turn

Look at your test suite right now. How many tests are just the same test with different inputs? How many are testing implementation details? How many check scenarios that never happen in production?

You don't need to fix them all today. Start small:
1. Pick one component
2. Refactor its tests using these principles
3. Document what you learned
4. Create an agent for your team

Your future self (and your AI pair programmer) will thank you.

---

*The complete Test Writer agent from this article is available at [your-repo/.claude/agents/test-writer.md]. It's a living document that evolves with our codebase. Feel free to fork it and adapt it to your project.*

---

**About the Author**: I'm a developer who believes that good tests are like good documentation—they should tell a story, not just prove a point. When I'm not teaching AI assistants about test quality, I'm probably refactoring some other part of the codebase that "works fine but could be better."