Skip to content

fix(skills): name test-setup pain as a design signal in testing-a-feature#8

Merged
sourcehawk merged 1 commit into
mainfrom
fix/test-setup-pain-signal
May 29, 2026
Merged

fix(skills): name test-setup pain as a design signal in testing-a-feature#8
sourcehawk merged 1 commit into
mainfrom
fix/test-setup-pain-signal

Conversation

@sourcehawk
Copy link
Copy Markdown
Owner

Problem

testing-a-feature already drives agents to escalate a vague contract — "fix the docstring first." But it said nothing about the opposite case: a contract that is perfectly clear, yet painful to test. So that signal was being missed.

A baseline run isolates the gap. Given a function with a complete, accurate docstring but heavy setup friction — four collaborators to stub, a mutable module-level global to reset between tests, an integer bitmask requiring a separate module to decode — the agent absorbed all of it into an elegant fixture, praised the fixture, and flagged no design problem at all. The mutable global, the hidden coupling, and the opaque bitmask passed unremarked.

Change

One anti-pattern and two red-flag rows in testing-a-feature that name test-setup pain as the surface's design talking, not a testing problem — and require surfacing it (fix the shape or the contract, or at minimum flag it as a finding on the PR) rather than growing the fixture until the pain disappears. The "or at minimum flag it" valve keeps it from being dogmatic under deadline.

Developed with superpowers:writing-skills (RED → GREEN)

  • RED — clear contract + painful setup: the agent absorbed every bit of friction into a fixture and escalated nothing.
  • GREEN — same scenario with the addition: the agent separated "fine to stub" from the smells, named the hidden global input, hit the cross-module tripwire on the bitmask, caught that the docstring was actually incomplete for the bitmask, and listed three design findings on the PR — while still shipping under deadline.

Note: earlier baselines with a vague docstring already passed (the existing "fix the docstring first" line fired), which is why the change targets only the clear-contract-painful-setup gap and nothing more.

🤖 Generated with Claude Code

…ture

The skill drove agents to escalate a vague *contract* (fix the docstring
first) but said nothing about a clear contract that is painful to *test*.
A baseline with a complete docstring and heavy setup — four collaborators
to stub, a mutable global to reset, a bitmask needing cross-module reading
— had the agent absorb all of it into a fixture and flag no design smell.

Add one anti-pattern and two red-flag rows that name test-setup pain as
the surface's design talking, and require surfacing it (fix the shape or
the contract, or at minimum flag it on the PR) rather than absorbing it.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings May 29, 2026 21:11
@sourcehawk sourcehawk merged commit 44c92d7 into main May 29, 2026
1 check failed
@sourcehawk sourcehawk deleted the fix/test-setup-pain-signal branch May 29, 2026 21:12
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates the testing-a-feature skill guidance to treat “painful test setup” as a design smell (not something to hide behind increasingly elaborate fixtures), and adds corresponding red-flag prompts to ensure the issue is surfaced during review.

Changes:

  • Adds an anti-pattern explicitly calling out absorbing test-setup friction instead of surfacing underlying design issues.
  • Extends the “Red flags” table with prompts to escalate heavy setup and cross-module/implementation archaeology as contract/interface problems.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

- **Copy-pasting an assertion "to match the file's style"** without re-verifying that the asserted behavior is what the contract under test actually promises.
- **One mega-test per function.** One assertion per intent. Mega-tests fail with one signal even when N intents are broken.
- **Testing private functions directly.** If the contract is private, the test is testing implementation. Drive the private code via the public surface.
- **Absorbing test-setup pain instead of surfacing it.** When standing the test up takes many collaborators stubbed, shared or global state to set and reset between tests, cross-module archaeology to construct a valid input, or reaching into internals to observe the result, that friction is the surface's design talking — not a testing problem. Building an ever-more-elaborate fixture hides the signal and locks the bad shape in. Surface it: fix the shape or the contract, or at minimum flag it as a finding on the PR. Do not ship the fixture as if the difficulty were normal.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants