fix(skills): name test-setup pain as a design signal in testing-a-feature#8
Merged
Merged
Conversation
…ture The skill drove agents to escalate a vague *contract* (fix the docstring first) but said nothing about a clear contract that is painful to *test*. A baseline with a complete docstring and heavy setup — four collaborators to stub, a mutable global to reset, a bitmask needing cross-module reading — had the agent absorb all of it into a fixture and flag no design smell. Add one anti-pattern and two red-flag rows that name test-setup pain as the surface's design talking, and require surfacing it (fix the shape or the contract, or at minimum flag it on the PR) rather than absorbing it. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Contributor
There was a problem hiding this comment.
Pull request overview
This PR updates the testing-a-feature skill guidance to treat “painful test setup” as a design smell (not something to hide behind increasingly elaborate fixtures), and adds corresponding red-flag prompts to ensure the issue is surfaced during review.
Changes:
- Adds an anti-pattern explicitly calling out absorbing test-setup friction instead of surfacing underlying design issues.
- Extends the “Red flags” table with prompts to escalate heavy setup and cross-module/implementation archaeology as contract/interface problems.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| - **Copy-pasting an assertion "to match the file's style"** without re-verifying that the asserted behavior is what the contract under test actually promises. | ||
| - **One mega-test per function.** One assertion per intent. Mega-tests fail with one signal even when N intents are broken. | ||
| - **Testing private functions directly.** If the contract is private, the test is testing implementation. Drive the private code via the public surface. | ||
| - **Absorbing test-setup pain instead of surfacing it.** When standing the test up takes many collaborators stubbed, shared or global state to set and reset between tests, cross-module archaeology to construct a valid input, or reaching into internals to observe the result, that friction is the surface's design talking — not a testing problem. Building an ever-more-elaborate fixture hides the signal and locks the bad shape in. Surface it: fix the shape or the contract, or at minimum flag it as a finding on the PR. Do not ship the fixture as if the difficulty were normal. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem
testing-a-featurealready drives agents to escalate a vague contract — "fix the docstring first." But it said nothing about the opposite case: a contract that is perfectly clear, yet painful to test. So that signal was being missed.A baseline run isolates the gap. Given a function with a complete, accurate docstring but heavy setup friction — four collaborators to stub, a mutable module-level global to reset between tests, an integer bitmask requiring a separate module to decode — the agent absorbed all of it into an elegant fixture, praised the fixture, and flagged no design problem at all. The mutable global, the hidden coupling, and the opaque bitmask passed unremarked.
Change
One anti-pattern and two red-flag rows in
testing-a-featurethat name test-setup pain as the surface's design talking, not a testing problem — and require surfacing it (fix the shape or the contract, or at minimum flag it as a finding on the PR) rather than growing the fixture until the pain disappears. The "or at minimum flag it" valve keeps it from being dogmatic under deadline.Developed with
superpowers:writing-skills(RED → GREEN)Note: earlier baselines with a vague docstring already passed (the existing "fix the docstring first" line fired), which is why the change targets only the clear-contract-painful-setup gap and nothing more.
🤖 Generated with Claude Code