Recording design intent and feeding it to AI improves code quality. Our experiments confirmed this. But they also revealed something less intuitive.
When design intent records exist for only part of a codebase, AI experiences a unique kind of pressure.
The Gap Between Documented and Undocumented Areas
Our controlled experiment included a condition where design intent was added mid-project. Partway through development, AI explored the existing codebase and extracted inferred design decisions into sqlew.
The results looked impressive at first glance. Evidence quality reached 98% of the from-inception condition (design intent recorded from the start), and prior-decision reference rate hit 207% of the no-intent condition (no design records at all).
But one metric was anomalous: test-related thinking density.
Test-Thinking Density at 254%: Traces of AI Anxiety
Analyzing AI reasoning patterns, the mid-project condition showed 60.41 test-related keywords per 10K characters — 254% of the from-inception condition's 23.76 and 153% of the no-intent condition's 39.39.
This doesn't just mean AI was writing too many tests. It was thinking about tests too much. When prompted to generate E2E tests, the mid-project condition produced 10 spec files with 25 tests versus the from-inception condition's 5 files with 16 tests. Despite the higher output, the number of genuinely useful tests was comparable. The excess was pure overhead.
Why did mid-project adoption alone cause this? The answer lies in the gap between documented and undocumented areas.
AI Treats Unmapped Territory as Risk
When ADRs are reverse-engineered from code, coverage is inherently uneven. Some modules encode clearly readable design decisions; others are built on convention or tangled history and yield nothing extractable. The result is patchy ADR coverage.
Consider this from the AI's perspective. Part of the project has records explaining "why this was implemented this way." Another part has nothing. AI cannot determine whether the absence means "we forgot to record it" or "it wasn't important enough to record."
So AI errs on the side of caution. In documented areas, it operates efficiently according to recorded decisions. In undocumented areas, it behaves defensively — writing extra tests to compensate for perceived risk. The 254% test-thinking density is the trace of this defensive behavior.
Notably, the no-intent condition didn't exhibit this pattern. With no design intent records at all, AI treats all areas equally. No gap means no area to be anxious about. Over-implementation was triggered not by the absence of design intent, but by its uneven presence.
The Fix: Fill Gaps Intentionally with Constraints
This problem is addressable through sqlew's Constraint feature.
If AI may be uncertain about undocumented areas, tell it explicitly that they are intentionally out of scope. Registering constraints like "areas without recorded decisions are currently out of scope" and "additional tests for existing features require a recorded decision" lets AI interpret gaps as deliberate exclusions rather than missing information.
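As a concrete picture of what gets registered, here is a minimal sketch of such gap-filling constraint records. The field names are hypothetical, not sqlew's actual schema:

```python
from dataclasses import dataclass

# Hypothetical record shape -- field names are illustrative,
# not sqlew's actual schema.
@dataclass
class Constraint:
    statement: str   # the boundary the AI must respect
    rationale: str   # why the gap is intentional, not an oversight

gap_constraints = [
    Constraint(
        statement="Areas without recorded decisions are currently out of scope",
        rationale="Prevents AI from treating missing ADRs as unassessed risk",
    ),
    Constraint(
        statement="Additional tests for existing features require a recorded decision",
        rationale="Caps defensive test generation in undocumented modules",
    ),
]
```

The point of the rationale field is that it converts silence into a signal: the AI no longer has to guess whether a gap means "forgotten" or "unimportant."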
The from-inception condition avoided this problem because decisions and constraints accumulated naturally together throughout development. Mid-project adoption requires consciously supplementing the "what we chose not to do" side.
Improving ADR Quality in Mid-Project Adoption
Code analysis alone cannot surface "what we chose not to do." But other information sources can.
Extracting from Design Documents and Meeting Notes
Design documents, existing ADRs, meeting notes, wikis, and READMEs often contain context that never appears in code — specifically, "why this approach was chosen" and "which alternatives were considered and rejected." These are exactly the raw materials for the constraints that mid-project adoption tends to lack.
When documentation is organized in specific directories, feeding it to AI in bulk is the most efficient approach.
```
Read the design documents in docs/ and wiki/,
and record this project's design decisions and rejected alternatives
as sqlew decisions and constraints.
Focus especially on "why that judgment was made"
and "what was decided not to do."
```
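Before handing the prompt above to AI, the documents themselves have to be gathered into one payload. A minimal sketch, assuming the `docs/` and `wiki/` layout from the prompt; the function name and path-marker format are illustrative:

```python
from pathlib import Path

def collect_design_docs(roots=("docs", "wiki"), suffixes=(".md", ".rst", ".txt")):
    """Gather design documents into one string for bulk feeding to AI."""
    payload = []
    for root in roots:
        base = Path(root)
        if not base.is_dir():
            continue  # skip roots that don't exist in this repo
        for path in sorted(base.rglob("*")):
            if path.is_file() and path.suffix in suffixes:
                # Prefix each document with its path so AI can cite sources.
                payload.append(f"--- {path} ---\n{path.read_text(encoding='utf-8')}")
    return "\n\n".join(payload)
```

Keeping the file path in each section header matters in practice: it lets AI attach each extracted decision to the document it came from.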
If meeting notes are available, the key is directing AI to look not just at what was decided, but at what was proposed and rejected. The reasons for rejection become the constraints that bound AI behavior.
Extracting from GitHub Issues & PRs
Using the gh CLI, you can feed Issue and Pull Request discussions directly to AI. PR review comments often preserve "why this implementation was chosen," while Issue discussions contain "alternatives considered but deferred" — complementing the design decision context that code alone cannot provide.
```
Use gh to retrieve past Issues and PRs,
and record this project's design decision history
as sqlew decisions and constraints.
Pay particular attention to design judgments discussed in PR review comments,
and alternatives considered but not adopted in Issues.
```
When the volume is large, filtering by labels or milestones improves precision.
```
Use gh to retrieve Issues labeled "architecture"
and PRs linked to the "v1.0" milestone,
and record the design decisions and constraints in sqlew.
```
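Under the hood, those retrieval steps map onto ordinary gh invocations. A sketch that only builds the argument lists rather than executing them, since running gh requires an authenticated environment; the flags shown (`--label`, `--state`, `--json`, `--search`) are standard gh CLI options:

```python
# Build gh CLI invocations matching the filtered prompt above.
def gh_issue_query(label: str) -> list[str]:
    """Issues carrying a given label, including closed ones."""
    return ["gh", "issue", "list", "--label", label, "--state", "all",
            "--json", "number,title,body"]

def gh_pr_query(milestone: str) -> list[str]:
    """Merged PRs linked to a milestone, via GitHub search qualifiers."""
    return ["gh", "pr", "list", "--search", f"milestone:{milestone}",
            "--state", "merged", "--json", "number,title,body"]
```

Passing these lists to `subprocess.run` yields JSON that can be fed to AI alongside the prompt, keeping the extraction step scriptable and repeatable.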
Delegate this entire process to AI, and mid-project adoption can still produce practical ADRs. Code alone yields a map with only the accelerators marked; adding documents and GitHub discussions fills in where the brakes are, creating a balanced design record.
Why From-Inception Adoption Remains Ideal
The most effective approach is still adopting sqlew from project inception. Decisions and constraints accumulate in balance throughout development, preventing patchy ADR coverage. AI operates with consistent precision across all areas, and test-thinking density stays within appropriate bounds.
"Confirm the effect with retroactive adoption, then adopt from inception for new projects." This is the practical migration path our experimental data supports.
References
- "Rediscovering Architectural Decision Records: How Persistent Design Context Improves LLM Code Generation" — Shingo Kitayama (2026) — sqlew Efficacy Study
- "Context Length Alone Hurts LLM Performance Despite Perfect Retrieval" — Findings of EMNLP 2025