Is Loading Everything into CLAUDE.md Really the Best Practice?

Project rules, architecture overviews, style guides, tool configurations — it's tempting to dump everything into CLAUDE.md, thinking "the AI will read it all." We get it.

Boris Cherny, creator of Claude Code, recommends using CLAUDE.md as a "failure notebook" for AI. Every time the AI makes a mistake, you write down "do it this way next time" and share it across the team — updating the file multiple times a week. The AI reads this notebook before starting any task, effectively "learning" your preferences and organizational rules over time. It sounds intuitively right, and agent developers actively encourage this approach, even providing /init commands to auto-generate context files. Over 60,000 GitHub repositories already include one. It's standard practice.

But a study published in February 2026 by researchers at ETH Zurich throws cold water on this conventional wisdom. Their findings suggest that context files may actually be hurting agent performance.

The Study: First Rigorous Evaluation of Context Files

The paper "Evaluating AGENTS.md: Are Repository-Level Context Files Helpful for Coding Agents?" by Gloaguen et al. is the first study to rigorously evaluate how CLAUDE.md and AGENTS.md affect coding agents' ability to solve real-world tasks.

The research team evaluated performance across two benchmarks: the established SWE-bench Lite, plus a newly constructed benchmark called AGENTBENCH — comprising 138 tasks from 12 repositories where developers actively maintain context files.

Three coding agents (Claude Code, Codex, and Qwen Code), paired with four LLMs, were tested under three conditions:

  • None: No context file
  • LLM: Auto-generated context file using each agent's recommended method
  • Human: Developer-written context file

Auto-Generated Context Files Make Things Worse

The results contradicted many developers' intuitions.

LLM-generated context files reduced task success rates by 2–3% on average, underperforming the no-context-file baseline in 5 out of 8 settings. Meanwhile, inference costs increased by over 20%, and step counts rose across every single setting.

Developer-written context files showed a marginal improvement (averaging +4%), but still increased costs by up to 19% and added an average of 3.34 steps per task.

| Condition | Task success rate change | Inference cost change |
| --- | --- | --- |
| LLM-generated | −2–3% (avg) | +20–23% |
| Developer-written | +4% (avg) | up to +19% |

Instructions Are Followed — The Problem Is Volume

Here's the interesting part: agents actually do follow context file instructions. When uv was mentioned in a context file, it was used an average of 1.6 times per instance — compared to fewer than 0.01 times when not mentioned. Repository-specific tools showed a similar 50x+ difference based on whether they appeared in the file.

So the performance drop isn't caused by agents ignoring instructions. It's caused by having too many instructions, making the task itself harder.

This is backed by the data: GPT-5.2's reasoning token count increased by 22% when context files were present. The agent itself was signaling that it perceived the task as more difficult, allocating more thinking resources accordingly.

Context Files Don't Work as Repository Overviews

There's also the expectation that including a codebase overview in the context file helps agents navigate to relevant files faster. The research debunks this too.

The number of steps before an agent first touches a file included in the PR patch showed no meaningful reduction with context files — despite 100% of Sonnet-4.5-generated files containing repository overviews.

Worse, GPT-5.1 Mini was observed issuing multiple commands to locate context files and reading them repeatedly, even though they were already present in its context. In some cases, context files made agent behavior less efficient.

The One Condition Where Auto-Generated Files Help

There is one exception. When all documentation (README.md, docs/, example code) was stripped from repositories, LLM-generated context files improved performance by an average of 2.7% — even outperforming developer-written ones.

The implication is clear: auto-generated context files largely duplicate existing documentation. In well-documented repositories, they add nothing but noise. They only provide value where documentation is sparse.

Keep CLAUDE.md Minimal and Project-Specific

The research team's conclusion is unambiguous: CLAUDE.md and AGENTS.md should contain only minimal requirements, and LLM-generated context files should be avoided for now.

So what qualifies as "minimal requirements"? Combining these findings with existing research, the most effective content includes:

  • Project-specific build and test commands (npm test, uv run pytest, etc.)
  • Repository-specific tools and conventions (things not discoverable from README)
  • Essential coding style constraints (prohibited patterns, etc.)
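Put together, a minimal context file along those lines might look like the sketch below. This is purely illustrative — the commands and tool names are placeholders standing in for whatever is genuinely project-specific and not discoverable from your README:

```markdown
# CLAUDE.md

## Commands
- Run tests: `uv run pytest` (plain `pytest` misses the lockfile environment)
- Build: `npm run build`

## Repository-specific tools
- Lint via `./scripts/lint.sh`, not bare eslint — it applies in-repo rules

## Constraints
- Never use `any` in TypeScript; prefer explicit types
```

Everything here is an instruction the agent could not infer on its own; architecture notes and decision history are deliberately absent.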

What shouldn't be there: exhaustive architecture descriptions, design decision histories, file structure listings. These are either discoverable by the agent on its own or more efficiently served through dynamic retrieval rather than static injection.

Move Rules and History Outside CLAUDE.md

So where should critical development context — design rationale, project constraints, decision histories — actually live?

Boris Cherny's core idea — recording failures and sharing lessons across the team — is fundamentally sound. Learning from past mistakes is essential for any development team. The problem is that the accumulation target is a monolithic static file called CLAUDE.md.

What this research demonstrates is that "having" context and "leveraging" it are fundamentally different problems. Writing everything in CLAUDE.md doesn't help when the sheer volume degrades agent reasoning and inflates costs. As Chroma Research's "Context Rot" study shows, this is an inherent characteristic of LLMs.

sqlew takes the approach of externalizing this context. Design decisions and constraints are stored in a structured database, and AI agents retrieve only the context they need via MCP — just in time, just enough.

Keep CLAUDE.md to minimal "what to do" instructions. Delegate the "why" — design rationale and decision history — to external memory. From static full injection to dynamic selective retrieval. This research provides further evidence that this approach is the right direction.

