108,000 Lines of Code, 26,000 Lines of Docs: The Maintenance Reality of AI Agent "External Memory"

In February 2026, independent researcher Aristidis Vasilopoulos published "Codified Context: Infrastructure for AI Agents in a Complex Codebase." The paper addresses a familiar problem: AI coding agents lose memory across sessions, forget project conventions, and repeat past mistakes. It is a field report on tackling that problem in a large-scale project through a three-tier documentation infrastructure.

What makes this paper particularly noteworthy is the author's background. Vasilopoulos's primary expertise is in chemistry, not software engineering. Using AI agents as the sole code generation tool, he built a 108,000-line C# distributed real-time system in 70 days of part-time development. What happens when a domain expert builds software outside their primary field with AI? This paper provides a data-backed record of both the outcomes and the challenges.

A Three-Tier "External Memory" Architecture

The paper proposes organizing AI-facing documentation into three tiers:

  • Tier 1: Constitution (Hot Memory) — A ~660-line Markdown file loaded automatically every session, containing coding conventions, naming rules, and task routing protocols
  • Tier 2: Specialist Agent Specs — 19 domain-specific agents totaling ~9,300 lines, embedding knowledge for areas like network synchronization and coordinate transforms
  • Tier 3: Knowledge Base (Cold Memory) — 34 subsystem specification documents totaling ~16,250 lines, retrieved on-demand via MCP keyword search

The three-tier separation is sensible. Always-needed information (Tier 1), task-specific expertise invoked per task (Tier 2), and detailed specs searched only when needed (Tier 3). It avoids bloating the context window while still supplying project-specific knowledge to the AI.
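The paper describes the tiers in prose rather than code. As a purely illustrative sketch of the routing idea (the names, contents, and keyword-matching logic below are our assumptions, not the paper's implementation), the lookup might look like:

```python
# Hypothetical sketch of three-tier context assembly.
# CONSTITUTION, SPECIALISTS, and KNOWLEDGE_BASE are illustrative
# stand-ins, not artifacts from the paper.

CONSTITUTION = "coding conventions, naming rules, task routing..."  # Tier 1, always loaded

SPECIALISTS = {  # Tier 2: domain-specific agent specs, invoked per task
    "network": "network synchronization agent spec...",
    "coords": "coordinate transform agent spec...",
}

KNOWLEDGE_BASE = {  # Tier 3: subsystem specs, retrieved by keyword search
    "save-system": "save system specification...",
    "ui-sync": "UI synchronization pattern specification...",
}

def build_context(task: str) -> list[str]:
    """Assemble context for one task: hot memory always, specialist
    and cold memory only when the task description calls for them."""
    context = [CONSTITUTION]  # Tier 1 rides along in every session
    for domain, spec in SPECIALISTS.items():
        if domain in task:
            context.append(spec)  # Tier 2: task-specific expertise
    for key, doc in KNOWLEDGE_BASE.items():
        if key in task:
            context.append(doc)  # Tier 3: on-demand retrieval
    return context
```

The point of the structure is visible even in this toy form: only Tier 1 is an unconditional cost against the context window; everything else is paid for only when a task actually needs it.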

It Actually Worked: Four Case Studies

The paper reports four case studies drawn from 283 sessions (2,801 human prompts, 16,522 agent turns).

The save system specification, for instance, was referenced across 74 sessions over four weeks, and all five subsequent features touching persistence were implemented with consistent design. A UI synchronization pattern specification, distilled from prior trial-and-error, enabled the next networked feature to be implemented correctly on the first attempt.

The most striking example involves debugging a deterministic random number generator. After five context window exhaustions and 84 code edits, the bug remained unresolved — until a specialist agent with embedded domain knowledge was invoked. The pre-documented knowledge enabled root cause identification without re-deriving the theory from scratch during the session.

The evidence shows that documentation as external memory can work.

The Emerging Problem: Documentation Becomes "Second-Order Complexity"

But effectiveness came with a cost.


Supporting 108,000 lines of C# required approximately 26,000 lines of documentation — a 24.2% code-to-doc ratio. This documentation wasn't auto-generated; the developer directed the AI to create and update it through manual orchestration. The author reports 1–2 hours per week of maintenance overhead.

More critically, the paper itself codifies this risk as Guideline G6: "Stale specs mislead efforts." On at least two occasions, outdated specifications caused the AI to apply deprecated design patterns, with errors surfacing only during testing. The author introduced a drift detection script to mitigate this — which itself became yet another artifact to maintain.
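The paper does not publish its drift detection script. A minimal sketch of the idea, under the assumption that each spec maps to a code directory and that a spec is suspect once any covered source file is newer than it, might be:

```python
# Hypothetical drift check: flag specs whose last modification predates
# the newest change in the code directory they document.
# SPEC_TO_CODE and the directory layout are assumptions for illustration.
from pathlib import Path

SPEC_TO_CODE = {  # spec file -> code directory it describes
    "docs/save-system.md": "src/SaveSystem",
    "docs/ui-sync.md": "src/UiSync",
}

def stale_specs(root: Path) -> list[str]:
    """Return spec paths older than any C# source file they cover."""
    stale = []
    for spec_rel, code_rel in SPEC_TO_CODE.items():
        spec = root / spec_rel
        code_dir = root / code_rel
        if not spec.exists() or not code_dir.exists():
            continue  # nothing to compare against
        newest_code = max(
            (f.stat().st_mtime for f in code_dir.rglob("*.cs")),
            default=0.0,
        )
        if spec.stat().st_mtime < newest_code:
            stale.append(spec_rel)
    return stale
```

Note that even this trivial checker embodies the dilemma the author describes: the mapping table is itself documentation that can drift.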

This creates a structural dilemma. Adding documentation to compensate for AI memory loss generates a new layer of complexity: keeping that documentation fresh. Tools to manage documentation produce further management overhead.

A Case Study in "Non-Engineers Building Software with AI"

Another valuable aspect of this paper is its role as a case study of a domain expert using AI to build software outside their primary expertise.

The fact that a chemist built a 108,000-line real-time distributed system using only AI agents genuinely commands respect. At the same time, it provides a data-backed record of what happens when software is developed with limited engineering experience.

The 24.2% documentation ratio can be read as a signal of compensating for limited design judgment with documentation volume. An experienced software engineer would carry much of this knowledge as tacit understanding; here, it all had to be explicitly codified and handed to the AI.

External Memory for AI Is Necessary — The Question Is How to Manage It

The insight from this paper is significant. For large-scale projects, a mechanism to supply project-specific knowledge to AI agents is essential. The finding that single-file manifests don't scale is backed by real-world practice data.

At the same time, the paper reveals the limits of managing documentation manually. The cost of keeping 26,000 lines of documentation current accelerates as projects grow.

sqlew's Approach

As it happens, sqlew is being developed around the very challenge this paper raises: context persistence for large-scale projects. In the next article, we'll introduce how sqlew approaches it.


References

  • Vasilopoulos, A. (2026). "Codified Context: Infrastructure for AI Agents in a Complex Codebase" — arXiv:2602.20478 — https://doi.org/10.48550/arXiv.2602.20478
  • Lulla, J. L. et al. (2026). "On the Impact of AGENTS.md Files on the Efficiency of AI Coding Agents" — arXiv:2601.20404
  • Zhang, Q. et al. (2026). "Agentic Context Engineering: Evolving Contexts for Self-Improving Language Models" — arXiv:2510.04618

sqlew OSS

  • Retain your projects' Memories
  • No external transaction
  • Open source & free forever
View on GitHub

sqlew Cloud

  • Team collaboration ready
  • Easy to set up, including audit features
  • 14-day free trial available
Try for Free