The Agentic Flattening Effect

Published on May 20, 2026

אִם יִרְצֶה הַשֵּׁם (God willing)

We parsed every import statement in 37 Python repositories (15 mature open-source projects and 22 publicly AI-attributed codebases), fitted distributions to their fan-in topology, and found something worth reporting: AI-generated codebases have measurably flatter internal structure than human-written ones. The effect is large (rank-biserial r = -0.744) and statistically significant (p = 0.001). Three of the 22 AI repos had zero internal coupling whatsoever: every Python file was an island.

We call this the agentic flattening effect. This post is the accessible version of a forthcoming preprint (Bilar 2026q).

What Fan-In Measures

Fan-in is a simple count: for each file in a repository, how many other files in the same repo import it? A file imported by 40 others is a hub. A file imported by nothing is a leaf. The distribution of these counts across all files in a repo encodes the hierarchical coupling structure of the codebase.
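As a rough illustration of that count, here is a minimal sketch using Python's ast module. It is not the study's script. It assumes you already have a mapping from dotted module names to source text (the name `fan_in` and the example modules are hypothetical), it only resolves absolute imports, and it treats `from pkg import name` as a possible reference to both `pkg` and `pkg.name`. Relative imports are ignored here for brevity.

```python
import ast

def fan_in(sources):
    """Count, for each module, how many *other* modules in the repo import it.

    sources: dict mapping dotted module name -> Python source text.
    Only absolute imports are resolved (a simplifying assumption).
    """
    modules = set(sources)
    importers = {name: set() for name in modules}
    for name, text in sources.items():
        for node in ast.walk(ast.parse(text)):
            targets = []
            if isinstance(node, ast.Import):
                targets = [alias.name for alias in node.names]
            elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
                # "from a.b import c" may name module a.b or submodule a.b.c
                targets = [node.module]
                targets += [f"{node.module}.{alias.name}" for alias in node.names]
            for target in targets:
                # Keep intra-repo edges only; external packages are discarded.
                if target in modules and target != name:
                    importers[target].add(name)
    return {name: len(who) for name, who in importers.items()}

# Hypothetical three-file repo: config is a small hub, views is a leaf.
example_sources = {
    "app.config": "DEBUG = True\n",
    "app.models": "import os\nfrom app import config\n",
    "app.views": "import app.models\nfrom app.config import DEBUG\n",
}
example_counts = fan_in(example_sources)
```

In this toy repo `app.config` ends up with fan-in 2, `app.models` with 1, and `app.views` with 0.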

In a mature, well-maintained project, this distribution is not random. A small number of central files (config, models, utilities) get imported heavily. A middle layer of services and adapters gets moderate use. A large fringe of leaf files (tests, scripts, endpoints) imports others but is imported by nothing. The shape, when fan-in is plotted against rank in log-log space, traces a downward-opening parabola. That parabola is the signature of a log-normal distribution.

This is not accidental. It is the structural fingerprint of iterative human refactoring under cognitive pressure.

Why Log-Normal, Not Power-Law

Preferential attachment (Barabási & Albert 1999) predicts power-law degree distributions in growing networks: the rich get richer, and the tail continues as a straight line in log-log space. But real codebases are finite, and human architects have finite attention. When a hub gets too many dependents, navigating it becomes expensive. The developer splits it. This cognitive pressure bends the tail downward, from power-law (straight line) toward log-normal (parabola).

Bejan's Constructal Law (1996, 2011) explains this from an optimization perspective: any finite flow system shaped by iterative improvement develops branching hierarchies that minimize resistance. In a codebase, the "flow" is execution paths and developer attention. The "resistance" is coupling complexity. The resulting topology has a characteristic log-normal signature because the optimization process is bounded, iterative, and shaped by human cognitive limits.

Clauset, Shalizi, and Newman (2009) showed that many empirical distributions reported as power-law are fit as well or better by alternatives such as the log-normal, which is consistent with capacity limits making real-world tails decay faster than a pure power-law. Package ecosystem studies point the same way: Decan et al. (2019) found Gini coefficients for dependency fan-in above 0.8 across seven ecosystems including npm and CRAN. Our baseline repos match: Gini mean 0.882.
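For readers who have not computed a Gini coefficient on fan-in counts before, a minimal sketch follows. It uses the standard sorted-rank formula. The function name and the choice to return 0.0 when every count is zero are assumptions made for illustration, the latter picked to match the Gini=0 convention used later in this post.

```python
def gini(counts):
    """Gini coefficient of non-negative counts (0 = equal, near 1 = concentrated)."""
    xs = sorted(counts)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        # Assumed convention: a repo with no internal imports scores 0.
        return 0.0
    # Sorted-rank formula: G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n
    weighted = sum((i + 1) * x for i, x in enumerate(xs))
    return 2 * weighted / (n * total) - (n + 1) / n

flat_repo = gini([1, 1, 1, 1])    # every file imported equally often
hub_repo = gini([0, 0, 0, 10])    # one hub, three leaves
```

With four files, the single-hub case reaches 0.75, the maximum possible for n = 4, while the uniform case is 0.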

The Data

Cohort A (baseline, human-written): Django, Flask, NumPy, pandas, SQLAlchemy, Celery, FastAPI, requests, pytest, Scrapy, httpx, Pydantic, aiohttp, Tornado, Click. All with 5+ years of development and 100+ contributors.

Cohort B (agentic, AI-generated or AI-assisted): 22 Python repositories created 2024-2026, each with at least one unambiguous AI-authorship signal: CLAUDE.md or .cursor/rules in the repo root, Co-authored-by: Cursor or noreply@anthropic.com in commit trailers, or explicit README self-attribution. These are applications built by AI agents, not AI tooling frameworks like LangChain.

Selection bias is real and acknowledged: these are publicly self-attributing repos, not representative of all AI-assisted code. They skew toward single-author, short-lifespan projects.

For each repo, fan-in was computed using Python's ast module (intra-repo imports only; external imports discarded). We fitted two models in log-log space via OLS, with $v$ the fan-in value at rank $r$: power-law ($\log v = a + b \log r$) and log-normal parabolic ($\log v = a + b \log r + c(\log r)^2$). Model selection used AIC and a Vuong-like test.
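The two fits can be sketched in pure Python as follows. This is an illustration under stated assumptions, not the study's code: files with zero fan-in are dropped before taking logs, AIC uses the Gaussian least-squares form n ln(RSS/n) + 2k, the Vuong-like test is omitted, and all function names are hypothetical.

```python
import math

def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x

def fit_loglog(fan_in_counts, degree):
    """OLS fit of log v on powers of log r (degree 1: power-law, 2: parabola).

    Returns (coefficients [a, b, (c)], residual sum of squares, n points).
    """
    vs = sorted((v for v in fan_in_counts if v > 0), reverse=True)
    xs = [math.log(rank) for rank in range(1, len(vs) + 1)]
    ys = [math.log(v) for v in vs]
    k = degree + 1
    # Normal equations (X^T X) beta = X^T y for the polynomial design matrix.
    A = [[sum(x ** (i + j) for x in xs) for j in range(k)] for i in range(k)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(k)]
    coefs = solve_linear(A, b)
    rss = sum((y - sum(c * x ** p for p, c in enumerate(coefs))) ** 2
              for x, y in zip(xs, ys))
    return coefs, rss, len(vs)

def aic(rss, n, n_params):
    """AIC for a least-squares fit, up to an additive constant."""
    return n * math.log(rss / n) + 2 * n_params

def delta_aic(fan_in_counts):
    """AIC(power-law) minus AIC(parabola); positive favors the parabola."""
    _, rss_pl, n = fit_loglog(fan_in_counts, 1)
    _, rss_ln, _ = fit_loglog(fan_in_counts, 2)
    return aic(rss_pl, n, 2) - aic(rss_ln, n, 3)
```

A negative fitted c corresponds to the downward-opening parabola described earlier. A perfectly fitting model would give RSS = 0 and break the logarithm, so real use needs a guard for that edge case.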

Results: The Gap

Group	n (fitted)	Gini mean	LN wins	Vuong z mean
Baseline (human-written OSS)	15	0.882	14/15	6.00
Agentic (AI-generated, 2024-2026)	12	0.725	11/12	2.77

Mann-Whitney U between groups:

Metric	p	rank-biserial r
Gini	0.0011	-0.744
Delta AIC (log-normal advantage)	0.019	-0.533
Normalized entropy	0.002	+0.711
Fraction leaf files	0.015	-0.556
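The rank-biserial r in this table is an effect size built from pairwise comparisons between the two groups. The sketch below shows one common way to compute it, assuming the sign convention is "agentic minus baseline" (which would make the Gini effect negative, as reported). That convention is an inference from the table, not something taken from the study's code, and the sample values are made up for illustration. For p-values, `scipy.stats.mannwhitneyu` is the usual tool.

```python
def mann_whitney_effect(baseline, agentic):
    """Return (U for the agentic group, rank-biserial r).

    r = (pairs where agentic > baseline - pairs where agentic < baseline)
        / (all pairs). Ties count half toward U and zero toward r.
    """
    greater = less = ties = 0
    for a in baseline:
        for b in agentic:
            if b > a:
                greater += 1
            elif b < a:
                less += 1
            else:
                ties += 1
    n_pairs = len(baseline) * len(agentic)
    u_agentic = greater + 0.5 * ties
    r = (greater - less) / n_pairs
    return u_agentic, r

# Made-up Gini values, NOT the study's data.
toy_baseline = [0.90, 0.85, 0.88]
toy_agentic = [0.70, 0.86, 0.60]
toy_u, toy_r = mann_whitney_effect(toy_baseline, toy_agentic)
```

In the toy data only one of nine pairs has the agentic value above the baseline value, so r comes out at -7/9. An r of -0.744 means the baseline repo has the higher Gini in the large majority of baseline/agentic pairs.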

Of 22 agentic repos cloned, 10 could not be fitted at all. Three had zero intra-repo coupling (dark-factory-experiment with 92 files, ott-platform with 88 files, OneResearchClaw with 30 files). Every Python file in those repos imports only external packages, so the intra-repo dependency graph has no edges at all. Four more had fewer than 10 connected files. One was too small. One had no Python files. One had a checkout failure.

The exclusions strengthen the finding. The most degenerate repos fall below the fitting threshold, so they are missing from the agentic mean rather than pulling it down. If you include the three Gini=0.000 repos alongside the 12 fitted ones, the agentic mean drops to approximately 0.580, a gap of 0.302 versus baseline.
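The pooled figure is simple arithmetic on the group means reported in the results table, and can be checked in a few lines. The variable names are illustrative.

```python
fitted_n, fitted_mean = 12, 0.725   # fitted agentic repos, from the results table
zero_n = 3                          # repos with no intra-repo imports (Gini = 0)
baseline_mean = 0.882

# Weighted mean over all 15 agentic repos that have a defined Gini.
pooled_mean = (fitted_n * fitted_mean + zero_n * 0.0) / (fitted_n + zero_n)
gap = baseline_mean - pooled_mean

print(f"pooled agentic mean = {pooled_mean:.3f}, gap vs baseline = {gap:.3f}")
```

This reproduces the 0.580 and 0.302 quoted above.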

Among the 12 fitted agentic repos, 11 still show log-normal shape. The parabola is present but shallow. Vuong z averages 2.77 versus 6.00 in baseline. The curvature is real but weak. Agentic repos do not switch to power-law; they shrink toward the power-law boundary. The parabola flattens without inverting.

Two Modes of Degeneration

The agentic group degenerates in two distinct forms:

Total isolation (Gini=0). Three repos have zero intra-repo fan-in. Every Python file is a leaf. Two of them are FastAPI backends with complete routing logic. The third is an autonomous research framework. All their coupling is to external libraries, not to each other. No baseline repo approached this state.

Partial flattening (lower Gini, attenuated parabola). The 12 fitted agentic repos have log-normal shape in 11 of 12 cases, but at 82% of baseline Gini. The intermediate branching layer is present but thinner. The middle of the hierarchy, the adapters, services, and shared utilities that sit between the trunk and the leaves, is compressed.

Both modes are consistent with the Constructal analogy: remove the optimization pressure, and the branching hierarchy degrades. The degree of degradation depends on how much coupling the project requires to function at all. A FastAPI backend can achieve full functionality with zero intra-repo coupling (just external library imports). A larger orchestration system (loki-mode, 531 files) cannot avoid some intermediate structure.

Why This Happens

The hypothesis is straightforward. A human developer who navigates a 10,000-line module feels the resistance: cognitive friction, context switching, risk of unintended side effects from coupling. That friction drives refactoring. The developer splits the bottleneck, introduces an intermediate abstraction, and the fan-in distribution moves toward its log-normal optimum. This is a feedback loop. The pain signal is real.

An LLM has no equivalent. Processing a 10,000-line file costs it nothing additional. There is no intrinsic pressure to split bottlenecks or build intermediate abstraction layers. Paipuru (2026) showed that agents without AST-derived graph access fail completely on tasks requiring structural dependency traversal, consistent with a mechanism in which structural blindness produces architectural drift: if you cannot feel the coupling pressure, you cannot refactor in response to it.

Mao et al. (2026) found the same flattening at the function level: AI-generated code is more verbose, has higher content ratio but lower lexical density, and a collapsed complexity distribution (uniform medium-sized blocks replacing the heavy-tailed distribution of human code). We are measuring the same phenomenon at the architectural level.

The Longitudinal Check: Does Adoption Flatten Existing Repos?

The cross-sectional result has an obvious confound: maybe AI repos and human repos just differ on age, size, contributors, and domain. To test this, we tracked fan-in topology monthly across Celery and Django before and after their documented transitions to AI-assisted coding.

Celery adopted Copilot on 2025-05-08 (first co-authored commit). Django added .github/copilot-instructions.md on 2026-03-05. We took monthly snapshots (25 pre-adoption + 12 post for Celery, 25 + 2 for Django) and computed Gini, Vuong z, and delta_AIC at each.

Repo	Last-6-pre Gini	First-6-post Gini	Delta	Wilcoxon p
Celery	0.8664	0.8680	+0.0016	0.031
Django	0.9327	0.9332	+0.0005	n/a (n=2)

No decline. Celery's Gini increased slightly, in the direction opposite to the flattening hypothesis. The log-normal signal strengthened post-adoption (Vuong z stepped from 6.55 to 7.94). Change-point detection found nothing attributable to AI adoption. The pre-adoption Gini drift (+0.003/year) continued unperturbed.
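A note on reading the Celery p-value. If the Wilcoxon test was a signed-rank test on six paired differences (an assumption, since the pairing scheme is not spelled out here), then the smallest two-sided exact p that six pairs can produce is 2/64 = 0.03125, reached whenever all six differences share a sign. The sketch below enumerates the exact null distribution to show this. It assumes no zero differences and no tied magnitudes, and the example differences are made up.

```python
from itertools import product

def wilcoxon_exact_p(diffs):
    """Exact two-sided Wilcoxon signed-rank p-value by full enumeration.

    Assumes no zero differences and no ties among |diffs|.
    """
    n = len(diffs)
    ordered = sorted(abs(d) for d in diffs)
    rank_of = {magnitude: i + 1 for i, magnitude in enumerate(ordered)}
    total = n * (n + 1) // 2
    w_plus = sum(rank_of[abs(d)] for d in diffs if d > 0)
    observed = min(w_plus, total - w_plus)
    # Under the null, each rank carries a + or - sign with equal probability.
    extreme = 0
    for signs in product((0, 1), repeat=n):
        w = sum(rank for rank, s in zip(range(1, n + 1), signs) if s)
        if min(w, total - w) <= observed:
            extreme += 1
    return extreme / 2 ** n

# Six made-up positive differences (post minus pre), NOT the study's data.
toy_diffs = [0.0010, 0.0020, 0.0015, 0.0030, 0.0005, 0.0025]
toy_p = wilcoxon_exact_p(toy_diffs)
```

Under that assumption, p = 0.031 says only that the six post-adoption values moved in the same direction relative to their pre-adoption partners. It says nothing about how large the shift was, and the +0.0016 delta shows it was tiny.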

This is effectively a single-repo study (Celery provides 12 months of post-adoption data; Django only 2). Without matched controls, we cannot distinguish "AI adoption had no effect" from "Gini just drifts upward in mature repos regardless." But the null result is consistent with a specific interpretation: the flattening effect is a structural genesis problem, not a maintenance problem.

When an AI agent designs a new codebase from scratch, it has no existing hierarchy to follow. It generates files and imports based on immediate task context rather than accumulated architectural pressure. The result is flat. When an AI assistant contributes to an established codebase, it operates within the existing module structure. A Copilot suggestion in celery/app/trace.py imports what that file already imports. It does not spontaneously reorganize the import graph.

If this holds in a larger cohort, the structural risk of AI coding concentrates at project inception, not ongoing maintenance. The threat is in greenfield construction and wholesale rewrites, where no prior structure exists to constrain the agent.

So What: Practical Implications

Fan-in Gini is cheap to compute. It requires only a static AST parse of import statements. No tests to run. No execution needed. It can run in CI in seconds.

For teams adopting AI coding tools: Your existing codebase is probably fine. The architecture was built by humans under coupling pressure, and AI contributions follow the existing structure. Watch for Gini degradation if you do a major rewrite or start a new repo with heavy AI assistance.

For teams evaluating AI-generated code for the first time: A project with Gini below 0.75 and Vuong z below 2 is structurally flat. It may work. It may pass tests. But it lacks the layered organization that makes large codebases maintainable over time. Changes to shared behavior then tend to be repeated across many files, because there is no shared abstraction to update.

For CI pipelines: Gini is a useful one-shot diagnostic. For ongoing monitoring, Vuong z and delta_AIC are faster-moving and more sensitive to changes in the middle of the distribution, where the constructal hierarchy actually lives.

The threshold: If your repo's Gini is trending toward 0.6, your intermediate abstraction layer is dissolving. That is the structural equivalent of a city losing its arterial roads and sending all traffic directly from highways to driveways.
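The rule of thumb above (Gini below 0.75 and Vuong z below 2) is easy to turn into a CI gate once the two numbers are computed. The sketch below is one hypothetical way to wire it up. The function names and the exit-code convention are assumptions, and the thresholds are the post's heuristics rather than validated cutoffs.

```python
GINI_FLOOR = 0.75      # heuristic threshold from this post
VUONG_Z_FLOOR = 2.0    # heuristic threshold from this post

def is_structurally_flat(gini_value, vuong_z):
    """True when both metrics fall below the heuristic floors."""
    return gini_value < GINI_FLOOR and vuong_z < VUONG_Z_FLOOR

def ci_exit_code(gini_value, vuong_z):
    """Map the check to a process exit code: 0 passes, 1 fails the build."""
    return 1 if is_structurally_flat(gini_value, vuong_z) else 0

# Group means from the results table, used only as example inputs.
baseline_flag = is_structurally_flat(0.882, 6.00)
agentic_flag = is_structurally_flat(0.725, 2.77)
```

Note that the agentic group mean (Gini 0.725, Vuong z 2.77) does not trip this conjunction, because its Vuong z is still above 2. Teams that want an earlier warning could alert on either condition alone, at the cost of more false positives.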

Limitations

Fan-in is first-order. It ignores call graphs, data flow, and semantic coupling. The agentic repos are publicly self-attributing outliers; private vibe-coded codebases may differ. FastAPI's design philosophy (thin routers, heavy external dependencies) may independently reduce intra-repo coupling regardless of authorship, and at least 3 of our Cohort B repos are FastAPI-style backends. The Constructal interpretation is analogical; Bejan's framework was not formally derived for discrete directed graphs. The longitudinal study is N=2 (effectively N=1). These are preliminary results. The genesis hypothesis is exactly that: a hypothesis to test against a larger cohort, not a settled conclusion.

What Comes Next

We need more repos in both directions: more agentic repos for the cross-sectional study (especially non-FastAPI projects with substantial internal coupling), and more mature repos with longer post-adoption windows for the longitudinal design. Celery at 12 months may not be long enough. Django at 2 months is certainly not. The volume hypothesis (AI contributions are just too small to move Gini in 12 months) remains open.

Reversibility is untested. Can a flat codebase be refactored into log-normal shape? The data to answer this would require repos with documented AI adoption followed by a return to pure human development. No such repos were identified.

The code and data will be available shortly at chokmah-me/parabolic-fractal. Everything reproduces from a pip install -r requirements.txt and four scripts.