methodology · open & living

Scores you can read.

Measuring how someone builds with AI is a brand-new field — so the scoring here isn't a black box. Every number is plain arithmetic over counted signals, the bands are research-anchored, and the whole contract is public, versioned, and yours to shape.

the six dimensions

What it measures

Each dimension is scored from signals counted in your own sessions and git history and compared against research-anchored bands, never against a curve fitted to a single machine's data.

Signal Clarity

How precisely you direct the AI — prompt specificity, and how few iterations it takes to reach a usable result.

Build Stability

Whether AI-assisted code survives — churn, revert rate, and post-edit stability over time.

Decision Weight

The weight and durability of the technical decisions you make with the AI in the loop.

Recovery Velocity

How fast and how systematically you recover when the AI is wrong.

Context Command

How well you carry context across tools and sessions instead of starting cold each time.

Orchestration Range

How many tools, models, and agents you coordinate effectively; scored only when orchestration actually appears in your data.

Dimensions combine into nine archetypes and twelve crafts (plus the AI Explorer baseline everyone holds), placed on a build-domain × leverage map.
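Build Stability's revert rate is one of the easiest signals to picture. Below is a minimal sketch of counting it from git history. It assumes reverts keep git's default Revert "<subject>" message and, unlike a real scorer, treats every commit as AI-assisted. The helper names commit_subjects and revert_rate are hypothetical; the actual detection logic is whatever scoring.py publishes.

```python
import subprocess


def commit_subjects(repo_path: str = ".") -> list[str]:
    """Return commit subject lines, newest first (requires git on PATH)."""
    out = subprocess.run(
        ["git", "log", "--format=%s"],
        cwd=repo_path,
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return [line for line in out.splitlines() if line]


def revert_rate(subjects: list[str]) -> float:
    """Fraction of commits whose subject has git's default revert prefix."""
    if not subjects:
        raise ValueError("no commits to measure")
    reverts = sum(1 for s in subjects if s.startswith('Revert "'))
    return reverts / len(subjects)


# Usage inside a git checkout:
#   print(revert_rate(commit_subjects(".")))
```

A real signal would also need to restrict the count to AI-assisted commits, which this sketch does not attempt.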

how the scoring works

Arithmetic, not vibes

counted, not guessed

Plain arithmetic

Every score is arithmetic over signals counted from your local data, mapped against research-anchored bands. The formulas are in scoring.py and published in full.
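A minimal sketch of what "mapped against research-anchored bands" can look like: a fixed lookup table rather than a fitted curve. The band edges, the scores, and the name band_score are hypothetical placeholders, not the published values. In this example lower raw values (for instance, a lower revert rate) land in higher-scoring bands.

```python
from bisect import bisect_right

# Hypothetical band edges (ascending) and one score per band, for
# illustration only; the real bands and their provenance are published
# in scoring.py and SCORING-METHODOLOGY.md.
EDGES = [0.02, 0.05, 0.10, 0.20]
SCORES = [100, 80, 60, 40, 20]  # len(SCORES) == len(EDGES) + 1


def band_score(value: float, edges: list[float], scores: list[int]) -> int:
    """Return the score of the band containing value.

    Band i covers [edges[i-1], edges[i]); the first band is everything
    below edges[0] and the last is everything at or above edges[-1].
    The edges are fixed in advance, so nothing is fitted to your data.
    """
    if len(scores) != len(edges) + 1:
        raise ValueError("need exactly one score per band")
    return scores[bisect_right(edges, value)]


print(band_score(4 / 100, EDGES, SCORES))  # 4 reverts in 100 commits; prints 80
```

Because the bands are constants, the same counts always produce the same score, and anyone can check the result by hand.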

the line a model can't cross

No model writes a score

A model only ever writes the narrative of your own profile; it never assigns a number. Numbers come from the engine, and the narrative is opt-in.

honest gaps

Insufficient, never estimated

Anything that can't be measured from your data is marked insufficient and excluded — your profile shows a completeness indicator instead of a fabricated number.
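Here is a minimal sketch of "insufficient, never estimated" together with a completeness indicator. The MIN_SIGNALS threshold, the plain mean standing in for each dimension's real formula, and the names score_dimension and completeness are all hypothetical. The point is only that a dimension without enough evidence carries None instead of a number, and completeness counts how many dimensions were measurable.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical minimum evidence per dimension; the real thresholds are
# defined in the published methodology, not here.
MIN_SIGNALS = 20


@dataclass
class DimensionResult:
    name: str
    score: Optional[float]  # None means "insufficient", never a guess


def score_dimension(name: str, signal_scores: list[float]) -> DimensionResult:
    """Average per-signal scores, or mark insufficient if evidence is thin."""
    if len(signal_scores) < MIN_SIGNALS:
        return DimensionResult(name, None)  # excluded, not filled in
    return DimensionResult(name, sum(signal_scores) / len(signal_scores))


def completeness(results: list[DimensionResult]) -> float:
    """Share of dimensions that could actually be measured (0.0 to 1.0)."""
    if not results:
        return 0.0
    measured = sum(1 for r in results if r.score is not None)
    return measured / len(results)


results = [
    score_dimension("Build Stability", [70.0] * 25),
    score_dimension("Orchestration Range", [90.0] * 3),
]
print(results[1].score)       # None
print(completeness(results))  # 0.5
```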

a map, not a ladder

No ranking, ever

No percentiles, cohorts, leaderboards, or "top X%". Positioning shows how you build. Different builder kinds are crafts, not rungs.

Calibrated for developers and AI engineers — people who build software with AI — and not yet for other kinds of AI builders. The methodology is versioned: any change ships as a transparent version bump.

open & living

The community co-owns it

As we collectively learn what "good" looks like, the methodology should learn with it. Three ways to shape it:

Debate it

Bring a case or a counterexample to Discussions → Methodology.

Propose a change

Open a PR against SCORING-METHODOLOGY.md with evidence or a good example. Accepted changes ship as a transparent methodology-version bump.

Start from the open questions

The open questions are the calibrations we're least sure of — the best place to push.

read the whole thing

Nothing is hidden

The full methodology is in the repo, and the tool renders it as an interactive explorer locally — the weights, the research behind each dimension, and how the weighting adapts to how you build.

SCORING-METHODOLOGY → Every formula and band, with provenance.
DERIVATIONS → Per-metric reference: logic, reasoning, citations.
REFERENCES → The formal bibliography behind the bands.
TRUST → What a report proves, what it doesn't, what we won't do.

Run python3 -m nextmillionai report and open /methodology for the interactive explorer, served from your own machine.

Have a better calibration? The methodology improves when builders bring evidence.

Join the discussion →
Get it free