docs/adr/0011-proprietary-sigma0-base-model.md

ADR-0011: Own a proprietary Σ₀ base model — fork the PLT architecture, adapter-only weights, council + CSF native

Status

Proposed — awaiting approval from Alex Place.

Context

The owner's directive: a proprietary Σ₀ model — weights we adjust in future design, that serves the Σ₀ council and uses CSF. Today the local kernel tier leans on third-party checkpoints (Ouro, Qwen) plugged into local-model-registry.js. That satisfies "models are interchangeable" (ADR-0005) but leaves us unable to own the reasoning substrate: we cannot change a model's forward pass, and per ADR-0010 we may only adjust adapter weights — which presupposes a base whose modeling code we control.

A concrete starting point exists. LoopCoder-V2 (Multilingual-Multimodal-NLP/LoopCoder-V2, Apache-2.0) is a Parallel Loop Transformer (PLT): num_hidden_layers=14 physical layers executed plt_num_loops=2 times with shared weights, cross-loop processing, and per-head-gated mixed attention (global full + local sliding-window [64,0]). That looped-depth design is the same family as our self-converging kernel thesis (Ouro Q-exit, [[sigma0-coder-spiral-consolidation]]): recurrent compute is the Σ₀ Reason lever. But the public release is not usable as-is: the HF repo ships weights + config + tokenizer but no modeling_*.py, and config.json's auto_map wires only AutoConfig — so stock transformers cannot instantiate IQuestPLTCoderForCausalLM. The vendor's only serving path is a custom vLLM fork (yxing-bj/vllm, bf16, no quantization), which needs ≥24 GB VRAM and is a black box we cannot evolve. The 2026-06-29 on-box probe (experiments/loopcoder_v2_4bit_probe.py) correctly refused it (DONT_BUILD, [[loopcoder-v2-probe-failed]]).
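The looped-depth idea described above can be sketched in a few lines. This is a simplified illustration, not the LoopCoder-V2 forward: the loops run sequentially, cross-loop processing is not modeled, and the gate formula (a convex blend of the global and local outputs) is an assumption, since the source does not state the exact gating math. The helper names are hypothetical, and the inclusive window bounds for `[64,0]` are also an assumption.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Layer = Callable[[List[Vector]], List[Vector]]


def looped_forward(hidden: List[Vector], layers: Sequence[Layer],
                   num_loops: int = 2) -> List[Vector]:
    """Run the same physical layers num_loops times (shared weights)."""
    for _ in range(num_loops):
        for layer in layers:
            hidden = layer(hidden)
    return hidden


def causal_mask(seq_len: int) -> List[List[bool]]:
    """Global full attention: token i may attend to every token j <= i."""
    return [[j <= i for j in range(seq_len)] for i in range(seq_len)]


def sliding_window_mask(seq_len: int, left: int = 64,
                        right: int = 0) -> List[List[bool]]:
    """Local attention: token i sees j in [i - left, i + right].

    Assumption: both bounds are inclusive.
    """
    return [[i - left <= j <= i + right for j in range(seq_len)]
            for i in range(seq_len)]


def gated_mix(global_out: Vector, local_out: Vector, gate: float) -> Vector:
    """Blend one head's global and local outputs with a gate in [0, 1].

    Assumption: a convex blend; the real per-head gating may differ.
    """
    return [gate * g + (1.0 - gate) * l for g, l in zip(global_out, local_out)]
```

With 14 physical layers and `num_loops=2`, each token passes through 28 layer applications while only 14 layers' worth of weights are stored, which is the property the parity work has to reproduce exactly.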

The decision this forces: do we stay renters of third-party kernels forever, or do we own a Σ₀ base model — its architecture and its weights — as the local kernel tier?

Loop stages touched: Reason (a self-converging local kernel we control) and Converge (adapter-only learning from verified experience, already sanctioned by ADR-0010). Feature-gate check: this extends the existing local-model adapter + serving path and the existing CSF/memory substrate — it is not a new ecosystem, dream engine, or parallel memory system. One Convergence Core; one more interchangeable backend that happens to be ours.

Decision

We will build and own a proprietary Σ₀ base model — "Keystone-Σ₀" — by owning its modeling code, not by depending on any vendor's serving path.

  1. Own the architecture. We author our own modeling_keystone_plt.py implementing the PLT forward (shared-layer loops + cross-loop processing + gated mixed attention) from the published config + paper (arXiv 2510.24824). Owning the forward pass is the prerequisite for everything else: adjustable weights, 4-bit fit, council hooks. The vendor vLLM fork is explicitly not adopted as the inference path (un-evolvable, ≥24 GB).

  2. Bootstrap weights legally, then make them ours. Initialize from LoopCoder-V2's Apache-2.0 checkpoint loaded through our modeling code. From that point the weights are a Keystone artifact we may adjust.

  3. Weights are adjusted only via the ADR-0010 path: adapter-only, base frozen, verified-experience source-gate, collapse tripwire, reversible, operator-gated. "Weights adjusted in future design" means adapters over a frozen Keystone base, never raw base-weight retraining. This ADR does not start training; it makes the base we own the thing those future adapters attach to.

  4. Council-native. The model serves the existing Σ₀ council (wired into autowork, #1598 / [[dogfood-loop-reliable-and-council-wired]]) as a first-class member: its looped depth is the council's local Reason backend, behind the same verify gate as every other member.

  5. CSF-native. Memory/experience it reasons over is the one append-only JSONL + CSF archive (ADR-0004, ADR-0003). Base and adapter checkpoints are content-addressed and archived in CSF; no new store.

  6. Interchangeable, not hardcoded. Keystone-Σ₀ registers in local-model-registry.js as one more VRAM-gated, evidence-gated entry (ADR-0005). It LEADS only when a reproduced on-box eval beats the incumbent (Qwen2.5-Coder / the frontier coder). Until then it stays verified:false and cannot displace a known-good lead (External Reality Rule, #1597).
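The lead-selection rule in the last item can be sketched as follows. The real registry lives in local-model-registry.js; this Python version only illustrates the rule. The `RegistryEntry` type, the `min_vram_gb` field, and the `pick_lead` function are hypothetical names, and the numbers in the usage lines are made up for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RegistryEntry:
    name: str
    min_vram_gb: float       # hypothetical field: VRAM needed to load the model
    verified: bool           # True only after a reproduced on-box eval
    capability_score: float  # measured score; ignored while verified is False


def pick_lead(entries: List[RegistryEntry],
              available_vram_gb: float) -> Optional[RegistryEntry]:
    """Return the highest-scoring entry that is verified and fits in VRAM."""
    eligible = [e for e in entries
                if e.verified and e.min_vram_gb <= available_vram_gb]
    return max(eligible, key=lambda e: e.capability_score, default=None)


# Illustrative values only: an unverified entry never leads, whatever its score.
registry = [
    RegistryEntry("incumbent", min_vram_gb=8.0, verified=True, capability_score=0.60),
    RegistryEntry("keystone", min_vram_gb=8.0, verified=False, capability_score=0.90),
]
lead = pick_lead(registry, available_vram_gb=12.0)
```

Here `lead` is the incumbent entry. Only flipping `verified` to true after a measured win would let the Keystone entry take over.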

Consequences

  • Positive:
    • We own the reasoning substrate end-to-end, the precondition for the ADR-0010 flywheel and for any Σ₀-specific architecture change (e.g. a trained Q-exit gate, [[ouro-adaptive-compute-gate]]).
    • Owning the modeling file (modeling_keystone_plt.py) lets us quantize the model to 4-bit so it fits the local box (the vendor path cannot), keeping the kernel local (North Star principle 6).
    • A looped-depth kernel that is ours aligns Reason with the Σ₀ self-converging thesis without a new subsystem.

  • Negative / trade-offs:
    • Real reverse-engineering risk. A hand-written PLT forward must match the trained weights' exact tensor layout and the gated-attention / cross-loop math, or outputs are garbage. This is unverified until it reproduces the vLLM reference (the Parity stage below). Honest confidence today: medium-low on first-pass parity.
    • We take on model-maintenance debt (a modeling file, decode params, eval upkeep) we previously rented. Mitigated by adapter-only + frozen base + CSF-archived, content-addressed checkpoints.
    • Bootstrapping from a third-party checkpoint inherits its license (Apache-2.0, compatible) and its biases until we adapt it.

  • Follow-ups (staged, each gated by on-box evidence — none auto-promotes the model):
    • Stage: Parity. Author modeling_keystone_plt.py; load the forked weights; reproduce the vLLM-fork reference logits/outputs on a fixed prompt set. Gate: token/logit parity within tolerance. Without this we own nothing; it is the first and blocking step.
    • Stage: Fit. Quantize to 4-bit (bnb nf4) under the local box's VRAM budget; measure VRAM + tok/s (reuse the loopcoder_v2_4bit_probe.py harness → data/convergence/).
    • Stage: Serve. Ollama/OpenAI-compatible endpoint (the ouro_serve.py pattern); point the registry entry's endpoint at it.
    • Stage: Eval. eval_humaneval_chat.py head-to-head vs Qwen2.5-Coder on-box; only a win flips verified:true with the measured capabilityScore.
    • Stage: Council + CSF. Register as a council member; archive base/adapter checkpoints in CSF.
    • Stage: Adapters. Only under the full ADR-0010 guardrail set, last.
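The Parity gate ("token/logit parity within tolerance") could be checked along these lines. This is a minimal sketch under stated assumptions: logits are plain per-position lists of floats, "token parity" means the same top-1 token at every position, and the default `atol` is a placeholder, since this ADR does not fix a tolerance. The function names are hypothetical.

```python
from typing import Sequence


def max_abs_diff(ours: Sequence[float], reference: Sequence[float]) -> float:
    """Largest element-wise absolute difference between two logit vectors."""
    if len(ours) != len(reference):
        raise ValueError("logit vectors differ in length")
    return max((abs(a - b) for a, b in zip(ours, reference)), default=0.0)


def argmax(values: Sequence[float]) -> int:
    """Index of the largest value (assumes a non-empty sequence)."""
    return max(range(len(values)), key=lambda i: values[i])


def parity_ok(ours: Sequence[Sequence[float]],
              reference: Sequence[Sequence[float]],
              atol: float = 1e-2) -> bool:
    """True when every position has the same top-1 token and close logits.

    atol is a placeholder tolerance, not a value taken from the ADR.
    """
    if len(ours) != len(reference):
        return False
    for our_logits, ref_logits in zip(ours, reference):
        if argmax(our_logits) != argmax(ref_logits):
            return False
        if max_abs_diff(our_logits, ref_logits) > atol:
            return False
    return True
```

Checking both conditions is deliberate: logits can be numerically close yet flip the top token near a tie, and tokens can match while the logits drift enough to break sampling later.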

Alternatives considered

  • Stay on third-party kernels (do nothing). Rejected by the directive: it forecloses owning the reasoning substrate and the ADR-0010 flywheel. Legitimately the safe default; if this ADR is rejected, this is what we keep, and Qwen2.5-Coder remains the local lead.

  • Adopt the vendor vLLM fork as our serving path. Rejected: un-evolvable black box, bf16-only, ≥24 GB (won't fit the local box), and gives us no ability to adjust the forward or quantize. Fails the "weights adjusted in future design" requirement.

  • Train a Σ₀ model from scratch. Rejected for now: orders of magnitude more compute than we have; bootstrapping from an Apache-2.0 looped-coder checkpoint gets a capable base for the cost of a modeling file + parity work.

  • Build on Ouro instead of PLT. Not mutually exclusive: Ouro stays the registered recurrent-depth research front. PLT is chosen as the bootstrap because a strong, permissively-licensed coder checkpoint already exists; a future ADR may converge the two looped families.
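The bf16 versus 4-bit trade-off behind the vLLM-fork rejection comes down to simple arithmetic. The sketch below estimates weight-only memory from the "~9B" parameter figure in the Evidence table. It ignores the KV cache, activations, quantization metadata, and any layers a quantizer keeps in higher precision, so real usage is higher. `weight_memory_gb` is a hypothetical helper.

```python
def weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Weight-only footprint in GB (1 GB = 10**9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


PARAMS = 9e9  # "~9B" from the Evidence table; an approximation

bf16_gb = weight_memory_gb(PARAMS, 16)  # 2 bytes per parameter
nf4_gb = weight_memory_gb(PARAMS, 4)    # 0.5 bytes per parameter, before overhead

print(f"bf16 weights: ~{bf16_gb:.1f} GB, 4-bit weights: ~{nf4_gb:.1f} GB")
```

Under these assumptions the weights alone take roughly 18 GB in bf16 and roughly 4.5 GB at 4 bits. That is why the Fit stage measures actual VRAM on-box rather than trusting the estimate.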

Evidence

| Claim | Evidence (file:line / commit / PR) | Confidence | Source |
| --- | --- | --- | --- |
| HF repo ships weights + config + tokenizer but no modeling_*.py; auto_map wires only AutoConfig | HF API siblings list (3 safetensors shards + index, no modeling.py); local config.json auto_map | High | huggingface.co API + D:/hf-cache/.../config.json |
| PLT = 14 shared layers × plt_num_loops=2, cross-loop processing + gated mixed attention | configuration_iquestpltcoder.py:20-95 (docstring) | High | model repo |
| Arch params: hidden 5120, GQA 40/8, head_dim 128, SwiGLU/SiLU, RMSNorm, RoPE θ=500000, vocab 76800, ctx 131072, bf16 (~9B) | config.json | High | model repo |
| Vendor's only serving path is a custom vLLM fork (yxing-bj/vllm), bf16, no quant | README.md serve command | High | model card |
| On-box transformers load fails (Unrecognized configuration class … for AutoModelForCausalLM) → DONT_BUILD | experiments/loopcoder_v2_4bit_probe.py; data/convergence/loopcoder-probe-log.jsonl | High | this repo, 2026-06-29 |
| Adapter-only weight updates are the sanctioned future-weights path | ADR-0010 | High | repo ADR |
| Model plugs in as one interchangeable, evidence-gated registry entry | local-model-registry.js:130-148, #1597 | High | repo |
| Σ₀ council exists and runs on real decisions | #1598, [[dogfood-loop-reliable-and-council-wired]] | High | repo |
| LoopCoder-V2 is Apache-2.0 (legal to fork weights) | local-model-registry.js:147 note; model card license | Med | model card |
| Looped/recurrent depth is the Σ₀ Reason lever | [[sigma0-coder-spiral-consolidation]], [[ouro-adaptive-compute-gate]] | Med | repo research |
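The auto_map finding in the first row can be re-checked with a few lines of standard-library Python. The config fragment below is a hand-written stand-in, not the real file: only `num_hidden_layers`, `plt_num_loops`, and the presence of an `AutoConfig` entry come from this ADR, and the class name `IQuestPLTCoderConfig` is an assumption. `missing_auto_classes` is a hypothetical helper.

```python
import json
from typing import Any, Dict, List, Sequence


def missing_auto_classes(
    config: Dict[str, Any],
    required: Sequence[str] = ("AutoConfig", "AutoModelForCausalLM"),
) -> List[str]:
    """List the Auto* classes that config['auto_map'] does not wire up."""
    auto_map = config.get("auto_map", {})
    return [name for name in required if name not in auto_map]


# Stand-in for the downloaded config.json (illustrative, not the real file).
config = json.loads("""
{
  "num_hidden_layers": 14,
  "plt_num_loops": 2,
  "auto_map": {
    "AutoConfig": "configuration_iquestpltcoder.IQuestPLTCoderConfig"
  }
}
""")

missing = missing_auto_classes(config)
```

For this stand-in, `missing` contains only `AutoModelForCausalLM`, which matches the reported load failure: the config class resolves, but no modeling class is mapped for the causal-LM auto class.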