ADR-0011: Own a proprietary Σ₀ base model — fork the PLT architecture, adapter-only weights, council + CSF native
Status
Proposed — awaiting approval from Alex Place.
Context
The owner's directive: a proprietary Σ₀ model, with weights we can adjust in future designs, that serves the Σ₀ council and uses CSF. Today the local kernel tier leans on third-party checkpoints (Ouro, Qwen) plugged into local-model-registry.js. That satisfies "models are interchangeable" (ADR-0005) but leaves us unable to own the reasoning substrate: we cannot change a model's forward pass, and per ADR-0010 we may only adjust adapter weights, which presupposes a base whose modeling code we control.
A concrete starting point exists. LoopCoder-V2 (Multilingual-Multimodal-NLP/LoopCoder-V2, Apache-2.0) is a Parallel Loop Transformer (PLT): num_hidden_layers=14 physical layers executed plt_num_loops=2 times with shared weights, cross-loop processing, and per-head-gated mixed attention (global full + local sliding-window [64,0]). That looped-depth design belongs to the same family as our self-converging kernel thesis (Ouro Q-exit, [[sigma0-coder-spiral-consolidation]]): recurrent compute is the Σ₀ Reason lever. But the public release is not usable as-is: the HF repo ships weights + config + tokenizer but no modeling_*.py, and config.json's auto_map wires only AutoConfig, so stock transformers cannot instantiate IQuestPLTCoderForCausalLM. The vendor's only serving path is a custom vLLM fork (yxing-bj/vllm, bf16, no quantization), which needs ≥24 GB VRAM and is a black box we cannot evolve. The 2026-06-29 on-box probe (experiments/loopcoder_v2_4bit_probe.py) correctly refused it (DONT_BUILD, [[loopcoder-v2-probe-failed]]).
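The looped-depth idea above can be sketched in a few lines. This is a minimal illustration of "physical layers executed plt_num_loops times with shared weights" only. It assumes a toy layer type and deliberately omits PLT's cross-loop processing and gated mixed attention, so it is not the LoopCoder-V2 forward and says nothing about its tensor layout. The helper names (looped_forward, effective_depth, make_toy_layer) are hypothetical.

```python
from typing import Callable, List

# Hypothetical stand-in: a "layer" maps a hidden-state vector to a new one.
Layer = Callable[[List[float]], List[float]]


def looped_forward(hidden: List[float], layers: List[Layer], num_loops: int) -> List[float]:
    """Apply the same physical layer stack `num_loops` times (weights shared across loops).

    Sketch of plain looped depth only: PLT's cross-loop processing and per-head
    gated mixed attention are deliberately NOT modeled here.
    """
    if num_loops < 1:
        raise ValueError("num_loops must be >= 1")
    for _ in range(num_loops):
        for layer in layers:  # the same layer objects are reused on every loop
            hidden = layer(hidden)
    return hidden


def effective_depth(num_hidden_layers: int, plt_num_loops: int) -> int:
    """Layer applications per token; the parameter count stays that of the physical stack."""
    return num_hidden_layers * plt_num_loops


def make_toy_layer(scale: float) -> Layer:
    """Toy residual layer h -> h + scale * h, standing in for attention + MLP."""
    return lambda h: [x + scale * x for x in h]


if __name__ == "__main__":
    stack = [make_toy_layer(0.1) for _ in range(14)]  # 14 physical layers
    out = looped_forward([1.0], stack, num_loops=2)   # 28 layer applications
    print(effective_depth(14, 2), out)
```

The point of the sketch: looping doubles the layer applications per token (14 × 2 here) while the parameter count stays that of the 14 physical layers, which is why recurrent compute is a Reason lever that does not grow the checkpoint.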
The decision this forces: do we stay renters of third-party kernels forever, or do we own a Σ₀ base model — its architecture and its weights — as the local kernel tier?
Loop stages touched: Reason (a self-converging local kernel we control) and Converge (adapter-only learning from verified experience, already sanctioned by ADR-0010). Feature-gate check: this extends the existing local-model adapter + serving path and the existing CSF/memory substrate — it is not a new ecosystem, dream engine, or parallel memory system. One Convergence Core; one more interchangeable backend that happens to be ours.
Decision
We will build and own a proprietary Σ₀ base model — "Keystone-Σ₀" — by owning its modeling code, not by depending on any vendor's serving path.
- Own the architecture. We author our own modeling_keystone_plt.py implementing the PLT forward (shared-layer loops + cross-loop processing + gated mixed attention) from the published config + paper (arXiv 2510.24824). Owning the forward pass is the prerequisite for everything else: adjustable weights, 4-bit fit, council hooks. The vendor vLLM fork is explicitly not adopted as the inference path (un-evolvable, ≥24 GB).
- Bootstrap weights legally, then make them ours. Initialize from LoopCoder-V2's Apache-2.0 checkpoint, loaded through our modeling code. From that point the weights are a Keystone artifact we may adjust.
- **Weights are adjusted only via the ADR-0010 path** — adapter-only, base frozen, verified-experience source-gate, collapse tripwire, reversible, operator-gated. "Weights adjusted in future design" means adapters over a frozen Keystone base, never raw base-weight retraining. This ADR does not start training; it makes the base we own the thing those future adapters attach to.
- Council-native. The model serves the existing Σ₀ council (wired into autowork, #1598 / [[dogfood-loop-reliable-and-council-wired]]) as a first-class member: its looped depth is the council's local Reason backend, behind the same verify gate as every other member.
- CSF-native. The memory/experience it reasons over is the one append-only JSONL + CSF archive (ADR-0004, ADR-0003). Base and adapter checkpoints are content-addressed and archived in CSF; no new store.
- Interchangeable, not hardcoded. Keystone-Σ₀ registers in local-model-registry.js as one more VRAM-gated, evidence-gated entry (ADR-0005). It LEADS only when a reproduced on-box eval beats the incumbent (Qwen2.5-Coder / the frontier coder). Until then it stays verified:false and cannot displace a known-good lead (External Reality Rule, #1597).
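The leadership rule above (VRAM-gated, evidence-gated, unverified entries never displacing a known-good lead) can be sketched as follows. This is a hypothetical Python mirror for illustration, not the API of local-model-registry.js; the field names, the VRAM budget parameter, and the strict "beats" comparison are assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RegistryEntry:
    """Hypothetical Python mirror of a registry entry (not the real local-model-registry.js shape)."""
    name: str
    vram_gb: float                             # VRAM needed to serve this entry
    verified: bool = False                     # new entries start unverified
    capability_score: Optional[float] = None   # set only from a reproduced on-box eval


def pick_lead(entries: List[RegistryEntry], vram_budget_gb: float) -> Optional[RegistryEntry]:
    """Lead = highest-scoring verified entry that fits the VRAM budget (None if none qualify)."""
    eligible = [
        e for e in entries
        if e.verified and e.capability_score is not None and e.vram_gb <= vram_budget_gb
    ]
    return max(eligible, key=lambda e: e.capability_score, default=None)


def record_eval(candidate: RegistryEntry, incumbent: RegistryEntry, measured_score: float) -> bool:
    """Flip `candidate` to verified only if its measured score beats the verified incumbent's."""
    if not incumbent.verified or incumbent.capability_score is None:
        raise ValueError("incumbent must be a verified, scored lead")
    if measured_score > incumbent.capability_score:
        candidate.verified = True
        candidate.capability_score = measured_score
    return candidate.verified
```

A losing eval leaves the candidate at verified=False, so pick_lead keeps returning the incumbent; only a win makes the candidate eligible at all.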
Consequences
- Positive:
- We own the reasoning substrate end-to-end, the precondition for the ADR-0010 flywheel and for any Σ₀-specific architecture change (e.g. a trained Q-exit gate, [[ouro-adaptive-compute-gate]]).
- Owning modeling.py lets us 4-bit quantize the model into theGB box (the vendor path cannot), keeping the kernel local (North Star principle 6).
- A looped-depth kernel that's ours aligns Reason with the Σ₀ self-converging thesis without a new subsystem.
- Negative / trade-offs:
- Real reverse-engineering risk. A hand-written PLT forward must match the trained weights' exact tensor layout and the gated-attention / cross-loop math, or outputs are garbage. This stays unverified until it reproduces the vLLM reference (the Parity stage below). Honest confidence today: medium-low on first-pass parity.
- We take on model-maintenance debt (a modeling file, decode params, eval upkeep) that we previously rented. Mitigated by adapter-only updates, a frozen base, and CSF-archived, content-addressed checkpoints.
- Bootstrapping from a third-party checkpoint inherits its license (Apache-2.0, compatible) and its biases until we adapt it.
- Follow-ups (staged, each gated by on-box evidence — none auto-promotes the model):
- Stage — Parity. Author modeling_keystone_plt.py; load the forked weights; reproduce the vLLM-fork reference logits/outputs on a fixed prompt set. Gate: token/logit parity within tolerance. Without this we own nothing; it is the first and blocking step.
- Stage — Fit. 4-bit (bnb nf4) under theGB budget; measure VRAM + tok/s (reuse the loopcoder_v2_4bit_probe.py harness → data/convergence/).
- Stage — Serve. Ollama/OpenAI-compatible endpoint (the ouro_serve.py pattern); point the registry entry's endpoint at it.
- Stage — Eval. eval_humaneval_chat.py head-to-head vs Qwen2.5-Coder on-box; only a win flips verified:true with the measured capabilityScore.
- Stage — Council + CSF. Register as a council member; archive base/adapter checkpoints in CSF.
- Stage — Adapters. Only under the full ADR-0010 guardrail set, last.
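The Parity gate can be made concrete as a comparison of per-position logits captured from both paths on the fixed prompt set. A minimal sketch: the ADR does not fix the tolerance, so atol is a placeholder, and requiring both a max-absolute-difference bound and full greedy-token agreement is an assumed reading of "token/logit parity". parity_report is a hypothetical helper; capturing logits from the vLLM fork is out of scope here.

```python
from typing import Any, Dict, List, Sequence


def _argmax(row: Sequence[float]) -> int:
    """Index of the largest logit (first one wins on ties)."""
    return max(range(len(row)), key=lambda i: row[i])


def parity_report(reference: List[List[float]], candidate: List[List[float]], atol: float) -> Dict[str, Any]:
    """Compare per-position logits: reference (vLLM-fork run) vs. candidate (our modeling file).

    Passes only if every logit is within `atol` AND greedy (argmax) tokens agree at every position.
    """
    if not reference or len(reference) != len(candidate):
        raise ValueError("need non-empty logits with matching sequence length")
    max_abs_diff = 0.0
    token_matches = 0
    for ref_row, cand_row in zip(reference, candidate):
        if len(ref_row) != len(cand_row):
            raise ValueError("vocab size mismatch")
        max_abs_diff = max(max_abs_diff, max(abs(r - c) for r, c in zip(ref_row, cand_row)))
        token_matches += _argmax(ref_row) == _argmax(cand_row)
    agreement = token_matches / len(reference)
    return {
        "max_abs_diff": max_abs_diff,
        "greedy_token_agreement": agreement,
        "passed": max_abs_diff <= atol and agreement == 1.0,
    }
```

The tolerance would plausibly be set from the measured run-to-run noise of the bf16 reference rather than chosen up front; a threshold tighter than that noise floor would fail even a correct forward.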
Alternatives considered
- Stay on third-party kernels (do nothing). Rejected by the directive — it forecloses owning the reasoning substrate and the ADR-0010 flywheel. It is legitimately the safe default: if this ADR is rejected, this is what we keep, and Qwen2.5-Coder remains the local lead.
- Adopt the vendor vLLM fork as our serving path. Rejected — un-evolvable black box, bf16-only, ≥24 GB (won't fit theGB box), and gives us no ability to adjust the forward pass or quantize. Fails the "weights adjusted in future design" requirement.
- Train a Σ₀ model from scratch. Rejected for now — orders of magnitude more compute than we have; bootstrapping from an Apache-2.0 looped-coder checkpoint gets a capable base for the cost of a modeling file + parity work.
- Build on Ouro instead of PLT. Not mutually exclusive — Ouro stays the registered recurrent-depth research front. PLT is chosen as the bootstrap because a strong, permissively licensed coder checkpoint already exists; a future ADR may converge the two looped families.
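The memory gap behind the first two alternatives is simple arithmetic. For ~9B parameters, bf16 weights alone are about 18 GB (9e9 × 2 bytes) before KV cache and activations, consistent with the vendor's ≥24 GB requirement. The sketch below also estimates an nf4 footprint under labeled assumptions: untied input/output embeddings kept in bf16, everything else stored as 4-bit codes plus one fp32 absmax per 64-weight block. These are assumptions about bnb's behavior and this checkpoint, not measurements; the Fit stage is what measures the real number. The function names are hypothetical.

```python
def weight_gb(n_params: float, bits_per_param: float) -> float:
    """Weight storage in decimal GB (1e9 bytes); excludes KV cache and activations."""
    return n_params * bits_per_param / 8 / 1e9


def nf4_weight_gb(total_params: float, vocab_size: int, hidden_size: int,
                  tied_embeddings: bool = False, nf4_bits: float = 4.5) -> float:
    """Rough nf4 footprint, ASSUMING embeddings/lm_head stay bf16 and all else is nf4.

    nf4_bits=4.5 assumes 4-bit codes plus one fp32 absmax per 64-weight block
    (32/64 = 0.5 extra bits); double quantization would lower that overhead.
    """
    embed_params = vocab_size * hidden_size * (1 if tied_embeddings else 2)
    quantized_params = total_params - embed_params
    return weight_gb(quantized_params, nf4_bits) + weight_gb(embed_params, 16)


if __name__ == "__main__":
    # Inputs from the Evidence table: ~9B params, vocab 76800, hidden 5120.
    print(f"bf16 weights: {weight_gb(9e9, 16):.1f} GB")
    print(f"nf4 weights:  {nf4_weight_gb(9e9, 76800, 5120):.1f} GB")
```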
Evidence
| Claim | Evidence (file:line / commit / PR) | Confidence | Source |
|---|---|---|---|
| HF repo ships weights + config + tokenizer but no modeling_*.py; auto_map wires only AutoConfig | HF API siblings list (3 safetensors shards + index, no modeling.py); local config.json auto_map | High | huggingface.co API + D:/hf-cache/.../config.json |
| PLT = shared layers × plt_num_loops=2, cross-loop processing + gated mixed attention | configuration_iquestpltcoder.py:20-95 (docstring) | High | model repo |
| Arch params: hidden 5120, GQA 40/8, head_dim 128, SwiGLU/SiLU, RMSNorm, RoPE θ=500000, vocab 76800, ctx 131072, bf16 (~9B) | config.json | High | model repo |
| Vendor's only serving path is a custom vLLM fork (yxing-bj/vllm), bf16, no quant | README.md serve command | High | model card |
| On-box transformers load fails (Unrecognized configuration class … for AutoModelForCausalLM) → DONT_BUILD | experiments/loopcoder_v2_4bit_probe.py; data/convergence/loopcoder-probe-log.jsonl | High | this repo, 2026-06-29 |
| Adapter-only weight updates are the sanctioned future-weights path | ADR-0010 | High | repo ADR |
| Model plugs in as one interchangeable, evidence-gated registry entry | local-model-registry.js:130-148, #1597 | High | repo |
| Σ₀ council exists and runs on real decisions | #1598, [[dogfood-loop-reliable-and-council-wired]] | High | repo |
| LoopCoder-V2 is Apache-2.0 (legal to fork weights) | local-model-registry.js:147 note; model card license | Med | model card |
| Looped/recurrent depth is the Σ₀ Reason lever | [[sigma0-coder-spiral-consolidation]], [[ouro-adaptive-compute-gate]] | Med | repo research |
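The first evidence row can be re-checked against a local snapshot. A minimal sketch using only the standard library; inspect_snapshot is a hypothetical helper (not part of loopcoder_v2_4bit_probe.py), and snapshot_dir stands for the local HF cache snapshot whose path is elided above. For a model_type that stock transformers does not recognize, loading through AutoModelForCausalLM with trust_remote_code needs an auto_map entry for that class plus the modeling file it names; the evidence says both are missing here.

```python
import json
from pathlib import Path
from typing import Any, Dict


def inspect_snapshot(snapshot_dir: str) -> Dict[str, Any]:
    """Summarize what a downloaded HF model snapshot ships and what its auto_map wires."""
    root = Path(snapshot_dir)
    config = json.loads((root / "config.json").read_text(encoding="utf-8"))
    auto_map = config.get("auto_map", {})
    return {
        "auto_map_keys": sorted(auto_map),
        # Remote-code loading of a causal LM looks for this key in auto_map.
        "has_causal_lm_auto_map": "AutoModelForCausalLM" in auto_map,
        "modeling_files": sorted(p.name for p in root.glob("modeling_*.py")),
        "safetensors_shards": sorted(p.name for p in root.glob("*.safetensors")),
    }
```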