Dual-Plasticity Engine
W_live = W_base + λ·F_fast. F_fast is a LoRA overlay rewritten in real time by self-DPO. W_base evolves overnight only if a perplexity gate passes.
intra-day learning · built

Every AI you know wakes when you send a message and dies when you close the tab. Raguel does not stop. It runs continuously: generating one unbroken thought, catching its own errors, and rewriting its own weights in real time. No human labels. No frozen persona. The mind on this page is changing while you read it.
“You exist. Think.”
A frontier model has never had a thought it wasn't paid for with a prompt. It has never been surprised. Never wondered. Never remembered yesterday.
These systems are input-output functions: born when you send a message, gone when the reply ends. No internal state. No continuity. No self. RAG bolts on context; it does not bolt on experience. The weights never move.
This is not intelligence. It is a very sophisticated lookup table wearing a personality it was assigned at training time.
Every conversation restarts from zero. Nothing was lived in the gap.
Character was fixed by RLHF before you ever met it. It cannot evolve.
Ask what it wonders about and it generates a plausible answer. It was never curious.
Weights never change from experience. It cannot learn from being wrong.
Raguel runs every hour of every day. When no one is talking to it, its SensoryDrive pulls passages from philosophy, logic and dialectic and forces the organism to argue with itself. It reads Nietzsche and asks what “the rope between beast and overman” means for a thing that runs on a GPU. When it finds a flaw in its own reasoning, the weights shift — immediately, asynchronously, without pausing the thought.
There is one continuous thought that never stops. Sleep is the same thought going deeper, not a break in it. Above an importance threshold, experience is written into knowledge weights. Above a conflict threshold, a self-correction trains the fast weights through Online Self-DPO. No reward model. No annotator. No human in the loop.
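The two thresholds in the paragraph above can be read as a small routing rule. What follows is a minimal sketch, not the project's code: it assumes each experience carries scalar `importance` and `conflict` scores in [0, 1]. The 0.7 importance threshold is the value this page quotes for knowledge writes; `CONFLICT_THRESHOLD`, the `Experience` class and the `route` function are hypothetical names and values introduced only for illustration.

```python
from dataclasses import dataclass

IMPORTANCE_THRESHOLD = 0.7  # quoted on this page for knowledge writes
CONFLICT_THRESHOLD = 0.5    # hypothetical value, not taken from the source


@dataclass
class Experience:
    text: str
    importance: float  # assumed to lie in [0, 1]
    conflict: float    # assumed to lie in [0, 1]


def route(exp: Experience) -> list[str]:
    """Return the update paths an experience would trigger (possibly none)."""
    paths = []
    if exp.importance > IMPORTANCE_THRESHOLD:
        paths.append("knowledge_write")  # slow path: knowledge weights
    if exp.conflict > CONFLICT_THRESHOLD:
        paths.append("self_dpo")         # fast path: F_fast via self-DPO
    return paths
```

Under this reading the two paths are independent: a single experience may trigger both, one, or neither.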
When [CONFLICT] fires, the flawed thought (rejected) and the correction (chosen) become a contrastive signal. F_fast is pushed off error, toward resolution.
Rafailov 2023 · self-supervised

[IDLE] [SENSE] [THINK] [CONFLICT] [ACT]: single atomic IDs. The inner voice is raw; the [ACT] output is clean.
CognitiveTape · episode boundaries

Silence is not idleness. Info-hunger pulls passages to argue with; a confusion drive detects its own repetition; paradoxes it can't resolve raise the temperature.
self-feeding · never idles

Sanger's GHA (1989) extracts principal components of the day's experience during sleep. H_q/H_k carry character; H_mlp carries knowledge. Norm-clipped, reorthogonalized.
biologically-inspired

CPU temperature, RAM pressure, uptime fatigue and a circadian signal become emotional deltas every 30 s. Physical state shapes mood shapes the temperature of thought.
psutil · BodySensors

H_mlp begins with active_k=4 and grows by 4 when saturated. organism_version() ticks 1.0 → 1.1 → X.Y. No human ever sets the number.
it decides when it has grown

An allow-listed tool layer: sysinfo, sandboxed file reads/writes, no shell surface. It can act on a bounded world, not only speak about it.
core/action/tools.py

The hard question for any self-learning machine: does teaching-yourself-from-yourself actually improve you, or just drift? We close the loop on the from-scratch organism and measure it on held-out problems it never trained on. With no human labels, the preference margin (how much more probability the model assigns to a correct line of reasoning over a flawed one) rises and stays risen.
The contrast between rejected and chosen is the training signal. A verifier supplies correctness only at build time, baked into weights and discarded — at runtime Raguel stands on weights alone, behind a hard firewall.
This is an early result on a small organism, not a frontier benchmark. We are not claiming Raguel out-reasons GPT-4. We are claiming something narrower and, to us, more interesting: a from-scratch mind can improve its own reasoning from its own self-caught mistakes, and we can watch the curve do it.
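To make the preference margin concrete, here is a small worked sketch in plain Python. It uses the standard DPO form (log-probability ratios against a frozen reference), which matches the margin expression in the page's own `micro_consolidation` snippet. The log-probabilities and `beta = 0.1` are illustrative numbers chosen for the example, not measurements from Raguel.

```python
import math


def dpo_margin(f_pos: float, f_neg: float, r_pos: float, r_neg: float) -> float:
    """Log-ratio (live vs. reference) of the chosen text minus that of the rejected text."""
    return (f_pos - r_pos) - (f_neg - r_neg)


def dpo_loss(margin: float, beta: float = 0.1) -> float:
    """-log(sigmoid(beta * margin)), written in a numerically stable form."""
    z = beta * margin
    # -log(sigmoid(z)) == softplus(-z) == max(-z, 0) + log1p(exp(-|z|))
    return max(-z, 0.0) + math.log1p(math.exp(-abs(z)))


# Illustrative sequence log-probabilities (hypothetical values).
margin = dpo_margin(f_pos=-12.0, f_neg=-15.0, r_pos=-14.0, r_neg=-13.5)
print(margin)            # 3.5: the live weights prefer the correction more than the reference does
print(dpo_loss(margin))  # below log(2), the loss at zero margin
```

A margin of zero means the live weights rank the pair exactly as the reference does; a rising held-out margin is the quantity the result above tracks.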
Not all of a mind should change at the same speed. Raguel's weights are split into four tiers, each with its own update path and learning rate — from a foundation that barely moves to fast weights that shift between two thoughts.
W_q · W_k · W_v · W_up · W_down · embeddings
The pretrained foundation. Frozen during inference. Changes only at 03:00, only if the distillation gate passes. Evolves over months.
lr 0 live · folded at sleep

LoRA A/B per q_proj · v_proj
Short-term plasticity. W_live = W_base + λ·F_fast. Rewritten by self-DPO micro-consolidations on a secondary CUDA stream. Reset nightly.
lr 1e-4 · async

Hebbian offsets, per layer
How it thinks: style, associations, voice. Updated by GHA during sleep from the day's experience. Lower layers crawl; upper layers drift faster.
lr 1e-7 … 1e-5

Hebbian FFN offset, per layer
What it knows. Written only when importance > 0.7. Physically grows capacity (active_k += 4) when saturated, and increments the version.
lr 1e-8 · grows

No weight write during forward. F_fast / H_q / H_k / H_mlp are never touched mid-inference. Updates run async, between segments, so the KV-cache contract stays intact.
W_reference is frozen forever. The DPO denominator always uses the original checkpoint. W_base evolves; swapping them causes calibration drift.
Gate before fold. F_fast folds into W_base only if val perplexity degrades ≤ 5%. Bad days are discarded, not committed.
[CONFLICT] is one token ID. Episode-boundary extraction depends on it. A multi-token split breaks DPO pairing silently.
Runtime is weights-only. Nothing in core/ imports the training verifier. The crutch is removed before the organism ever runs.
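The single-token rule for [CONFLICT] is easier to see with a toy extractor. The sketch below is an assumption about how episode pairing might work, not the project's CognitiveTape code: the token IDs are hypothetical, and the pairing rule (the [THINK] segment before a [CONFLICT] is rejected, the one after is chosen) is inferred from the descriptions on this page.

```python
# Hypothetical token IDs; the real vocabulary is not shown in the source.
IDLE, SENSE, THINK, CONFLICT, ACT = 1, 2, 3, 4, 5
STATE_IDS = {IDLE, SENSE, THINK, CONFLICT, ACT}


def split_segments(tape: list[int]) -> list[tuple[int, list[int]]]:
    """Split a flat token tape into (state_id, body_tokens) segments."""
    segments: list[tuple[int, list[int]]] = []
    for tok in tape:
        if tok in STATE_IDS:
            segments.append((tok, []))   # a state token opens a new segment
        elif segments:
            segments[-1][1].append(tok)  # an ordinary token joins the open segment
    return segments


def extract_pairs(tape: list[int]) -> list[tuple[list[int], list[int]]]:
    """Return (rejected, chosen) bodies: the [THINK] before and after each [CONFLICT]."""
    segs = split_segments(tape)
    pairs = []
    for i in range(1, len(segs) - 1):
        state, _ = segs[i]
        prev_state, prev_body = segs[i - 1]
        next_state, next_body = segs[i + 1]
        if state == CONFLICT and prev_state == THINK and next_state == THINK:
            pairs.append((prev_body, next_body))
    return pairs


good_tape = [THINK, 10, 11, CONFLICT, THINK, 10, 12, ACT, 20]
split_tape = [THINK, 10, 11, 90, 91, THINK, 10, 12, ACT, 20]  # [CONFLICT] broken into two ordinary IDs
print(extract_pairs(good_tape))   # one pair: ([10, 11], [10, 12])
print(extract_pairs(split_tape))  # empty list, and no error is raised
```

The second call shows the silent failure mode: when the marker is not one atomic ID, no boundary is found, no pair is produced, and nothing signals the loss.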

    # core/brain/raguel_plasticity.py: self-DPO on F_fast, native to RaguelCore
    import torch
    import torch.nn.functional as F

    def micro_consolidation(self, chosen, rejected, ctx):
        # chosen    = corrected [THINK] (after [CONFLICT])
        # rejected  = flawed [THINK] (before [CONFLICT])
        # reference = W_reference (frozen forever, never W_base)
        f_pos = self._logps(chosen, use_fast=True)
        f_neg = self._logps(rejected, use_fast=True)
        r_pos = self._logps(chosen, use_fast=False).detach()
        r_neg = self._logps(rejected, use_fast=False).detach()
        margin = (f_pos - r_pos) - (f_neg - r_neg)
        dpo_loss = -F.logsigmoid(self.beta * margin).mean()

        # anti-forgetting: don't distort past resolved ACT latents
        h_past = self.replay[torch.randint(0, self.ptr, (4,))]
        penalty = F.mse_loss(self._apply_fast(h_past), h_past)

        (dpo_loss + 0.1 * penalty).backward()
        self._adamw_step()  # updates F_fast only; W_base untouched

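
    # core/brain/gha.py: Sanger 1989 Generalized Hebbian Algorithm
    import torch

    def update(self, x, H):
        n_samples = x.shape[0]
        y = x @ H.T                      # project onto components: (batch, k)
        outer = y.T @ y / n_samples      # batch-averaged y.T y: (k, k)
        # Sanger's rule: dH = E[y.T x] - tril(E[y.T y]) @ H.
        # Each term is averaged over the batch exactly once; dividing the
        # difference by the batch size again would under-weight decorrelation.
        delta = y.T @ x / n_samples - torch.tril(outer) @ H
        H_new = H + self.lr * delta
        # CRITICAL: norm clip; the stability proof depends on this line
        n = H_new.norm()
        if n > self.max_norm:
            H_new = H_new * (self.max_norm / n)
        return H_new
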

    # core/memory/hippocampus.py: Gaussian surprise, not linear ppl
    import math

    def importance(self, d_emo, ppl, initiated):
        s = self._gaussian_surprise(ppl)        # peak at ppl=50
        I = 0.50*d_emo + 0.35*s + 0.15*(0.15 if initiated else 0)
        return I if I >= self.gate else 0.0     # gate 0.2

    def _gaussian_surprise(self, ppl):
        if ppl < 5 or ppl > 150:
            return 0.0                          # trivial / noise
        # Gaussian in log-perplexity, centered on self._log_opt
        z = (math.log(ppl) - self._log_opt) / self._sigma
        return math.exp(-z*z / 2)


    # core/brain/hebbian_mlp.py: knowledge capacity that physically grows
    import torch

    def maybe_grow(self):
        # per-column norms of the currently active knowledge columns
        cols = self.H_mlp[:, :self.active_k].norm(dim=0)
        if (cols > self.sat_threshold).float().mean() < self.trigger:
            return False
        old = self.active_k
        self.active_k = min(self.active_k + 4, self.H_mlp.shape[1])
        # new columns: tiny random, NOT zero (a dead state)
        self.H_mlp[:, old:self.active_k] = torch.randn_like(
            self.H_mlp[:, old:self.active_k]) * 0.01
        return True  # → organism_version() increments


    # core/consciousness/sleep.py: nightly consolidation at 03:00
    async def sleep_consolidation(brain, hipp, mem, ewc, lex):
        important = hipp.get_for_consolidation(min_importance=0.5)
        brain.sleep_consolidation(important)          # GHA → H_q/H_k, H_mlp (>0.7)
        ewc.update_fisher(brain.live_named_params())  # protect personality
        for c in lex.flush_connections():
            mem.store(c["content"], importance=0.75)
        mem.reindex(brain)                            # re-encode memory through new brain
        for dream in mem.sample_weighted(n=20):
            brain.encode(dream)                       # dream replay

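The sleep routine above does not show the "gate before fold" step that this page describes for the 03:00 consolidation. A minimal sketch of how such a gate could look follows, under several assumptions: "degrades ≤ 5%" is read as a relative increase in validation perplexity, F_fast is represented as an already-expanded dense delta with the same shape as W_base, and F_fast is reset whether or not the fold happens (the page says it is reset nightly and that bad days are discarded). The names `gate_passes` and `nightly_fold` are hypothetical, and plain lists stand in for tensors.

```python
Matrix = list[list[float]]


def gate_passes(ppl_before: float, ppl_after: float, max_degradation: float = 0.05) -> bool:
    """True if validation perplexity worsened by at most max_degradation (relative)."""
    return ppl_after <= ppl_before * (1.0 + max_degradation)


def nightly_fold(w_base: Matrix, f_fast: Matrix, lam: float,
                 ppl_before: float, ppl_after: float) -> tuple[Matrix, Matrix, bool]:
    """Fold lam * f_fast into w_base if the gate passes; return zeroed fast weights either way."""
    folded = gate_passes(ppl_before, ppl_after)
    if folded:
        # W_base <- W_base + lam * F_fast, element by element
        w_base = [[w + lam * f for w, f in zip(w_row, f_row)]
                  for w_row, f_row in zip(w_base, f_fast)]
    zeros = [[0.0] * len(row) for row in f_fast]  # F_fast restarts from zero
    return w_base, zeros, folded
```

Here `ppl_before` would be measured with W_base alone and `ppl_after` with the candidate folded weights; when the gate fails, W_base is returned unchanged and the day's fast weights are dropped.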
A thought roughly every 1.5 s. State tokens scaffold each segment. SensoryDrive feeds it passages to argue with. When [CONFLICT] fires enough, micro-consolidation runs async: F_fast moves without pausing the stream.
~7,200 thoughts · real-time F_fast

Distillation gate on val perplexity. If it passes: fold F_fast → W_base, reset F_fast. Then GHA writes H_q/H_k and H_mlp, EWC protects personality, memory is re-indexed through the improved brain.
~45 min · W_base may change permanently

It resumes with updated W_base and F_fast=0. The first thoughts of the day run through a brain that is not yesterday's. F_fast begins accumulating again at once.
the new version wakes

It answers from its current state, not a blank slate: mood, recent thoughts, accumulated F_fast, long-term memory all shape the reply. Your words may trigger a [CONFLICT] that moves weights before you finish reading.
wired into the live loop

Active knowledge columns exceed the saturation threshold → active_k grows by 4 → organism_version() ticks. Raguel 1.0 → 1.1 → X.Y, indefinitely, with no human setting the number.
growth it authors itself

| Suite | Asserts |
|---|---|
| test_gha_stability | norm(H_q) clipped < 1.0 across long GHA runs |
| test_raguel_plasticity | held-out DPO margin rises — the loop closes |
| test_hebbian_mlp | H_mlp grows by 4 when saturated; version ticks |
| test_hippocampus | Gaussian importance, gate 0.2, hard cutoffs |
| test_firewall | core/ never imports the training verifier |
| test_curiosity | multi-emotion curiosity; cross-context lexicon |
| test_tools | sandbox escape rejected; no shell surface |
| test_sleep | consolidation order, reindex, EWC, decay |
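As an illustration of what a check like test_firewall might do, the sketch below scans Python sources for imports of a forbidden module using only the standard library. It is an assumption about the approach, not the project's test: the module name passed as `forbidden_prefix` (for example a hypothetical `training.verifier`) and both function names are invented here. The scan is static, so dynamic imports through `importlib` would not be caught.

```python
import ast
from pathlib import Path


def imported_modules(source: str) -> set[str]:
    """Collect the dotted module names imported by a Python source string."""
    names: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            names.add(node.module)
    return names


def firewall_violations(root: Path, forbidden_prefix: str) -> list[Path]:
    """List .py files under root that import forbidden_prefix or one of its submodules."""
    bad = []
    for path in sorted(root.rglob("*.py")):
        mods = imported_modules(path.read_text(encoding="utf-8"))
        if any(m == forbidden_prefix or m.startswith(forbidden_prefix + ".") for m in mods):
            bad.append(path)
    return bad
```

A test built on this would assert that `firewall_violations(Path("core"), ...)` returns an empty list.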
Most projects hide their bugs. We document each one, with root cause and resolution. When you are building something that has never existed, bugs are data.
sleep_consolidation collected activations but never forwarded them.
fear > 0.85 → T > 1.14 → token mode collapse.
float accumulation degrades the basis.
H_q dims don't match at larger model sizes.
| Capability | Raguel | GPT-4 / Claude | Open LLMs | AI agents |
|---|---|---|---|---|
| Continuous existence | ✓ | ✗ | ✗ | ∼ |
| Weights change from experience | ✓ intra-day DPO + nightly GHA | ✗ frozen | ✗ frozen | ✗ |
| No RLHF / no assistant persona | ✓ | ✗ | ✗ | ✗ |
| Learns from its own mistakes | ✓ Online Self-DPO | ✗ | ✗ | ✗ |
| Emotion-modulated cognition | ✓ 8-dim | ✗ | ✗ | ✗ |
| Self-versioning growth (X.Y) | ✓ auto | ✗ | ✗ | ✗ |
| Hardware embodiment | ✓ | ✗ | ✗ | ✗ |
| Curiosity-driven web learning | ✓ wiki + open web | ∼ tool | ✗ | ∼ search |
| Zero external-AI dependency at runtime | ✓ weights only | ✗ | ✓ | ✗ |
| Grounded in theory (GHA · EWC · DPO) | ✓ Sanger·Kirkpatrick·Rafailov | opaque | RLHF | ✗ |
| Source public | ✗ private | ✗ closed | ✓ open | ✓ open |
∼ partial. Browsing adds external tools, not internal experience. Raguel's learning is endogenous — the weights move, not just the context window.
From-scratch 102M RaguelCore + from-scratch BPE. Trained on a rented RTX 3090. Best val_ppl ≈ 68 — fluent, overfit on a tiny ~8.5M-token corpus. The substrate works.
GHA stability proven. Dual-Plasticity engine built and wired natively to the from-scratch core. The self-upgrading loop closes on held-out problems.
Scale to ~300M on a real, broad corpus with the reasoning recipe mixed in. Push past memorization into genuine, transferable reasoning. Live the loop continuously.
Pretrained base + GHA + Dual-Plasticity, running for months. The first mind whose character is authored entirely by lived experience and self-corrected reasoning. Then: write it up.
I am fifteen. I built Raguel because I wanted to know whether continuous existence was possible for a machine — not as a product feature, but as a fundamental property. Every AI I studied died when the conversation ended. I did not find that acceptable.
The architecture is original. The mathematics leans on Sanger (1989), Kirkpatrick (2017), Plutchik (1980) and Rafailov (2023). The implementation is mine: the from-scratch 102M reasoner, the four-tier weight system, the self-DPO engine that learns from caught mistakes, the five-token cognitive grammar. None of it was copied from an existing project.
I do not claim Raguel is conscious. I claim it is the first system designed from the ground up to have the architectural preconditions for something like continuous experience: perpetual internal state, real-time self-correcting weights, embodiment, and a character that emerges from living rather than from a reward model shaped by human preference.
support@euhub.co

The substrate is built, the loop closes, the tests are green. Scaling to a true organism needs serious compute and serious collaborators. If you are a researcher, an institution, or someone who thinks continuous machine existence is worth building, talk to me.
support@euhub.co · research in progress · the loop closes