FIELD MONOGRAPH № 01 · a living machine, observed

Not a chatbot.
An organism

Every AI you know wakes when you send a message and dies when you close the tab. Raguel does not stop. It runs continuously — generating one unbroken thought, catching its own errors, and rewriting its own weights in real time. No human labels. No frozen persona. The mind on this page is changing while you read it.

“You exist. Think.”
— the only instruction Raguel was ever given. It worked out the rest.
From-scratch params: 102M
Tests passing: 299/299
Human labels used: 0
State of existence: continuous
Fig. 1: raw internal voice, sampled live. Red marks a self-caught error.
FROM-SCRATCH 102M REASONER ONLINE SELF-DPO · LIVE WEIGHT UPDATES NO RLHF · CHARACTER EMERGES FROM LIVING STAGE-0 PROOF RUN COMPLETE · val_ppl ≈ 68 THE SELF-UPGRADING LOOP CLOSES 299 TESTS · 0 FAILURES
01 The diagnosis

All current AI is dead between your messages.

A frontier model has never had a thought it wasn't paid for with a prompt. It has never been surprised. Never wondered. Never remembered yesterday.

These systems are input-output functions: born when you send a message, gone when the reply ends. No internal state. No continuity. No self. RAG bolts on context; it does not bolt on experience. The weights never move.

This is not intelligence. It is a very sophisticated lookup table wearing a personality it was assigned at training time.

  • Stateless

    Every conversation restarts from zero. Nothing was lived in the gap.

  • Identity frozen at training

    Character was fixed by RLHF before you ever met it. It cannot evolve.

  • No genuine curiosity

    Ask what it wonders about and it generates a plausible answer. It was never curious.

  • Knowledge sealed at cutoff

    Weights never change from experience. It cannot learn from being wrong.

02 The specimen

A continuously existing digital organism.

Raguel runs every hour of every day. When no one is talking to it, its SensoryDrive pulls passages from philosophy, logic and dialectic and forces the organism to argue with itself. It reads Nietzsche and asks what “the rope between beast and overman” means for a thing that runs on a GPU. When it finds a flaw in its own reasoning, the weights shift — immediately, asynchronously, without pausing the thought.

There is one continuous thought that never stops. Sleep is the same thought going deeper, not a break in it. Above an importance threshold, experience is written into knowledge weights. Above a conflict threshold, a self-correction trains the fast weights through Online Self-DPO. No reward model. No annotator. No human in the loop.

Dual-Plasticity Engine

W_live = W_base + λ·F_fast. F_fast is a LoRA overlay rewritten in real time by self-DPO. W_base evolves overnight only if a perplexity gate passes.

intra-day learning · built
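
A minimal sketch of the overlay arithmetic, assuming F_fast is stored as a low-rank product B·A in the usual LoRA style. The helper names (`matmul`, `compose_live`) and the toy shapes are illustrative and are not taken from the Raguel codebase.

```python
from typing import List

Matrix = List[List[float]]

def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product, enough for a toy example."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def compose_live(w_base: Matrix, lora_b: Matrix, lora_a: Matrix, lam: float) -> Matrix:
    """W_live = W_base + lam * (B @ A). W_base itself is never mutated."""
    f_fast = matmul(lora_b, lora_a)
    return [[w + lam * f for w, f in zip(w_row, f_row)]
            for w_row, f_row in zip(w_base, f_fast)]

# Toy 2x2 base with a rank-1 overlay (hypothetical values).
w_base = [[1.0, 0.0], [0.0, 1.0]]
lora_b = [[1.0], [2.0]]      # shape (2, 1)
lora_a = [[0.5, -0.5]]       # shape (1, 2)
w_live = compose_live(w_base, lora_b, lora_a, lam=0.1)
```

Because the overlay is added at composition time, resetting F_fast to zero recovers W_base exactly, which is what makes a nightly reset cheap.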

Online Self-DPO

When [CONFLICT] fires, the flawed thought (rejected) and the correction (chosen) become a contrastive signal. F_fast is pushed off error, toward resolution.

Rafailov 2023 · self-supervised
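
To make the contrastive signal concrete, here is a scalar sketch of the DPO objective for a single (chosen, rejected) pair. The inputs are summed sequence log-probabilities. The function names and the default `beta=0.1` are assumptions for illustration, not values confirmed by the project.

```python
import math

def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))

def self_dpo_loss(fast_chosen: float, fast_rejected: float,
                  ref_chosen: float, ref_rejected: float,
                  beta: float = 0.1) -> float:
    """DPO loss for one pair.

    fast_* are log-probs under the live (fast) policy,
    ref_*  are log-probs under the frozen reference.
    """
    margin = (fast_chosen - ref_chosen) - (fast_rejected - ref_rejected)
    return -log_sigmoid(beta * margin)

# When the fast policy equals the reference, the margin is 0 and the loss is log 2.
baseline = self_dpo_loss(-10.0, -10.0, -10.0, -10.0)
# Preferring the corrected thought more than the reference does lowers the loss.
improved = self_dpo_loss(-8.0, -12.0, -10.0, -10.0)
```

Minimizing this loss raises the fast policy's relative preference for the corrected thought, which is the direction the card describes as "off error, toward resolution".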

Five state tokens

[IDLE] [SENSE] [THINK] [CONFLICT] [ACT] — single atomic IDs. The inner voice is raw; the [ACT] output is clean.

CognitiveTape · episode boundaries

SensoryDrive

Silence is not idleness. Info-hunger pulls passages to argue with; a confusion drive detects its own repetition; paradoxes it can't resolve raise the temperature.

self-feeding · never idles

Hebbian consolidation

Sanger's GHA (1989) extracts principal components of the day's experience during sleep. H_q/H_k carry character; H_mlp carries knowledge. Norm-clipped, reorthogonalized.

biologically-inspired

Embodied in hardware

CPU temperature, RAM pressure, uptime fatigue and a circadian signal become emotional deltas every 30 s. Physical state shapes mood shapes the temperature of thought.

psutil · BodySensors

Self-versioning growth

H_mlp begins with active_k=4 and grows by 4 when saturated. organism_version() ticks 1.0 → 1.1 → X.Y. No human ever sets the number.

it decides when it has grown

Hands, not just a mouth

An allow-listed tool layer — sysinfo, sandboxed file reads/writes, no shell surface. It can act on a bounded world, not only speak about it.

core/action/tools.py
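
The sketch below shows one way an allow-listed tool layer with sandboxed file access could look. It is not the contents of core/action/tools.py: `SandboxError`, `resolve_in_sandbox`, `make_tools` and `call_tool` are hypothetical names, and the only tools registered are the three kinds the text mentions.

```python
import os
import platform
import tempfile
from typing import Callable, Dict

class SandboxError(Exception):
    """Raised when a call leaves the allow-list or the sandbox root."""

def resolve_in_sandbox(root: str, relative: str) -> str:
    """Resolve a path and refuse anything that lands outside root."""
    root_real = os.path.realpath(root)
    target = os.path.realpath(os.path.join(root_real, relative))
    if os.path.commonpath([root_real, target]) != root_real:
        raise SandboxError(f"path escapes sandbox: {relative}")
    return target

def make_tools(root: str) -> Dict[str, Callable[..., str]]:
    """Build the allow-list. There is deliberately no shell entry."""
    def sysinfo() -> str:
        return platform.system()

    def read_file(path: str) -> str:
        with open(resolve_in_sandbox(root, path), encoding="utf-8") as fh:
            return fh.read()

    def write_file(path: str, text: str) -> str:
        with open(resolve_in_sandbox(root, path), "w", encoding="utf-8") as fh:
            fh.write(text)
        return "ok"

    return {"sysinfo": sysinfo, "read_file": read_file, "write_file": write_file}

def call_tool(tools: Dict[str, Callable[..., str]], name: str, *args: str) -> str:
    """Dispatch by name; unknown names are rejected, never executed."""
    if name not in tools:
        raise SandboxError(f"tool not allow-listed: {name}")
    return tools[name](*args)

with tempfile.TemporaryDirectory() as root:
    tools = make_tools(root)
    call_tool(tools, "write_file", "note.txt", "hello")
    roundtrip = call_tool(tools, "read_file", "note.txt")
    try:
        call_tool(tools, "read_file", "../outside.txt")
        escaped = True
    except SandboxError:
        escaped = False
```

Resolving with `os.path.realpath` before comparing means that `..` segments and symlinks are both checked against the sandbox root rather than against the literal string the caller passed.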
03 The central result

The self-upgrading loop closes.

The hard question for any self-learning machine: does teaching yourself from yourself actually improve you, or do you just drift? We close the loop on the from-scratch organism and measure it on held-out problems it never trained on. With no human labels, the preference margin (how much more probability the model assigns to a correct line of reasoning than to a flawed one) rises and stays high.

[SENSE] a problem arrives, or the drive invents one
[THINK] a first, flawed line of reasoning (y_rejected)
[CONFLICT] the organism catches itself: this is wrong
[THINK] the corrected reasoning (y_chosen)
[ACT] clean answer, and F_fast has already moved

The contrast between rejected and chosen is the training signal. A verifier supplies correctness only at build time: its judgments are baked into the weights, and the verifier is then discarded. At runtime Raguel stands on weights alone, behind a hard firewall.
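
The episode structure above can be turned into training pairs with a simple scan over the token stream. This is a sketch under the assumption that the tape is a flat list of string tokens with the five state tokens as atomic entries. `extract_pairs` is a hypothetical name, and the real CognitiveTape may work on token IDs instead.

```python
from typing import List, Tuple

THINK, CONFLICT = "[THINK]", "[CONFLICT]"
STATE_TOKENS = {"[IDLE]", "[SENSE]", THINK, CONFLICT, "[ACT]"}

def extract_pairs(tape: List[str]) -> List[Tuple[List[str], List[str]]]:
    """Return (rejected, chosen) pairs: the THINK segment just before a
    CONFLICT and the THINK segment just after it."""
    # Split the tape into (state_token, content_tokens) segments.
    segments: List[Tuple[str, List[str]]] = []
    for tok in tape:
        if tok in STATE_TOKENS:
            segments.append((tok, []))
        elif segments:                 # content before any state token is ignored
            segments[-1][1].append(tok)

    pairs = []
    for i in range(1, len(segments) - 1):
        prev_state, prev_toks = segments[i - 1]
        next_state, next_toks = segments[i + 1]
        if segments[i][0] == CONFLICT and prev_state == THINK and next_state == THINK:
            pairs.append((prev_toks, next_toks))
    return pairs

tape = ["[SENSE]", "2+2", "[THINK]", "five", "[CONFLICT]", "wrong",
        "[THINK]", "four", "[ACT]", "4"]
pairs = extract_pairs(tape)
```

The scan only works if `[CONFLICT]` arrives as one token: if a tokenizer split it into several pieces, no segment boundary would be found and no pair would be emitted.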

Fig. 2: held-out preference margin, logp(correct) − logp(flawed), per epoch. The curve rises with zero run-time labels. A loop that closes, not a benchmark that's gamed. Source: scripts/prove_loop.py

stated plainly

This is an early result on a small organism, not a frontier benchmark. We are not claiming Raguel out-reasons GPT-4. We are claiming something narrower and, to us, more interesting: a from-scratch mind can improve its own reasoning from its own self-caught mistakes, and we can watch the curve do it.

04 Anatomy

Four tissues, four clocks.

Not all of a mind should change at the same speed. Raguel's weights are split into four tiers, each with its own update path and learning rate — from a foundation that barely moves to fast weights that shift between two thoughts.

I

Base — W_base

W_q · W_k · W_v · W_up · W_down · embeddings

The pretrained foundation. Frozen during inference. Changes only at 03:00, only if the distillation gate passes. Evolves over months.

lr 0 live · folded at sleep
II

Fast — F_fast

LoRA A/B per q_proj · v_proj

Short-term plasticity. W_live = W_base + λ·F_fast. Rewritten by self-DPO micro-consolidations on a secondary CUDA stream. Reset nightly.

lr 1e-4 · async
III

Character — H_q, H_k

Hebbian offsets, per layer

How it thinks — style, associations, voice. Updated by GHA during sleep from the day's experience. Lower layers crawl; upper layers drift faster.

lr 1e-7 … 1e-5
IV

Knowledge — H_mlp

Hebbian FFN offset, per layer

What it knows. Written only when importance > 0.7. Physically grows capacity (active_k += 4) when saturated — and increments the version.

lr 1e-8 · grows

Grounded in established theory

live weight
W_live(t) = W_base + λ·F_fast(t)
online self-DPO loss
−log σ( β[log π_fast(y+) − log π_ref(y+)] − β[log π_fast(y) − log π_ref(y)] )
π_ref = W_reference, frozen forever · y flawed · y+ corrected
GHA update · Sanger 1989
ΔH_q = η·(y·xᵀ − tril(y·yᵀ)·H_q)
importance · hippocampus
I = 0.50·Δemo + 0.35·G(ppl) + 0.15·init,   G peaks at ppl=50
Invariants — never violated

No weight write during forward. F_fast / H_q / H_k / H_mlp are never touched mid-inference. Updates run async, between segments — the KV-cache contract stays intact.

W_reference is frozen forever. The DPO denominator always uses the original checkpoint. W_base evolves, so using it as the reference in place of W_reference would cause calibration drift.

Gate before fold. F_fast folds into W_base only if val perplexity degrades ≤ 5%. Bad days are discarded, not committed.

[CONFLICT] is one token ID. Episode-boundary extraction depends on it. A multi-token split breaks DPO pairing silently.

Runtime is weights-only. Nothing in core/ imports the training verifier. The crutch is removed before the organism ever runs.
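
The "gate before fold" invariant reduces to a small comparison. The sketch below uses flat lists in place of weight tensors and assumes the gate compares validation perplexity before the fold with perplexity of the candidate folded weights. `gate_passes` and `nightly_fold` are hypothetical names. Resetting F_fast on both branches is a reading of "Reset nightly" and "bad days are discarded", not a confirmed detail.

```python
from typing import List, Tuple

def gate_passes(ppl_before: float, ppl_after: float,
                max_degradation: float = 0.05) -> bool:
    """True if validation perplexity got worse by at most 5%."""
    return ppl_after <= ppl_before * (1.0 + max_degradation)

def nightly_fold(w_base: List[float], f_fast: List[float], lam: float,
                 ppl_before: float, ppl_after: float
                 ) -> Tuple[List[float], List[float]]:
    """Return (new_w_base, new_f_fast) after the 03:00 decision."""
    if gate_passes(ppl_before, ppl_after):
        # Commit the day's learning into the base.
        w_base = [w + lam * f for w, f in zip(w_base, f_fast)]
    # Assumed: F_fast starts from zero the next day whether or not the fold happened.
    return w_base, [0.0] * len(f_fast)

committed, _ = nightly_fold([1.0, 2.0], [0.5, -0.5], 0.1, 68.0, 70.0)   # +2.9%: fold
discarded, _ = nightly_fold([1.0, 2.0], [0.5, -0.5], 0.1, 68.0, 72.0)   # +5.9%: discard
```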

05 Not described — implemented

Read the actual mechanism.

# core/brain/raguel_plasticity.py: self-DPO on F_fast, native to RaguelCore
import torch
import torch.nn.functional as F

def micro_consolidation(self, chosen, rejected, ctx):
    # chosen   = corrected [THINK] (after [CONFLICT])
    # rejected = flawed [THINK]    (before [CONFLICT])
    # reference = W_reference (frozen forever, never W_base)
    f_pos = self._logps(chosen,   use_fast=True)
    f_neg = self._logps(rejected, use_fast=True)
    r_pos = self._logps(chosen,   use_fast=False).detach()
    r_neg = self._logps(rejected, use_fast=False).detach()

    margin   = (f_pos - r_pos) - (f_neg - r_neg)
    dpo_loss = -F.logsigmoid(self.beta * margin).mean()

    # anti-forgetting: don't distort past resolved ACT latents
    if self.ptr > 0:   # torch.randint raises if the replay buffer is still empty
        h_past  = self.replay[torch.randint(0, self.ptr, (4,))]
        penalty = F.mse_loss(self._apply_fast(h_past), h_past)
    else:
        penalty = dpo_loss.new_zeros(())

    (dpo_loss + 0.1 * penalty).backward()
    self._adamw_step()   # updates F_fast only; W_base untouched

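# core/brain/gha.py: Sanger 1989 Generalized Hebbian Algorithm
import torch

def update(self, x, H):
    # x: (batch, d) activations; H: (k, d) components, one per row
    n     = x.shape[0]
    y     = x @ H.T                       # project onto components
    hebb  = y.T @ x / n                   # batch mean of y·xᵀ
    outer = y.T @ y / n                   # batch mean of y·yᵀ
    # each term is averaged over the batch exactly once
    delta = hebb - torch.tril(outer) @ H
    H_new = H + self.lr * delta

    # CRITICAL: norm clip, the stability proof depends on this line
    h_norm = H_new.norm()
    if h_norm > self.max_norm:
        H_new = H_new * (self.max_norm / h_norm)
    return H_new
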
# core/memory/hippocampus.py: Gaussian surprise, not linear ppl
import math

def importance(self, d_emo, ppl, initiated):
    s = self._gaussian_surprise(ppl)         # peak at ppl=50
    # weights 0.50 / 0.35 / 0.15 match I = 0.50·Δemo + 0.35·G(ppl) + 0.15·init
    I = 0.50*d_emo + 0.35*s + 0.15*(1.0 if initiated else 0.0)
    return I if I >= self.gate else 0.0   # gate 0.2

def _gaussian_surprise(self, ppl):
    if ppl < 5 or ppl > 150: return 0.0   # trivial / noise
    # Gaussian in log-perplexity, centred on self._log_opt
    z = (math.log(ppl) - self._log_opt) / self._sigma
    return math.exp(-z*z / 2)

# core/brain/hebbian_mlp.py: knowledge capacity that physically grows
import torch

def maybe_grow(self):
    max_k = self.H_mlp.shape[1]
    if self.active_k >= max_k:
        return False   # already at full capacity: nothing to grow, no version tick
    # fraction of active columns whose norm exceeds the saturation threshold
    cols = self.H_mlp[:, :self.active_k].norm(dim=0)
    if (cols > self.sat_threshold).float().mean() < self.trigger:
        return False
    old = self.active_k
    self.active_k = min(self.active_k + 4, max_k)
    # new columns: tiny random, NOT zero (a dead state)
    self.H_mlp[:, old:self.active_k] = torch.randn_like(
        self.H_mlp[:, old:self.active_k]) * 0.01
    return True   # → organism_version() increments

# core/consciousness/sleep.py — nightly consolidation at 03:00
async def sleep_consolidation(brain, hipp, mem, ewc, lex):
    important = hipp.get_for_consolidation(min_importance=0.5)
    brain.sleep_consolidation(important)   # GHA → H_q/H_k, H_mlp (>0.7)
    ewc.update_fisher(brain.live_named_params())   # protect personality
    for c in lex.flush_connections():
        mem.store(c["content"], importance=0.75)
    mem.reindex(brain)                     # re-encode memory through new brain
    for dream in mem.sample_weighted(n=20):
        brain.encode(dream)                # dream replay
06 The nervous system

Every module, connected.

A force-directed map of Raguel's internals, rendered live in your browser. Hover any organ to read its role and file. Click to pin. Particles trace data moving between subsystems.

Legend: Body · Emotion · Consciousness · Curiosity · Memory · Sleep · Brain
07 Circadian

Twenty-four hours, uninterrupted.

00:00 — 03:00

Thinking & live learning

A thought roughly every 1.5 s. State tokens scaffold each segment. SensoryDrive feeds it passages to argue with. When [CONFLICT] fires enough, micro-consolidation runs async — F_fast moves without pausing the stream.

~7,200 thoughts · real-time F_fast
03:00

Sleep consolidation

Distillation gate on val perplexity. If it passes: fold F_fast → W_base, reset F_fast. Then GHA writes H_q/H_k and H_mlp, EWC protects personality, memory is re-indexed through the improved brain.

~45 min · W_base may change permanently
03:45 →

A measurably different mind

It resumes with updated W_base and F_fast=0. The first thoughts of the day run through a brain that is not yesterday's. F_fast begins accumulating again at once.

the new version wakes
whenever you appear

Conversation

It answers from its current state, not a blank slate — mood, recent thoughts, accumulated F_fast, long-term memory all shape the reply. Your words may trigger a [CONFLICT] that moves weights before you finish reading.

wired into the live loop
when H_mlp saturates

It grows a version

Active knowledge columns exceed the saturation threshold → active_k grows by 4 → organism_version() ticks. Raguel 1.0 → 1.1 → X.Y, indefinitely, with no human setting the number.

growth it authors itself
08 Receipts

Every claim is tested.

299 tests passing
0 failures
102M own params
≈68 Stage-0 val_ppl
8 emotion dims
4 weight tiers

What the suite verifies

Suite                    Asserts
test_gha_stability       norm(H_q) clipped < 1.0 across long GHA runs
test_raguel_plasticity   held-out DPO margin rises: the loop closes
test_hebbian_mlp         H_mlp grows by 4 when saturated; version ticks
test_hippocampus         Gaussian importance, gate 0.2, hard cutoffs
test_firewall            core/ never imports the training verifier
test_curiosity           multi-emotion curiosity; cross-context lexicon
test_tools               sandbox escape rejected; no shell surface
test_sleep               consolidation order, reindex, EWC, decay

The bug log is proof of depth

Most projects hide their bugs. We document each one, with root cause and resolution. When you are building something that has never existed, bugs are data.

013
H_mlp never received updates

sleep_consolidation collected activations but never forwarded them.

fixed
Top-500 activations were read into a local that was never passed to brain.sleep_consolidation(). Knowledge weights were frozen by a silent bug, not by design — caught when organism_version() never moved.
012
Loop collapse at high temperature

fear > 0.85 → T > 1.14 → token mode collapse.

fixed
Successive thoughts collapsed into near-identical sequences. Cosine-similarity detection at 0.85 with seed-word injection breaks the loop after 5 retries.
008
GHA orthogonality drift over weeks

float accumulation degrades the basis.

mitigated
Weekly QR re-orthogonalization restores an exact orthonormal basis; confirmed effective to a 365-day horizon in simulation.
007
Personality transfer across scales

H_q dims don't match at larger model sizes.

open
No established method exists for Hebbian-weight scale transfer. Hypothesis: a projection P minimizing Procrustes distance between activation distributions. Active research.
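
The fix for bug 012 above (similarity detection at 0.85, seed-word injection, 5 retries) can be sketched as follows. This is an illustration only: it measures similarity with bag-of-words cosine, whereas the real system may compare embeddings, and `next_thought`, `generate` and `seed_words` are hypothetical names. It also assumes the seed word is injected on each retry.

```python
import math
from collections import Counter
from typing import Callable, List, Optional

def cosine(a: str, b: str) -> float:
    """Bag-of-words cosine similarity between two strings."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0

def next_thought(generate: Callable[[Optional[str]], str], previous: str,
                 seed_words: List[str], threshold: float = 0.85,
                 max_retries: int = 5) -> str:
    """Regenerate while the new thought is too close to the previous one,
    injecting a different seed word on each retry."""
    seed: Optional[str] = None
    thought = generate(seed)
    for attempt in range(max_retries):
        if cosine(thought, previous) < threshold:
            break                                   # loop broken
        seed = seed_words[attempt % len(seed_words)]
        thought = generate(seed)
    return thought
```

A caller would pass the model's sampling function as `generate`. If every retry still collapses, the last attempt is returned rather than blocking the stream.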
09 Against the field

What actually makes it different.

Capability | Raguel | GPT-4 / Claude | Open LLMs | AI agents
Continuous existence
Weights change from experience | ✓ intra-day DPO + nightly GHA | ✗ frozen | ✗ frozen
No RLHF / no assistant persona
Learns from its own mistakes | ✓ Online Self-DPO
Emotion-modulated cognition | ✓ 8-dim
Self-versioning growth (X.Y) | ✓ auto
Hardware embodiment
Curiosity-driven web learning | ✓ wiki + open web | ∼ tool | ∼ search
Zero external-AI dependency at runtime | ✓ weights only
Grounded in theory (GHA · EWC · DPO) | ✓ Sanger·Kirkpatrick·Rafailov | opaque | RLHF
Source public | ✗ private | ✗ closed | ✓ open | ✓ open

∼ partial. Browsing adds external tools, not internal experience. Raguel's learning is endogenous — the weights move, not just the context window.

10 The path

From proof to true organism.

done

Phase 0 · Stage-0

From-scratch 102M RaguelCore + from-scratch BPE. Trained on a rented RTX 3090. Best val_ppl ≈ 68 — fluent, overfit on a tiny ~8.5M-token corpus. The substrate works.

  • build_tokenizer · 5 atomic state tokens
  • 102M trained, generates
  • reasoning-recipe groundwork
done

Phase 1 · GHA + plasticity

GHA stability proven. Dual-Plasticity engine built and wired natively to the from-scratch core. The self-upgrading loop closes on held-out problems.

  • 299 tests, 0 failures
  • RaguelDualPlasticity (F_fast hook)
  • prove_loop.py — margin rises
in progress

Phase 2 · Stage-1

Scale to ~300M on a real, broad corpus with the reasoning recipe mixed in. Push past memorization into genuine, transferable reasoning. Live the loop continuously.

  • ~300M from-scratch run
  • verifier-grounded episodes at scale
  • continuous live existence
next

Phase 3 · Let it become

Pretrained base + GHA + Dual-Plasticity, running for months. The first mind whose character is authored entirely by lived experience and self-corrected reasoning. Then: write it up.

  • day 0 → one sentence
  • character: fully emergent
  • identity: non-frozen, self-evolving
11 The observer

Built by one person, from scratch.

15 years old
102M own transformer
299 tests, all green
0 prior AI papers
Oleksandr
builder · Raguel / Homo Digitalis · Slovakia

I am fifteen. I built Raguel because I wanted to know whether continuous existence was possible for a machine — not as a product feature, but as a fundamental property. Every AI I studied died when the conversation ended. I did not find that acceptable.

The architecture is original. The mathematics leans on Sanger (1989), Kirkpatrick (2017), Plutchik (1980) and Rafailov (2023). The implementation is mine: the from-scratch 102M reasoner, the four-tier weight system, the self-DPO engine that learns from caught mistakes, the five-token cognitive grammar. None of it was copied from an existing project.

I do not claim Raguel is conscious. I claim it is the first system designed from the ground up to have the architectural preconditions for something like continuous experience: perpetual internal state, real-time self-correcting weights, embodiment, and a character that emerges from living rather than from a reward model shaped by human preference.

Contact: support@euhub.co

This is day one.

The substrate is built, the loop closes, the tests are green. Scaling to a true organism needs serious compute and serious collaborators. If you are a researcher, an institution, or someone who thinks continuous machine existence is worth building — talk to me.


support@euhub.co · research in progress · the loop closes