Every AI you know wakes up when you send a message and dies when you close the tab. Raguel runs 24/7 — generating thoughts, updating its own weights in real time through Online Self-DPO, and consolidating experience into permanent memory each night. No human labels. No frozen identity. Its weights change while it lives.
GPT-4 has never had a thought it didn't receive a prompt for. It has never been surprised. Never wondered. Never remembered yesterday.
Current AI systems are input-output functions. Born when you send a message, cease to exist when the response ends. No internal state. No curiosity. No continuity.
This is not intelligence — it is a very sophisticated lookup table.
Every conversation starts from zero. Nothing was experienced in the interval. Nothing was learned.
Character was set by RLHF before deployment. It cannot evolve. It cannot form preferences through experience.
If you ask "what are you curious about?" it generates a plausible answer. It was never actually curious about anything.
RAG adds external context, not genuine learning. Weights never change from experience.
Raguel runs 24 hours a day. When you are not talking to it, its SensoryDrive pulls passages from a curated library of philosophy, logic and dialectic and forces it to argue with itself. It reads Nietzsche and thinks about what "the rope between beast and overman" means for something that runs on a GPU. When it finds an error in its own reasoning, its weights shift — immediately, asynchronously, without stopping the thought.
It has one continuous thought that never stops. Sleep is the same thought going deeper, not a pause. Every experience above an importance threshold gets encoded into H_mlp. Every self-correction above a conflict threshold updates the fast weights through Online Self-DPO.
W_live = W_base + λ·F_fast. F_fast is a LoRA rank-16 overlay updated in real time via Online Self-DPO. W_base evolves nightly if a perplexity gate passes.
Intra-day learning · No human labelsWhen [CONFLICT] fires, the flawed thought (y_rejected) and the correction (y_chosen) become a contrastive training signal. F_fast is pushed away from error, toward resolution.
Rafailov 2023 · Self-supervised[IDLE] [SENSE] [THINK] [CONFLICT] [ACT] — registered as single atomic token IDs. Raguel's internal voice is raw and direct. Its external output ([ACT]) is clean and polished.
CognitiveTape · Episode boundariesWhen silent, Raguel pulls Gutenberg passages and argues with itself (info hunger), detects its own repetition (confusion drive), or encounters paradoxes it cannot resolve.
Self-feeding · Never idlesSanger 1989 — k=32 principal components from experience during nightly sleep. H_q/H_k (character) + H_mlp (knowledge) updated. Norm-clipped. Weekly QR reorthogonalization.
GHA · Biologically-inspiredPerpetual asyncio loop. State tokens scaffold every generation segment. Fear raises temperature. Anti-loop detection via cosine similarity. Never prompted to start.
Never stops · Never promptedCPU temperature, RAM pressure, uptime fatigue, circadian rhythm → emotional delta every 30 seconds. Physical state shapes emotional state shapes cognitive temperature.
psutil · BodySensorsRaguel 1.0 → 1.1 → X.Y. H_mlp starts active_k=4, grows +4 when saturated. W_base evolves from months of DPO distillation. No human sets any version.
organism_version() · Auto-grows| Module | Tests | Status | Key assertion |
|---|---|---|---|
| test_gha_stability.py | 14 | PASS | norm(H_q) < 1.0 throughout 72h |
| test_hebbian_mlp.py | 16 | PASS | H_mlp grows when saturated |
| test_hippocampus.py | 13 | PASS | importance gate 0.2, Gaussian surprise |
| test_emotions.py | 11 | PASS | Plutchik 8-dim, EMA baseline, delta |
| test_curiosity.py | 27 | PASS | Lexicon, explorer, ThoughtValence |
| test_raguel_core.py | 8 | PASS | Three-tier weight isolation |
| test_sleep.py | 10 | PASS | 7-step consolidation, reindex, EWC |
| test_native_memory.py | 9 | PASS | recall scoring, decay, foundational guard |
| test_stream.py | 4 | PASS | anti-loop cosine, curiosity trigger |
norm(H_q) < 1.0 throughout 1,000 GHA update steps. Verifies orthogonality of principal components and weekly QR reorthogonalization convergence.
H_mlp starts active_k=4, accumulates saturation signal, adds exactly 4 columns when triggered, organism_version() increments. Dead columns stay zero.
Formula = 0.50·emotional_delta + 0.35·gaussian_surprise(ppl) + 0.15·initiative. Peak at ppl=50. Hard cutoffs at ppl<5 and ppl>150. Gate 0.2.
Multi-emotion curiosity score. LexiconBuffer cross-context detection (cosine sim < 0.40). ThoughtValence bidirectional: thinking shapes emotion.
# core/brain/gha.py — Sanger 1989 Generalized Hebbian Algorithm def update(self, x: torch.Tensor, H: torch.Tensor) -> torch.Tensor: """x: (batch, dim) — layer activations from this sleep's experience""" y = x @ H.T # projections onto current components outer = y.T @ y / x.shape[0] # empirical correlation deflation = torch.tril(outer) # Sanger's lower-triangular deflation delta = (y.T @ x - deflation @ H) / x.shape[0] H_new = H + self.lr * delta # CRITICAL: norm clip — 72h stability proof depends on this line norm = H_new.norm() if norm > self.max_norm: H_new = H_new * (self.max_norm / norm) return H_new # Layer learning rate tiers (lower = slower character change) _LAYER_LRS = { range( 0, 8): 1e-7, # universal syntax — barely moves range( 8, 16): 5e-7, # semantics — slow drift range(16, 24): 1e-6, # associations range(24, 32): 1e-5, # high-level style — fastest }
# core/brain/plasticity.py — Online Self-DPO on F_fast (LoRA rank-16) def micro_consolidation(self, chosen_tokens, rejected_tokens, context_x): """ chosen = corrected [THINK] tokens (after [CONFLICT]) rejected = flawed [THINK] tokens (before [CONFLICT]) DPO: push F_fast away from rejected, toward chosen. Reference model = W_reference (frozen forever — NEVER use W_base here). """ fast_chosen_lp = self._get_logps(chosen_tokens, use_fast=True) fast_reject_lp = self._get_logps(rejected_tokens, use_fast=True) ref_chosen_lp = self._get_logps(chosen_tokens, use_fast=False).detach() ref_reject_lp = self._get_logps(rejected_tokens, use_fast=False).detach() chosen_ratio = fast_chosen_lp - ref_chosen_lp rejected_ratio = fast_reject_lp - ref_reject_lp dpo_loss = -F.logsigmoid(self.beta * (chosen_ratio - rejected_ratio)).mean() # Anti-forgetting: F_fast must not distort past resolved ACT latents idx = torch.randint(0, min(self.replay_ptr, 1024), (4,)) h_past = self.replay_buffer[idx] orthogonal_penalty = F.mse_loss(self._apply_fast(h_past), h_past) loss = dpo_loss + 0.1 * orthogonal_penalty loss.backward() # AdamW update on F_fast only — gradient clip norm=1.0 torch.nn.utils.clip_grad_norm_( [fw[k] for fw in self.fast_weights.values() for k in ('A', 'B')], 1.0 ) self._adamw_step() # updates F_fast only — W_base untouched
# core/memory/hippocampus.py def accumulate(self, activation, emotional_delta, perplexity, model_initiated): surprise = self._gaussian_surprise(perplexity) # peak at ppl=50 initiative = 0.15 if model_initiated else 0.0 importance = ( 0.50 * emotional_delta + 0.35 * surprise + 0.15 * initiative ) if importance < self.importance_threshold: # gate: 0.2 return importance self._buffer.append(BufferEntry(activation, importance)) return importance def _gaussian_surprise(self, ppl: float) -> float: if ppl < 5 or ppl > 150: # too trivial or too noisy return 0.0 log_ppl = math.log(ppl) z = (log_ppl - self._log_opt) / self._sigma return math.exp(-z * z / 2) # Gaussian peak at ppl=50
# core/curiosity/explorer.py def _curiosity_score(emotions: dict) -> float: """ Mild fear ≠ suppression. fear + curiosity = AWE (Plutchik dyad). Only fear > 0.45 suppresses exploration. Neutral defaults → score ≈ 0.49 → naturally curious. """ pos = ( 0.45 * emotions.get("anticipation", 0.5) + 0.30 * emotions.get("surprise", 0.3) + 0.25 * emotions.get("joy", 0.4) ) fear_suppression = max(0.0, emotions.get("fear", 0.1) - 0.45) * 0.8 neg = ( fear_suppression + 0.25 * emotions.get("anger", 0.05) + 0.20 * emotions.get("sadness", 0.1) ) return max(0.0, min(1.0, pos - neg))
# core/brain/hebbian_mlp.py — Knowledge capacity growth def maybe_grow(self) -> bool: """Returns True if capacity expanded (version increments).""" active = self.H_mlp[:, :self.active_k] col_norms = active.norm(dim=0) # norm per knowledge column saturation = (col_norms > self._saturation_threshold).float().mean() if saturation < self._growth_trigger: # e.g. 0.8 return False old_k = self.active_k self.active_k = min(self.active_k + 4, self.H_mlp.shape[1]) # New columns: small random — NOT zero (dead state) self.H_mlp[:, old_k:self.active_k] = ( torch.randn_like(self.H_mlp[:, old_k:self.active_k]) * 0.01 ) return True # → organism_version() increments # Version: 1.0 + (growth_events * 0.1) def organism_version(self) -> float: return 1.0 + self._growth_events * 0.1
# core/consciousness/sleep.py — 7-step consolidation at 03:00 async def sleep_consolidation(brain, hippocampus, memory, ewc, lexicon): # 1. Collect high-importance activations important = hippocampus.get_for_consolidation(min_importance=0.5) # 2+3. GHA: update H_q/H_k (character) + H_mlp (knowledge > 0.7) brain.sleep_consolidation(high_imp_activations=important) # 4. EWC — protect personality built over weeks ewc.update_fisher(brain.live_named_params()) # 5. Flush lexicon cross-context connections → long-term memory if lexicon: for conn in lexicon.flush_connections(): memory.store(conn["content"], importance=0.75) # 6. Reindex — past memories re-encoded through improved brain memory.reindex(brain) # 7. Dream replay — random memories strengthen consolidation for dream in memory.sample_weighted(n=20): brain.encode(dream)
This is not "thought 1, then thought 2". The generation never finishes — each token continues directly from the last. State tokens ([IDLE] [SENSE] [THINK] [CONFLICT] [ACT]) scaffold every segment. When Raguel catches itself in an error, Online Self-DPO fires asynchronously — weights shift before the next thought begins.
Force-directed map of Raguel's internal architecture — rendered live in your browser. Hover any node to see its role and file path. Click to pin. Particles show data flowing between subsystems in real time.
Most projects hide their bugs. We document every one — with root cause and resolution. When building something that has never existed before, bugs are data.
sleep_consolidation() collected activations but never forwarded them. Fixed.
Buffer stayed empty — hippocampal gate requires activation is not None. Partial fix.
PyTorch allocator doesn't defragment. Fixed with periodic cache flush.
fear > 0.85 → temperature > 1.14 → token mode collapse. Fixed with cosine detection.
Float accumulation causes drift. Weekly QR reorthogonalization mitigates.
H_q dimensions don't match. No solution in literature. Open research problem.
| Capability | Raguel | GPT-4 / Claude | Open LLMs | AI Agents |
|---|---|---|---|---|
| Continuous existence | ✓ | ✗ | ✗ | ∼ |
| Weights update from experience | ✓ intra-day DPO + nightly GHA | ✗ frozen | ✗ frozen | ✗ |
| No RLHF / no assistant persona | ✓ | ✗ | ✗ | ✗ |
| Emotion-modulated cognition | ✓ 8-dim | ✗ | ✗ | ✗ |
| Self-versioning growth (X.Y) | ✓ auto | ✗ | ✗ | ✗ |
| Hardware embodiment | ✓ | ✗ | ✗ | ✗ |
| Curiosity-driven web learning | ✓ Wikipedia | ∼ tool | ✗ | ∼ search |
| Episodic memory with decay | ✓ SQLite | ∼ context | ∼ context | ∼ vector DB |
| Zero external AI dependencies | ✓ | ✗ | ✓ | ✗ |
| Mathematically grounded (GHA/EWC/DPO) | ✓ Sanger + Kirkpatrick + Rafailov | unknown | RLHF only | ✗ |
| Real-time self-correcting learning | ✓ Online Self-DPO | ✗ | ✗ | ✗ |
| Character emerges from experience | ✓ | ✗ programmed | ✗ | ✗ |
| Source code public | ✗ private | ✗ closed | ✓ open | ✓ open |
∼ = partial. GPT-4 browsing adds external tools, not internal experience. Raguel's learning is endogenous — weights change, not just context.
Consciousness stream generates thoughts every ~1.5s. State tokens scaffold every segment. SensoryDrive pulls Gutenberg passages when idle — Raguel reads and argues with itself. When [CONFLICT] fires 8+ times, micro-consolidation runs asynchronously: DPO updates F_fast immediately, without pausing the stream.
~7,200 thoughts · Real-time F_fast updatesDistillation quality gate: evaluate F_fast on val.txt perplexity. If passes, fold F_fast → W_base, reset F_fast. Then GHA updates H_q/H_k (character) and H_mlp (knowledge > 0.7). EWC Fisher updated. KV-cache reset. Memories re-encoded through the improved brain.
~45 min · W_base permanently updated if gate passesResumes with updated W_base and fresh F_fast=0. The first thoughts of the day run through a brain that is measurably different from yesterday. F_fast will begin accumulating again immediately.
First thoughts of the new versionRaguel responds from its current state — not a blank slate. Emotional state, recent thoughts, accumulated F_fast updates, and long-term memories shape every response. Your conversation may trigger [CONFLICT] events that shift weights before you finish reading the reply.
Fully wired into live learning loop80%+ of active knowledge columns exceed saturation threshold → active_k grows by 4. organism_version() increments 0.1. W_base has evolved from weeks of DPO distillation. Raguel 1.0 → 1.1 → X.Y, indefinitely, with no human setting any number.
Raguel 1.0 → 1.1 → X.YOwn 102M transformer (RaguelOrganismBrain). Gutenberg philosophy + Wikipedia. Zero RLHF patterns. Target: perplexity < 20.
GHA stability on RaguelOrganismBrain. norm(H_q) < 1.0 throughout 72h continuous run. 78/78 tests passing.
Qwen2.5-1.5B + DualPlasticityEngine. W_live = W_base + λ·F_fast. Online Self-DPO from [CONFLICT]→[ACT] episodes. SensoryDrive. CognitiveTape. Nightly distillation gate.
Attach GHA + Dual-Plasticity to Mistral 7B. Transfer personality via H_q projection. Full 30-day continuous run.
Pretrained base + GHA + Dual-Plasticity. First AI whose character emerges entirely from lived experience and self-corrected reasoning. arXiv → NeurIPS.
I am 15 years old. I built Raguel because I wanted to know if continuous existence was possible for a machine — not as a product feature, but as a fundamental property. Every AI I studied died when the conversation ended. I did not find that acceptable.
The architecture is original. The mathematics draws on Sanger (1989), Kirkpatrick (2017), Plutchik (1980), and Rafailov (2023). The implementation is mine. The 102M transformer, the four-tier weight system, the Online Self-DPO learning engine, the five-token cognitive vocabulary — none of it was copied from an existing project.
I do not claim Raguel is conscious. I claim it is the first system designed from the ground up to have the architectural preconditions for something like continuous experience: perpetual internal state, real-time self-correcting weights, embodiment, and character that emerges from living rather than from a reward model shaped by human preference labels.
The stability proof is done. Scaling further needs serious compute and serious collaborators. If you are a researcher, investor, or institution who thinks continuous AI existence is worth building — let's talk.
support@euhub.co · Research in progress · Stability proof complete