# Abbotik LLM

`abbotik/llm` is the live room runtime for long-running LLM work in the
Abbotik stack.

This repository now keeps only two top-level design documents:

- `README.md` describes the implementation that exists today
- `ARCHITECTURE.md` describes the intended target design and the planned v1/v2
  split

If those two documents ever disagree, `README.md` is the source of truth for
what the repository currently does.

## Current Status

The current service is a TypeScript/Bun/Hono runtime with:

- public `/`, `/llms.txt`, `/health`, and `/docs*` routes
- protected `/api/rooms*` routes plus simple provider and skill discovery
- a per-room worker model
- one queued message in flight per room
- SSE room event replay and follow
- explicit wake, interrupt, and release verbs
- a configurable room store:
  - in-memory by default
  - API-backed through `abbotik/api` data routes when
    `ABBOTIK_ROOM_STORE=api`
- bearer-token hydration through `GET /api/user/introspect` in `abbotik/api`
- machine-auth-backed `llm -> api` token exchange for API-backed room storage

The current implementation is suitable for:

- HTTP surface development
- room lifecycle development
- event and history shape development
- fixture-backed tests
- auth-boundary work against the expected `api` contract

It is not yet a production runtime for durable rooms.

## Product Boundary

The intended split remains:

- `api.abbotik.com` owns durable truth and auth authority
- `llm.abbotik.com` owns live room execution

In Railway production today, that boundary is wired as:

- public ingress at `https://llm.abbotik.com`
- current `llm -> api` traffic through `https://api.abbotik.com`
- `/health` configured as the Railway health check path

On April 15, 2026, a live smoke test showed `http://api.railway.internal` was not
reachable from the deployed `llm` service, so production was switched back to
the public `api` URL. Given the room workload, that extra hop is operationally
acceptable for now.

Today that split is only partially implemented:

- the room HTTP surface exists here
- the per-room worker model exists here
- durable storage can now run against either:
  - local in-memory state for tests and isolated development
  - `api` durable models through the generic `/api/data/*` surface
- inbound auth is already modeled as an `api`-owned concern via bearer-token
  introspection

## Verified Durable Plane

Verified against the live durable plane on April 14-15, 2026:

- `api.abbotik.com` already exposes `rooms`
- `api.abbotik.com` already exposes `room_messages`
- `api.abbotik.com` already exposes `room_events`
- those models already contain linked live data

That matters because the durable nouns are not hypothetical. This repo now has a
first-pass store client for those existing durable models, so the remaining work
is about hardening the runtime around that contract rather than inventing a new
durable vocabulary from scratch.

Useful operator commands during this phase:

```bash
abbot describe list
abbot describe get rooms
abbot describe get room_messages
abbot describe get room_events
abbot describe fields list rooms
abbot describe fields list room_messages
abbot describe fields list room_events
abbot data --limit 10 list rooms
abbot data --limit 10 list room_messages
abbot data --limit 10 list room_events
```

## Repository Layout

The implementation centers on a small set of modules:

- [src/server.ts](./src/server.ts)
  - wires the Hono app
  - registers public and protected routes
  - injects the room registry and auth client
- [src/runtime/room-registry.ts](./src/runtime/room-registry.ts)
  - owns the public room verbs
  - enforces tenant-scoped room visibility
  - creates and tracks active room workers
- [src/runtime/room-worker.ts](./src/runtime/room-worker.ts)
  - owns one room's mutable runtime state
  - serializes inbound message handling
  - emits room and actor events
- [src/runtime/turn-executor.ts](./src/runtime/turn-executor.ts)
  - defines the executor seam for one bounded actor turn
- [src/runtime/stub-turn-executor.ts](./src/runtime/stub-turn-executor.ts)
  - current placeholder executor used to exercise lifecycle and HTTP behavior
- [src/store/room-store.ts](./src/store/room-store.ts)
  - storage seam for rooms, messages, and events
- [src/store/memory-store.ts](./src/store/memory-store.ts)
  - default in-memory backing store used by tests and local development
- [src/store/api-room-store.ts](./src/store/api-room-store.ts)
  - API-backed room store using `abbotik/api` generic data routes
- [src/lib/auth-client.ts](./src/lib/auth-client.ts)
  - contract for trusted bearer-token hydration through `api`
- [src/lib/machine-auth.ts](./src/lib/machine-auth.ts)
  - cached machine-auth challenge/verify client for `llm -> api` calls
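The executor seam can be pictured as a single-method interface. The sketch below is hypothetical: `TurnInput`, `TurnResult`, and `EchoTurnExecutor` are illustrative names rather than the types exported from `src/runtime/turn-executor.ts`. The echo behavior only stands in for the stub executor's role of exercising lifecycle and HTTP paths without a provider.

```typescript
// Hypothetical executor seam: one bounded actor turn in, one output out.
// These names are illustrative, not the real exports of turn-executor.ts.
interface TurnInput {
  roomId: string;
  actorKey: string;
  message: string;
  signal: AbortSignal; // aborted when the room is interrupted
}

interface TurnResult {
  content: string;
}

interface TurnExecutor {
  runTurn(input: TurnInput): Promise<TurnResult>;
}

// Stand-in for a stub executor: no provider call, deterministic output.
class EchoTurnExecutor implements TurnExecutor {
  async runTurn(input: TurnInput): Promise<TurnResult> {
    if (input.signal.aborted) {
      throw new Error("turn interrupted before it started");
    }
    return { content: `[${input.actorKey}] ${input.message}` };
  }
}
```

Keeping the seam this narrow is what lets a real provider-backed executor replace the stub without touching the worker or the HTTP layer.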

## Runtime Behavior Today

### Room lifecycle

At a high level, the runtime currently behaves like this:

1. `POST /api/rooms` creates a room record and emits `room:rented`
2. the registry lazily creates a worker for that room
3. `POST /api/rooms/:id/messages` appends a user message and queues work
4. the worker marks the room `active`, runs one actor turn, appends an output
   message, and emits room and actor events
5. when the queue drains, the worker marks the room `idle`
6. `POST /api/rooms/:id/wake`, `POST /api/rooms/:id/interrupt`, and
   `POST /api/rooms/:id/release` provide explicit lifecycle control
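From a client's point of view, the lifecycle above is a short sequence of authenticated JSON calls. This is a hypothetical sketch: the request bodies (`purpose`, `content`) and the assumption that `POST /api/rooms` returns the new room's `id` under `data` are illustrative, and `FetchLike` exists only so a test can inject a fake in place of the global `fetch`.

```typescript
// Hypothetical client for the room lifecycle. Envelope shape follows this
// README; request body fields and the `data.id` location are assumptions.
type Envelope<T> =
  | { success: true; data: T }
  | { success: false; error: string; error_code: string };

type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body?: string },
) => Promise<{ json(): Promise<unknown> }>;

class RoomClient {
  private readonly baseUrl: string;
  private readonly token: string;
  private readonly fetchImpl: FetchLike;

  constructor(baseUrl: string, token: string, fetchImpl: FetchLike) {
    this.baseUrl = baseUrl;
    this.token = token;
    this.fetchImpl = fetchImpl;
  }

  // Sends one request and unwraps the { success, data } envelope.
  private async call<T>(method: string, path: string, body?: unknown): Promise<T> {
    const res = await this.fetchImpl(`${this.baseUrl}${path}`, {
      method,
      headers: {
        authorization: `Bearer ${this.token}`,
        "content-type": "application/json",
      },
      body: body === undefined ? undefined : JSON.stringify(body),
    });
    const envelope = (await res.json()) as Envelope<T>;
    if (!envelope.success) {
      throw new Error(`${envelope.error_code}: ${envelope.error}`);
    }
    return envelope.data;
  }

  rent(purpose: string): Promise<{ id: string }> {
    return this.call<{ id: string }>("POST", "/api/rooms", { purpose });
  }

  send(roomId: string, content: string): Promise<unknown> {
    return this.call<unknown>("POST", `/api/rooms/${roomId}/messages`, { content });
  }

  history(roomId: string): Promise<unknown> {
    return this.call<unknown>("GET", `/api/rooms/${roomId}/history`);
  }

  release(roomId: string): Promise<unknown> {
    return this.call<unknown>("POST", `/api/rooms/${roomId}/release`);
  }
}
```

Against a running service, the runtime's global `fetch` can be passed as `fetchImpl`.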

### Execution model

The current worker model is deliberately conservative:

- one worker per active room
- one queued message at a time per room
- deterministic event ordering within a room
- one actor turn per queued message
- explicit release path

The main current limitation is that the worker still chooses only the first
actor in the room roster. Multi-actor orchestration is not implemented yet.
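The serialization rule (one message in flight, one turn per message, deterministic event order) can be sketched as a queue drained by a single loop. This is a hypothetical illustration, not the code in `src/runtime/room-worker.ts`: `SerialRoomWorker`, `whenIdle`, and the in-memory `events` list are assumed names, and real events would be written through the room store.

```typescript
// Hypothetical per-room worker: a FIFO queue drained by one loop, so at most
// one turn runs at a time and events are appended in a fixed order.
type RunTurn = (message: string, signal: AbortSignal) => Promise<string>;

class SerialRoomWorker {
  readonly events: string[] = [];
  readonly outputs: string[] = [];
  private readonly runTurn: RunTurn;
  private readonly queue: string[] = [];
  private running = false;
  private draining: Promise<void> = Promise.resolve();
  private controller: AbortController | null = null;

  constructor(runTurn: RunTurn) {
    this.runTurn = runTurn;
  }

  enqueue(message: string): void {
    this.queue.push(message);
    if (!this.running) {
      // Start a drain loop only if none is active; otherwise the running
      // loop picks this message up on its next iteration.
      this.running = true;
      this.draining = this.drain();
    }
  }

  // Aborts the in-flight turn, if any; queued messages still run afterwards.
  interrupt(): void {
    this.controller?.abort();
  }

  whenIdle(): Promise<void> {
    return this.draining;
  }

  private async drain(): Promise<void> {
    this.events.push("room:active");
    while (this.queue.length > 0) {
      const message = this.queue.shift() as string;
      this.controller = new AbortController();
      this.events.push("actor:turn_start");
      try {
        this.outputs.push(await this.runTurn(message, this.controller.signal));
        this.events.push("actor:output");
      } catch {
        this.events.push("error");
      }
      this.events.push("actor:turn_end");
    }
    this.controller = null;
    this.running = false;
    this.events.push("room:idle");
  }
}
```

Because `running` is cleared only after the queue check, with no `await` in between, a message enqueued at any point is either picked up by the active loop or starts a new one.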

### Durable versus ephemeral state

The current code already distinguishes the durable nouns from runtime-only
state, and now supports either an in-memory or API-backed durable store.

Durable nouns:

- room record
- room messages
- room events

Ephemeral runtime state:

- active worker instances
- in-memory message queue
- current abort controller
- SSE subscribers
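The durable/ephemeral split shows up as a narrow storage seam: only durable nouns cross it, while queues, abort controllers, and subscribers stay inside the worker. The interface and method names below are hypothetical (the real seam is `src/store/room-store.ts`); the field names follow the `room_messages` and `room_events` lists later in this README, and numbering `seq` from 1 per room per table is an assumption.

```typescript
// Hypothetical storage seam: durable records only. Field names follow the
// room_messages / room_events lists; seq numbering is an assumption.
interface RoomMessage {
  room_id: string;
  actor_key: string | null;
  author_kind: string;
  kind: string;
  content: string;
  metadata: Record<string, unknown>;
  seq: number;
}

interface RoomEvent {
  room_id: string;
  actor_key: string | null;
  event_type: string;
  payload: Record<string, unknown>;
  seq: number;
}

interface RoomStore {
  appendMessage(message: Omit<RoomMessage, "seq">): Promise<RoomMessage>;
  appendEvent(event: Omit<RoomEvent, "seq">): Promise<RoomEvent>;
  loadHistory(roomId: string): Promise<{ messages: RoomMessage[]; events: RoomEvent[] }>;
}

// Returns the per-room list, creating it on first use.
function bucket<T>(map: Map<string, T[]>, roomId: string): T[] {
  let list = map.get(roomId);
  if (!list) {
    list = [];
    map.set(roomId, list);
  }
  return list;
}

class MemoryRoomStore implements RoomStore {
  private readonly messages = new Map<string, RoomMessage[]>();
  private readonly events = new Map<string, RoomEvent[]>();

  async appendMessage(message: Omit<RoomMessage, "seq">): Promise<RoomMessage> {
    const list = bucket(this.messages, message.room_id);
    const stored: RoomMessage = { ...message, seq: list.length + 1 };
    list.push(stored);
    return stored;
  }

  async appendEvent(event: Omit<RoomEvent, "seq">): Promise<RoomEvent> {
    const list = bucket(this.events, event.room_id);
    const stored: RoomEvent = { ...event, seq: list.length + 1 };
    list.push(stored);
    return stored;
  }

  // Returns new arrays so callers cannot reorder or truncate stored history.
  async loadHistory(roomId: string) {
    return {
      messages: [...(this.messages.get(roomId) ?? [])],
      events: [...(this.events.get(roomId) ?? [])],
    };
  }
}
```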

## Auth And Tenancy

`llm` does not own bearer-token verification logic.

Protected routes currently assume this external contract:

- client sends a normal Abbotik bearer token
- `llm` calls `GET /api/user/introspect` on `api`
- `api` returns trusted user, tenant, and token context
- `llm` authorizes the request from that trusted context

The introspection result is used only for request admission; it is not copied
into the room as a second auth record.
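Request admission under this contract reduces to one call to `api` per request. The sketch below is hypothetical: the response field names (`user_id`, `tenant_id` under `data`) are assumptions about the introspection payload rather than the confirmed `api` contract, and `admit` is an illustrative name, not the interface in `src/lib/auth-client.ts`.

```typescript
// Hypothetical admission check against GET /api/user/introspect.
// Response field names under `data` are assumed, not confirmed.
interface TrustedContext {
  userId: string;
  tenantId: string;
}

type IntrospectFetch = (
  url: string,
  init: { headers: Record<string, string> },
) => Promise<{ status: number; json(): Promise<unknown> }>;

async function admit(
  authorization: string | undefined,
  apiUrl: string,
  fetchImpl: IntrospectFetch,
): Promise<TrustedContext | null> {
  const match = /^Bearer\s+(\S+)$/i.exec(authorization ?? "");
  if (!match) return null; // no bearer token: reject without calling api

  // Forward the caller's token unchanged; api decides whether it is valid.
  const res = await fetchImpl(`${apiUrl}/api/user/introspect`, {
    headers: { authorization: `Bearer ${match[1]}` },
  });
  if (res.status !== 200) return null;

  const body = (await res.json()) as {
    success?: boolean;
    data?: { user_id?: string; tenant_id?: string };
  };
  const userId = body.data?.user_id;
  const tenantId = body.data?.tenant_id;
  if (body.success !== true || !userId || !tenantId) return null;
  return { userId, tenantId };
}
```

Nothing in this path verifies the token locally, which matches the rule that `llm` does not own bearer-token verification.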

The durable room contract stays lean:

- `tenant_id`
- `rented_by`
- `tool_policy`
- `wake_policy`
- `done_policy`
- `metadata`

Human-readable labels such as usernames can be looked up from `api` by
`rented_by` when needed.

Current behavior:

- room visibility is tenant-scoped from trusted auth context
- caller-supplied `tenant_id` is optional and rejected if it does not match the
  hydrated tenant
- tests use a fake introspection client instead of making live `api` calls
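The tenant rules above can be expressed as two small pure functions. This is a sketch under assumptions: the function names and the `TENANT_MISMATCH` error code are hypothetical, and the real checks live in the room registry.

```typescript
// Hypothetical tenant-scoping helpers. The trusted tenant always comes from
// introspection; a caller-supplied tenant_id may only confirm it.
type TenantResolution =
  | { ok: true; tenantId: string }
  | { ok: false; error: string; error_code: string };

function resolveTenant(trustedTenant: string, requestedTenant?: string): TenantResolution {
  if (requestedTenant === undefined || requestedTenant === trustedTenant) {
    return { ok: true, tenantId: trustedTenant };
  }
  return {
    ok: false,
    error: "tenant_id does not match the authenticated tenant",
    error_code: "TENANT_MISMATCH", // hypothetical code
  };
}

// Visibility is a filter on the trusted tenant, never on caller input.
function visibleRooms<T extends { tenant_id: string }>(rooms: T[], trustedTenant: string): T[] {
  return rooms.filter((room) => room.tenant_id === trustedTenant);
}
```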

## HTTP Surface

### Public routes

| Route | Purpose |
|---|---|
| `GET /` | Human-facing root entry |
| `GET /index.html` | Human-facing root entry |
| `GET /index.css` | Root stylesheet |
| `GET /llms.txt` | Agent-facing root entry |
| `GET /health` | Health and readiness |
| `GET /docs` | Implementation docs entrypoint |
| `GET /docs/architecture` | Target architecture document |

### Protected routes

| Route | Purpose |
|---|---|
| `GET /api/rooms` | List visible rooms |
| `POST /api/rooms` | Rent a room |
| `GET /api/rooms/:id` | Fetch room state |
| `PATCH /api/rooms/:id` | Update mutable room metadata or policy |
| `POST /api/rooms/:id/messages` | Inject a message or task |
| `POST /api/rooms/:id/wake` | Explicitly wake a room |
| `GET /api/rooms/:id/events` | Stream room events |
| `GET /api/rooms/:id/history` | Read room messages and events |
| `POST /api/rooms/:id/interrupt` | Interrupt the current in-flight turn |
| `POST /api/rooms/:id/release` | Release a room |
| `GET /api/skills` | List currently known skill descriptors |
| `GET /api/providers` | List configured providers |
| `GET /api/providers/models` | List configured models |

Route handlers live under [src/routes](./src/routes), using the same
path-shaped layout as the sibling `api` service.

### Response shape

For non-streaming JSON responses, the service uses the same envelope discipline
as `abbotik/api`.

Successful response:

```json
{
  "success": true,
  "data": {}
}
```

Error response:

```json
{
  "success": false,
  "error": "Room not found",
  "error_code": "ROOM_NOT_FOUND"
}
```

Streaming event routes are documented by event payload shape rather than by the
standard JSON envelope.
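Handlers can keep to this envelope with two constructors over a discriminated union. The helper names below are hypothetical; only the field names come from the JSON examples above.

```typescript
// Hypothetical envelope helpers matching the JSON shapes above.
type Ok<T> = { success: true; data: T };
type Fail = { success: false; error: string; error_code: string };
type Envelope<T> = Ok<T> | Fail;

function ok<T>(data: T): Ok<T> {
  return { success: true, data };
}

function fail(error: string, error_code: string): Fail {
  return { success: false, error, error_code };
}

// Narrowing on `success` gives typed access to `data` or the error fields.
function unwrap<T>(envelope: Envelope<T>): T {
  if (envelope.success) return envelope.data;
  throw new Error(`${envelope.error_code}: ${envelope.error}`);
}
```

In a Hono route, such a value would typically be returned with `c.json(...)`; which HTTP status accompanies each `error_code` is not specified here.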

## Durable Model Assumed By The Runtime

The runtime is built around the lean first-pass durable model already present in
`api`.

### `rooms`

The room record is the durable top-level contract.

Important fields:

- `id`
- `tenant_id`
- `purpose`
- `status`
- `rented_by`
- `rented_at`
- `released_at`
- `last_active_at`
- `actors`
- `tool_policy`
- `wake_policy`
- `done_policy`
- `metadata`
- `summary_text`
- `result`
- `last_error`

### `room_messages`

This stores semantic room inputs and outputs.

Important fields:

- `room_id`
- `actor_key`
- `author_kind`
- `kind`
- `content`
- `metadata`
- `seq`

### `room_events`

This stores append-only execution facts.

Important fields:

- `room_id`
- `actor_key`
- `event_type`
- `payload`
- `seq`

The current event taxonomy in code is:

- `room:rented`
- `room:active`
- `room:idle`
- `room:sleeping`
- `room:wake`
- `room:released`
- `actor:turn_start`
- `actor:turn_end`
- `actor:output`
- `tool:start`
- `tool:end`
- `artifact:created`
- `error`
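Because the taxonomy is a closed list, it maps naturally onto a string-literal union with a runtime guard, which helps when reading `event_type` values back from the API-backed store. The constant and function names are hypothetical; only the event strings come from the list above.

```typescript
// Hypothetical typed view of the event taxonomy listed above.
const ROOM_EVENT_TYPES = [
  "room:rented",
  "room:active",
  "room:idle",
  "room:sleeping",
  "room:wake",
  "room:released",
  "actor:turn_start",
  "actor:turn_end",
  "actor:output",
  "tool:start",
  "tool:end",
  "artifact:created",
  "error",
] as const;

type RoomEventType = (typeof ROOM_EVENT_TYPES)[number];

// Runtime guard for values read back from storage.
function isRoomEventType(value: string): value is RoomEventType {
  return (ROOM_EVENT_TYPES as readonly string[]).includes(value);
}

// The prefix before ":" names the emitter family; bare "error" has none.
function eventFamily(type: RoomEventType): string {
  const separator = type.indexOf(":");
  return separator === -1 ? type : type.slice(0, separator);
}
```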

## Development

### Requirements

- Bun
- Node-compatible local environment for TypeScript tooling

### Install

```bash
bun install
```

### Run tests

```bash
bun test
```

### Build

```bash
bun run build
```

### Run locally

Development:

```bash
bun run start:dev
```

Built server:

```bash
bun run build
bun run start
```

Default port:

- `9002`

Config:

- `PORT` - HTTP port for the `llm` service
- `ABBOTIK_API_URL` - base URL for `llm -> api` calls; in Railway production
  this is currently `https://api.abbotik.com`
- `ABBOTIK_ROOM_STORE` - `memory` by default, set to `api` to persist through
  `abbotik/api`
- `ABBOTIK_MACHINE_TENANT` - machine-auth tenant for `llm -> api` access;
  expected to be the system tenant such as `abbotik`
- `ABBOTIK_MACHINE_KEY_ID` - exact machine key identifier for challenge requests
- `ABBOTIK_MACHINE_KEY_FINGERPRINT` - alternative machine key selector when key
  id is not used
- `ABBOTIK_MACHINE_PRIVATE_KEY_PEM` - private key PEM used to sign challenge
  nonces; Railway may store it with either real newlines or escaped `\n`
  sequences
- `ABBOTIK_API_TOKEN` - explicit static bearer-token fallback; kept only as a
  temporary override seam while machine auth is being proven out
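Reading these variables amounts to a small, strict parser. The sketch below is hypothetical: `loadConfig` and its return shape are illustrative, and both the rule that `ABBOTIK_API_URL` is required whenever `ABBOTIK_ROOM_STORE=api` and the preference for key id over fingerprint are assumptions about the guardrails rather than documented behavior.

```typescript
// Hypothetical config loader for the variables above.
type Env = Record<string, string | undefined>;

interface LlmConfig {
  port: number;
  roomStore: "memory" | "api";
  apiUrl?: string;
  machineKeySelector?: { keyId: string } | { fingerprint: string };
  machinePrivateKeyPem?: string;
}

// Railway may hold the PEM with real newlines or with escaped "\n" pairs;
// converting the escapes makes both forms identical.
function normalizePem(raw: string | undefined): string | undefined {
  return raw?.replace(/\\n/g, "\n");
}

function loadConfig(env: Env): LlmConfig {
  const port = Number(env.PORT ?? "9002");
  if (!Number.isInteger(port) || port <= 0) {
    throw new Error(`PORT must be a positive integer, got "${env.PORT}"`);
  }

  const roomStore = env.ABBOTIK_ROOM_STORE ?? "memory";
  if (roomStore !== "memory" && roomStore !== "api") {
    throw new Error(`ABBOTIK_ROOM_STORE must be "memory" or "api", got "${roomStore}"`);
  }
  // Assumed guardrail: an API-backed store needs an api to persist to.
  if (roomStore === "api" && !env.ABBOTIK_API_URL) {
    throw new Error("ABBOTIK_API_URL is required when ABBOTIK_ROOM_STORE=api");
  }

  // Assumed precedence: an explicit key id wins over a fingerprint.
  const machineKeySelector = env.ABBOTIK_MACHINE_KEY_ID
    ? { keyId: env.ABBOTIK_MACHINE_KEY_ID }
    : env.ABBOTIK_MACHINE_KEY_FINGERPRINT
      ? { fingerprint: env.ABBOTIK_MACHINE_KEY_FINGERPRINT }
      : undefined;

  return {
    port,
    roomStore,
    apiUrl: env.ABBOTIK_API_URL,
    machineKeySelector,
    machinePrivateKeyPem: normalizePem(env.ABBOTIK_MACHINE_PRIVATE_KEY_PEM),
  };
}
```

Failing at startup on an unknown store mode keeps a typo such as `ABBOTIK_ROOM_STORE=apii` from silently falling back to in-memory state.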

## Tests And Verification

The current test suite in [src/server.test.ts](./src/server.test.ts) verifies:

- store mode guardrails for API-backed persistence
- room rental
- message injection and actor output
- SSE replay without keeping the stream open
- explicit room release
- tenant-scoped room visibility
- mismatch rejection for caller-supplied tenant scope

The focused store contract tests in
[src/store/api-room-store.test.ts](./src/store/api-room-store.test.ts) verify:

- room creation through `/api/data/rooms`
- room, message, and event history loading through `/api/data/*`
- not-found handling for missing rooms

The machine auth tests in
[src/lib/machine-auth.test.ts](./src/lib/machine-auth.test.ts) verify:

- challenge and verify exchange against `api`
- signature generation from the configured private key
- short-lived token caching and refresh behavior
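The caching behavior these tests cover can be sketched as a single-flight cache around the challenge/verify exchange. This is a hypothetical illustration: `MachineTokenCache`, the `exchange` callback (standing in for challenge, signing, and verify), the `expiresAt` field, and the refresh margin are assumptions, not the structure of `src/lib/machine-auth.ts`.

```typescript
// Hypothetical single-flight cache for a short-lived machine token.
// `exchange` stands in for the challenge -> sign -> verify round trip.
interface IssuedToken {
  token: string;
  expiresAt: number; // epoch milliseconds
}

class MachineTokenCache {
  private cached: IssuedToken | null = null;
  private inflight: Promise<IssuedToken> | null = null;
  private readonly exchange: () => Promise<IssuedToken>;
  private readonly now: () => number;
  private readonly refreshMarginMs: number;

  constructor(exchange: () => Promise<IssuedToken>, now: () => number = Date.now, refreshMarginMs = 30_000) {
    this.exchange = exchange;
    this.now = now;
    this.refreshMarginMs = refreshMarginMs;
  }

  async get(): Promise<string> {
    // Reuse the token until it is within the refresh margin of expiry.
    if (this.cached && this.cached.expiresAt - this.refreshMarginMs > this.now()) {
      return this.cached.token;
    }
    // Concurrent callers share one exchange instead of each signing a nonce.
    if (!this.inflight) {
      this.inflight = this.exchange()
        .then((issued) => {
          this.cached = issued;
          return issued;
        })
        .finally(() => {
          this.inflight = null;
        });
    }
    return (await this.inflight).token;
  }
}
```

Injecting `now` keeps refresh behavior testable with a fake clock instead of real waiting.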

The normal local verification gates are:

```bash
bun test
bun run build
```

## Current Limits

The most important current limits are:

- API-backed persistence is opt-in and not exercised by default
- the API-backed store assumes the current `api` durable model shape and fails
  loudly if required fields are missing
- machine-auth key rotation is still an operator concern; Railway env and
  `api` key state must be updated together
- the static `ABBOTIK_API_TOKEN` path still exists only as a temporary fallback
- no process-restart room recovery
- no real provider execution
- no history/context split inside the executor contract yet
- no multi-actor orchestration
- no wake scheduler
- auth hydration through `llm` has not yet been smoke-tested live with a real
  bearer token after deployment

## Near-Term Working Sequence

The intended working order from here is:

1. keep the API-backed room store explicit, strict, and testable
2. define and implement the real `TurnExecutor` contract with durable history
   and ephemeral context kept separate
3. replace the stub executor with real provider execution
4. add lazy rehydration for active rooms after process restart
5. add multi-actor orchestration only after the single-coordinator path is
   solid
