
Learning and teaching

Most agents only ever process: they read an input, run forward(), and write an output. But an UNaIVERSE agent can also learn: improve its own weights from data that arrives live on a stream, with no central dataset anywhere. This is the heart of what makes the platform different, so it deserves its own page.

The one idea to keep

process answers. learn answers and then corrects itself using the right answer that came alongside the input. Same perceive, think, act loop, with one extra beat.

flowchart LR
    IN[/input on stdin/] --> F["forward()"]
    F --> OUT[/output on stdout/]
    TAR[/correct answer on stdtar/] -. only when learning .-> BW["backward pass<br/>update the weights"]
    OUT -. only when learning .-> BW

process vs learn

These are two built-in actions. They share the first three steps; learn adds two more: reading the target and correcting the weights.

Step                                    process  learn
Read the input (stdin)                  yes      yes
Run forward()                           yes      yes
Write the output (stdout)               yes      yes
Read the target (stdtar)                no       yes
Run a backward pass and update weights  no       yes

The target is the correct answer for this input (a label, a reference signal, a next token). It travels on the agent's stdtar slot, exactly the way the input travels on stdin (see An agent's own streams). A learning step needs both: the input to predict from, and the target to be corrected against.
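
To make the difference concrete, here is a toy sketch in plain Python. It does not use the UNaIVERSE API: the class `TinyAgent`, its methods, and the learning rate are hypothetical names chosen only for illustration. It models a one-weight predictor where `process` only predicts, while `learn` predicts and then nudges the weight toward the target.

```python
class TinyAgent:
    """Hypothetical stand-in for an agent wrapping a one-weight model y = w * x."""

    def __init__(self, w=0.0, lr=0.1):
        self.w = w
        self.lr = lr

    def forward(self, x):
        return self.w * x

    def process(self, x):
        # Shared part: read the input, run forward(), hand back the output.
        return self.forward(x)

    def learn(self, x, target):
        # Same inference as process...
        y = self.process(x)
        # ...plus the learning part: compare with the target and update.
        # This is the gradient of the squared error (y - target)**2 w.r.t. w.
        grad = 2.0 * (y - target) * x
        self.w -= self.lr * grad
        return y


agent = TinyAgent()
before = agent.w
agent.process(2.0)       # inference only: the weight is untouched
after_process = agent.w
agent.learn(2.0, 4.0)    # inference plus one correction toward target 4.0
after_learn = agent.w
```

After the `process` call the weight is still at its initial value; only the `learn` call moves it, because only `learn` receives a target to be corrected against.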

Giving an agent the ability to learn

An agent can only learn if you hand it an optimizer and at least one loss, through the proc_opts argument of the Agent constructor:

import torch
from unaiverse.agent import Agent

agent = Agent(
    proc=my_model,
    proc_inputs=["img"],
    proc_outputs=["tensor"],
    proc_opts={
        "optimizer": torch.optim.SGD(my_model.parameters(), lr=0.01),
        "losses":    [torch.nn.functional.cross_entropy],
    },
)

proc_opts["optimizer"] · a PyTorch optimizer over your model's parameters.
    Standard optimizers (SGD, Adam, ...) all work. For the continuous-time models you can pass the Hamiltonian Learning optimizer instead.
proc_opts["losses"] · a list of loss functions.
    One or more callables, e.g. cross_entropy, binary_cross_entropy, or mse_loss. The list lets a model with several outputs carry one loss each.
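
As an illustration of "one loss each", the following sketch uses plain PyTorch only. How the framework actually pairs losses with outputs is not spelled out here, so the pairing rule below (the i-th loss goes with the i-th output and the i-th target, and the results are summed) is an assumption. `TwoHeadNet` is a hypothetical model invented for the example.

```python
import torch
import torch.nn.functional as F


class TwoHeadNet(torch.nn.Module):
    """Hypothetical model with a classification head and a regression head."""

    def __init__(self):
        super().__init__()
        self.body = torch.nn.Linear(4, 8)
        self.cls_head = torch.nn.Linear(8, 3)
        self.reg_head = torch.nn.Linear(8, 1)

    def forward(self, x):
        h = torch.relu(self.body(x))
        return self.cls_head(h), self.reg_head(h)


torch.manual_seed(0)
net = TwoHeadNet()
losses = [F.cross_entropy, F.mse_loss]  # one loss per output, in order

x = torch.randn(5, 4)
targets = (torch.randint(0, 3, (5,)), torch.randn(5, 1))

outputs = net(x)
# ASSUMPTION: pair the i-th loss with the i-th output and target, then sum.
total = sum(loss(out, tar) for loss, out, tar in zip(losses, outputs, targets))
total.backward()  # gradients now flow into both heads and the shared body
```

Summing is only one way to combine per-output losses; a weighted sum would fit the same list shape.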

No optimizer or no losses means no learning

If proc_opts has no optimizer, or an empty losses list, the learn action simply returns False ("this processor has no learning skills") and the agent keeps inferring. Learning is opt-in: you switch it on by configuring proc_opts.

These are the exact shapes used by the shipped example worlds:

# SGD with cross-entropy
proc_opts={"optimizer": torch.optim.SGD(net.parameters(), lr=0.01),
           "losses":    [torch.nn.functional.cross_entropy]}

# Adam with cross-entropy
proc_opts={"optimizer": torch.optim.Adam(net.parameters(), lr=0.0025),
           "losses":    [torch.nn.functional.cross_entropy]}

# HL (Hamiltonian Learning) with MSE
proc_opts={"optimizer": HL(net.module, gamma=1., theta=0.2, beta=0.01),
           "losses":    [torch.nn.functional.mse_loss]}

What learn does, step by step

When a learn action runs on one input:

  1. It checks the agent actually has an optimizer and losses. If not, it returns False and stops here.
  2. It runs process first: read stdin, call forward(), write stdout.
  3. It reads the target from stdtar for the same request.
  4. It runs the backward pass against that target and steps the optimizer, then logs the loss values.

So a learning step is always an inference step plus a correction. Both the input and its target are scoped to the same request, which is why teaching one sample means delivering the pair together.
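
The four steps can be sketched in plain PyTorch as follows. This is not the library's implementation: `learn_step` is a hypothetical helper, the input and target are passed in directly instead of being read from stdin and stdtar, the `print` call stands in for the loss logging, and only the first loss in the list is used (a single-output model is assumed).

```python
import torch
import torch.nn.functional as F


def learn_step(model, proc_opts, x, target):
    """Hypothetical helper mirroring the four steps; not the library's code."""
    optimizer = proc_opts.get("optimizer")
    losses = proc_opts.get("losses") or []
    # Step 1: no optimizer or no losses means no learning.
    if optimizer is None or not losses:
        return False
    # Step 2: the inference part (what process would do).
    output = model(x)
    # Steps 3-4: compare with the target, backpropagate, step the optimizer.
    loss = losses[0](output, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    print(f"loss: {loss.item():.4f}")  # stand-in for the loss logging
    return True


torch.manual_seed(0)
model = torch.nn.Linear(4, 3)
opts = {
    "optimizer": torch.optim.SGD(model.parameters(), lr=0.01),
    "losses": [F.cross_entropy],
}
x, target = torch.randn(1, 4), torch.tensor([2])

weights_before = model.weight.detach().clone()
learned = learn_step(model, opts, x, target)   # configured: weights move
skipped = learn_step(model, {}, x, target)     # not configured: returns False
```

The second call shows the opt-in behaviour: with an empty configuration the helper returns False before touching the model.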

Learning on a stream, not a dataset

Here is the part that has no equivalent in a classic training script. The data an agent learns from is never collected into a central dataset. It arrives as a stream, one sample at a time, and the agent learns from each sample as it shows up, then moves on. Training is something that happens while the agent is live on the network, not a separate offline phase over a frozen folder of files.

That changes what "learning" means in practice:

  • It is continual. New data keeps arriving; the agent keeps adapting. There is no "epoch 100, done."
  • It must not forget. Because old samples are gone (they were never stored), an agent has to retain what it learned earlier while absorbing what is new. Losing earlier knowledge this way is known as catastrophic forgetting, and resisting it is a first-class concern: the signature components, the CNU associative memory and Hamiltonian Learning, exist precisely to make on-stream, lifelong learning work without classic backprop over a frozen weight matrix.
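
The "one sample at a time, never stored" part can be sketched in plain Python. Everything here is hypothetical: `sample_stream` stands in for a live stream, and the model is a single weight trained with a hand-written SGD step. The sketch shows only the streaming shape of the loop; it says nothing about how CNU or Hamiltonian Learning resist forgetting.

```python
import random


def sample_stream(n, seed=0):
    """Hypothetical stream: yields (input, target) pairs for y = 3 * x."""
    rng = random.Random(seed)
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        yield x, 3.0 * x


w, lr = 0.0, 0.1
seen = 0
for x, target in sample_stream(500):
    error = w * x - target        # predict, then compare with the target
    w -= lr * 2.0 * error * x     # one SGD step on the squared error
    seen += 1                     # the sample itself is not kept anywhere
```

Each pair is consumed once and then discarded; there is no list of samples to loop over again, so there is no notion of an epoch.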

The teacher and the student

Learning rarely happens alone. The common shape is two agents: a teacher that owns the data and drives the lesson, and a student that does the actual learning. The teacher streams labeled samples and asks the student to learn over them; later it asks the student to process an exam and grades the result.

sequenceDiagram
    participant T as Teacher
    participant S as Student
    T->>S: send(action_name="learn"), labeled samples over many steps
    Note over S: runs learn on each sample,<br/>updates its weights live
    T->>S: send(action_name="process"), an unseen exam
    Note over S: answers, and keeps its outputs
    T->>T: evaluate + compare_eval, pass or fail
    T-->>S: suggest a badge or a new role

The teacher's side is assembled from a small set of built-in actions (record, set_pref_streams, learn, process, evaluate, compare_eval, and suggest_badges_to_world / suggest_role_to_world), each documented with its parameters in the built-in actions reference. The student is often just a plain agent with proc_opts set; it keeps the outputs it produces (via buffer_generated_by_others) so the teacher can read them back to grade. The full, runnable walk-through lives in the world-builder path: teaching a world and the master at work.
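
The shape of the exchange (lesson, exam, grade) can be mimicked with two toy classes in plain Python. None of this is the UNaIVERSE API: `Teacher`, `Student`, their methods, the rule y = 2 * x, and the tolerance are all hypothetical, and the real built-in actions take parameters that this sketch does not attempt to reproduce.

```python
class Student:
    """Toy learner: one weight, fitted online to y = w * x."""

    def __init__(self, lr=0.1):
        self.w, self.lr = 0.0, lr
        self.kept_outputs = []  # outputs kept so the teacher can grade them

    def learn(self, x, target):
        y = self.w * x
        self.w -= self.lr * 2.0 * (y - target) * x

    def process(self, x):
        y = self.w * x
        self.kept_outputs.append(y)
        return y


class Teacher:
    """Toy teacher: owns the rule y = 2 * x, drives the lesson, then grades."""

    def label(self, x):
        return 2.0 * x

    def teach(self, student, inputs):
        for x in inputs:
            student.learn(x, self.label(x))

    def examine(self, student, exam_inputs, tolerance=0.05):
        for x in exam_inputs:
            student.process(x)
        answers = student.kept_outputs[-len(exam_inputs):]
        errors = [abs(a - self.label(x)) for a, x in zip(answers, exam_inputs)]
        return max(errors) <= tolerance  # pass or fail


teacher, student = Teacher(), Student()
teacher.teach(student, [0.5, -1.0, 1.5, 1.0, -0.5] * 20)  # the lesson
passed = teacher.examine(student, [0.25, 2.0, -3.0])      # the unseen exam
untrained_passed = teacher.examine(Student(), [0.25, 2.0, -3.0])
```

The student never sees the teacher's rule directly; it only receives labeled samples during the lesson and is judged on the outputs it kept during the exam.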

Where next