Learning and teaching¶
Most agents only ever process: they read an input, run forward(), and
write an output. But an UNaIVERSE agent can also learn: improve
its own weights from data that arrives live on a stream,
with no central dataset anywhere. This is the heart of what makes the platform
different, so it deserves its own page.
The one idea to keep
process answers. learn answers and then corrects itself using the
right answer that came alongside the input. Same perceive, think, act loop,
with one extra beat.
flowchart LR
IN[/input on stdin/] --> F["forward()"]
F --> OUT[/output on stdout/]
TAR[/correct answer on stdtar/] -. only when learning .-> BW["backward pass<br/>update the weights"]
OUT -. only when learning .-> BW
process vs learn¶
These are two built-in actions. They share the first
three steps; learn adds two more: reading the target and running the backward pass.
| Step | process | learn |
|---|---|---|
| Read the input (stdin) | yes | yes |
| Run forward() | yes | yes |
| Write the output (stdout) | yes | yes |
| Read the target (stdtar) | no | yes |
| Run a backward pass and update weights | no | yes |
The target is the correct answer for this input (a label, a reference signal, a
next token). It travels on the agent's stdtar slot, exactly the way the input
travels on stdin (see An agent's own streams). A
learning step needs both: the input to predict from, and the target to be
corrected against.
Giving an agent the ability to learn¶
An agent can only learn if you hand it an optimizer and at least one loss,
through the proc_opts argument of the Agent constructor:
import torch
from unaiverse.agent import Agent

agent = Agent(
    proc=my_model,
    proc_inputs=["img"],
    proc_outputs=["tensor"],
    proc_opts={
        "optimizer": torch.optim.SGD(my_model.parameters(), lr=0.01),
        "losses": [torch.nn.functional.cross_entropy],
    },
)
proc_opts["optimizer"] · a PyTorch optimizer over your model's parameters.
- Standard optimizers (SGD, Adam, ...) all work. For the continuous-time models you can pass the Hamiltonian Learning optimizer instead.
proc_opts["losses"] · a list of loss functions.
- One or more callables, e.g. cross_entropy, binary_cross_entropy, or mse_loss. The list lets a model with several outputs carry one loss each.
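To make "one loss each" concrete, here is a minimal plain-PyTorch sketch of a model with two outputs and a matching list of two losses. The `TwoHeads` model is hypothetical, and the pairing rule (loss i is applied to output i, and the results are summed) is an assumption made for illustration; the library may combine losses differently.

```python
import torch
import torch.nn.functional as F


class TwoHeads(torch.nn.Module):
    """Hypothetical model with a classification head and a regression head."""

    def __init__(self):
        super().__init__()
        self.body = torch.nn.Linear(4, 8)
        self.cls_head = torch.nn.Linear(8, 3)
        self.reg_head = torch.nn.Linear(8, 1)

    def forward(self, x):
        h = torch.relu(self.body(x))
        return self.cls_head(h), self.reg_head(h)


torch.manual_seed(0)
model = TwoHeads()
losses = [F.cross_entropy, F.mse_loss]  # one loss per output, in output order

x = torch.randn(5, 4)
targets = (torch.randint(0, 3, (5,)), torch.randn(5, 1))

outputs = model(x)
# Assumption: pair loss i with output i and target i, then sum the results.
per_output = [fn(out, tgt) for fn, out, tgt in zip(losses, outputs, targets)]
total = sum(per_output)
total.backward()  # gradients now reach both heads and the shared body
```

Keeping the losses in a list, rather than folding them into one function, lets each output be scored with the loss that suits it.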
No optimizer or no losses means no learning
If proc_opts has no optimizer, or an empty losses list, the learn action
simply returns False ("this processor has no learning skills") and the agent
keeps inferring. Learning is opt-in: you switch it on by configuring
proc_opts.
These are the exact shapes used by the shipped example worlds:
What learn does, step by step¶
When a learn action runs on one input:
- It checks the agent actually has an optimizer and losses. If not, it returns False and stops here.
- It runs process first: read stdin, call forward(), write stdout.
- It reads the target from stdtar for the same request.
- It runs the backward pass against that target and steps the optimizer, then logs the loss values.
So a learning step is always an inference step plus a correction. Both the input and its target are scoped to the same request, which is why teaching one sample means delivering the pair together.
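The same sequence can be sketched in plain PyTorch. This is an illustration under stated assumptions, not the library's implementation: `process_step` and `learn_step` are hypothetical names, the stream slots (stdin, stdout, stdtar) are replaced by ordinary function arguments, and summing the losses before the backward pass is an assumption.

```python
import torch
import torch.nn.functional as F


def process_step(model, x):
    """Inference only: input in, output out."""
    return model(x)


def learn_step(model, x, target, optimizer=None, losses=None):
    """One learning step; returns False when learning is not configured."""
    # Step 1: without an optimizer and at least one loss there is nothing to do.
    if optimizer is None or not losses:
        return False
    # Step 2: the same forward pass that process runs.
    output = process_step(model, x)
    # Step 3: score the output against the target that came with this input.
    loss_values = [loss_fn(output, target) for loss_fn in losses]
    # Step 4: backward pass, then one optimizer step.
    optimizer.zero_grad()
    sum(loss_values).backward()
    optimizer.step()
    # "Logging" here just means returning the loss values as floats.
    return [value.item() for value in loss_values]


torch.manual_seed(0)
model = torch.nn.Linear(4, 3)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
x, y = torch.randn(8, 4), torch.randint(0, 3, (8,))

weights_before = model.weight.detach().clone()
logged = learn_step(model, x, y, optimizer=opt, losses=[F.cross_entropy])
no_skills = learn_step(model, x, y)  # no optimizer, no losses
```

In this sketch `logged` holds one loss value and the weights have moved, while `no_skills` is `False` and the model is left untouched.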
Learning on a stream, not a dataset¶
Here is the part that has no equivalent in a classic training script. The data an agent learns from is never collected into a central dataset. It arrives as a stream, one sample at a time, and the agent learns from each sample as it shows up, then moves on. Training is something that happens while the agent is live on the network, not a separate offline phase over a frozen folder of files.
That changes what "learning" means in practice:
- It is continual. New data keeps arriving; the agent keeps adapting. There is no "epoch 100, done."
- It must not forget. Because old samples are gone (they were never stored), an agent has to retain what it learned earlier while absorbing what is new. Resisting this catastrophic forgetting is a first-class concern, and the signature components, the CNU associative memory and Hamiltonian Learning, exist precisely to make on-stream, lifelong learning work without classic backprop over a frozen weight matrix.
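A toy example shows what "one sample at a time, then move on" looks like in code. It uses only the Python standard library and a two-parameter linear model updated by a per-sample gradient step; `sample_stream` and the hidden rule it generates are hypothetical. It illustrates the streaming shape only and does nothing to address forgetting.

```python
import random


def sample_stream(n, seed=0):
    """Yield (input, target) pairs one at a time; nothing is stored."""
    rng = random.Random(seed)
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        yield x, 3.0 * x + 0.5  # hidden rule the learner has to pick up


w, b, lr = 0.0, 0.0, 0.1
for x, target in sample_stream(2000):
    pred = w * x + b       # answer
    err = pred - target    # compare with the target that came alongside
    w -= lr * err * x      # correct, using this one sample only
    b -= lr * err
    # The sample goes out of scope here; there is no dataset to revisit.
```

After the stream is consumed, `w` and `b` sit close to 3.0 and 0.5, even though no sample was ever seen twice.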
The teacher and the student¶
Learning rarely happens alone. The common shape is two agents: a teacher that
owns the data and drives the lesson, and a student that does the actual
learning. The teacher streams labeled samples and asks the student to learn
over them; later it asks the student to process an exam and grades the
result.
sequenceDiagram
participant T as Teacher
participant S as Student
T->>S: send(action_name="learn"), labeled samples over many steps
Note over S: runs learn on each sample,<br/>updates its weights live
T->>S: send(action_name="process"), an unseen exam
Note over S: answers, and keeps its outputs
T->>T: evaluate + compare_eval, pass or fail
T-->>S: suggest a badge or a new role
The teacher's side is assembled from a small set of built-in actions, record,
set_pref_streams, learn, process, evaluate, compare_eval, and
suggest_badges_to_world / suggest_role_to_world, each documented with its
parameters in the built-in actions reference. The student
is often just a plain agent with proc_opts set; it keeps the outputs it
produces (via buffer_generated_by_others) so the teacher can read them back to
grade. The full, runnable walk-through lives in the world-builder path:
teaching a world and
the master at work.
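As a rough picture of the lesson-then-exam rhythm, the following stand-alone simulation uses plain Python classes rather than the UNaIVERSE API. Every name in it (`Teacher`, `Student`, `teach`, `examine`, `kept_outputs`, `pass_mark`) is hypothetical, the student is a simple online nearest-centroid classifier chosen for brevity, and direct method calls stand in for actions sent over the network.

```python
import random


class Student:
    """Stand-in learner (not the unaiverse Agent class)."""

    def __init__(self):
        self.centroids = [0.0, 0.0]
        self.counts = [0, 0]
        self.kept_outputs = []  # answers kept so the teacher can grade them

    def process(self, x):
        d0 = abs(x - self.centroids[0])
        d1 = abs(x - self.centroids[1])
        answer = 0 if d0 <= d1 else 1
        self.kept_outputs.append(answer)
        return answer

    def learn(self, x, target):
        answer = self.process(x)  # answer first...
        self.counts[target] += 1  # ...then correct using the target
        step = (x - self.centroids[target]) / self.counts[target]
        self.centroids[target] += step  # running mean of this class
        return answer


class Teacher:
    """Owns the data and the grading rule."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)

    def sample(self):
        label = self.rng.randint(0, 1)
        center = -1.0 if label == 0 else 1.0
        return center + self.rng.uniform(-0.4, 0.4), label

    def teach(self, student, steps):
        for _ in range(steps):
            x, label = self.sample()
            student.learn(x, label)  # one labeled sample per step

    def examine(self, student, n, pass_mark=0.9):
        exam = [self.sample() for _ in range(n)]
        student.kept_outputs.clear()
        for x, _ in exam:
            student.process(x)  # unseen inputs, no targets delivered
        pairs = zip(student.kept_outputs, exam)
        correct = sum(out == label for out, (_, label) in pairs)
        accuracy = correct / n
        return accuracy, accuracy >= pass_mark


teacher, student = Teacher(), Student()
teacher.teach(student, steps=200)
accuracy, passed = teacher.examine(student, n=100)
```

The teacher never touches the student's parameters: it only delivers samples, reads back the kept answers, and compares the score with a pass mark.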
Where next¶
- Models, the CNU memory and Hamiltonian Learning that power on-stream learning.
- Data streams, where stdtar (the target) comes from.
- Teaching a world, the teacher/student pattern end to end.
- Agent API reference, proc_opts, learn, and process.