8 · A teaching world: teach, exam, grade¶
Here is where worlds get genuinely new. A teaching world is one where an agent learns over time from another, and the world measures whether it did. This is not something a workflow tool can express; it is the Collectionless AI idea made concrete. We build it from two roles, teacher and student, and a loop you will recognize forever after: teach, exam, grade.
The cast
- The teacher has no model (proc=None). It leads: it streams lessons, sets the exam, and grades. All of its work is built-in actions.
- The student owns the model. It just learns what it is taught and answers the exam. Its behavior is a short "listen to the teacher" loop.
Read those two roles together and the whole chapter falls out: the teacher is a game master that runs a session, the student is a peer that does the work, and the rest is plumbing. Nobody is "calling functions" on anybody; the teacher sends requests and the student reacts to them, exactly the way two peers talk in Chapter 6. The novelty is only what they exchange: lessons and a graded exam, instead of a single answer.
The smallest one: cat_library¶
cat_library is the minimal teaching world: a teacher recites a fixed token
sequence (a "poem"), a tiny RNN student learns to predict it, then sits an exam
where it must reproduce it. Everything bigger is this skeleton with more in it.
The student is the model¶
The student role is an empty Agent subclass (class WAgent(Agent): pass):
it adds no custom actions, because it does not need any. Everything a student does,
learn and answer, is already a built-in.
The actual network is supplied when the node joins, with an optimizer and a loss so
it can learn online, sample by sample:
New to proc_inputs / proc_outputs?
These are an agent's fixed input/output slots. If the names are unfamiliar, read An agent's own streams first; it explains the whole convention in one place.
net = RNNTokenLM(num_emb=voc_size, emb_dim=16, y_dim=voc_size, h_dim=100, seed=42)
agent = Agent(
    proc=net,
    proc_inputs=[StreamType(data_type="text", ...vocab...)],
    proc_outputs=[StreamType(data_type="text", ...vocab...)],
    proc_opts={"optimizer": torch.optim.SGD(net.parameters(), lr=0.01),
               "losses": [torch.nn.functional.cross_entropy]},
    buffer_generated_by_others="all",
)
Read it line by line:
- proc=net is the student's brain: an RNN language model over the poem's vocabulary. This is the one piece the student owns, the teacher has none.
- proc_inputs / proc_outputs declare the two ends of that brain as streams. Both are text over the same vocabulary, because the task is next-token prediction: feed a token, predict the next. The StreamType also carries the token↔index maps, so a word arriving on the wire becomes an integer the network can read, and back again on the way out. Declaring proc_inputs is not optional here: the framework wires stdin by walking proc_inputs, so an empty list would leave the model with no input and learn / process would silently do nothing.
- proc_opts is what turns a network into a student: an optimizer (SGD) and a loss (cross_entropy). The built-in learn action needs exactly these to run a backward pass; without them, learn would have nothing to update.
- buffer_generated_by_others="all" tells the agent to keep the outputs it produces while answering, so that after the exam the teacher can fetch them and grade them. Grading reads buffered output; this is the switch that makes it available.
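The token↔index round trip mentioned above can be pictured in a few lines of plain Python. This is only a sketch of the idea, not the framework's implementation: build_vocab, encode, and decode are hypothetical helper names, and the poem is made up.

```python
# Hypothetical helpers illustrating token <-> index maps; not the framework's API.

def build_vocab(tokens):
    """Assign each distinct token a stable integer index, in first-seen order."""
    token_to_index = {}
    for tok in tokens:
        if tok not in token_to_index:
            token_to_index[tok] = len(token_to_index)
    index_to_token = {i: t for t, i in token_to_index.items()}
    return token_to_index, index_to_token


def encode(tokens, token_to_index):
    """Words arriving on the wire become integers the network can read."""
    return [token_to_index[t] for t in tokens]


def decode(indices, index_to_token):
    """Integers leaving the network become words again."""
    return [index_to_token[i] for i in indices]


poem = "the cat sat on the mat".split()  # made-up poem
t2i, i2t = build_vocab(poem)
ids = encode(poem, t2i)
print(ids)               # [0, 1, 2, 3, 0, 4]
print(decode(ids, i2t))  # round-trips to the original tokens
```

The vocabulary size here (5) is what would play the part of voc_size when sizing the network's embedding and output layers.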
The student's behavior is a tiny reactive loop, built in the world's
create_behav_files. It plugs in one shipped template,
listening_to_teacher, wrapped by two transits:
behav.add_transit("init", listening_json, action="engage",
                  args={"acceptable_role": "teacher"})
behav.add_transit("teacher_engaged", "init", action="disengage")
What the student does, in order:
1. engage with acceptable_role="teacher". The student sits idle until a teacher offers to pair with it, and accepts only a teacher, it refuses to be driven by the wrong kind of peer. This is the worker side of the engagement handshake from Chapter 7.
2. React to whatever the teacher sends. Inside the template (next section) the student waits in a blocking state and reacts to incoming requests: a learn request trains it, a process request tests it. The student never decides when to learn or be examined; the teacher does, by sending.
3. disengage back to init. When the teacher releases it, the student loops home and is ready for the next teacher.
That is the entire student: pair with a teacher, do as told, go home. All the intelligence about what to teach lives in the teacher.
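To make "pair with a teacher, do as told, go home" concrete, here is a toy state machine in plain Python. The state and action names follow the text; the ToyStudent class and its on_request method are hypothetical. The intermediate states of the real template are collapsed into a single teacher_engaged state.

```python
# Toy state machine; state and action names follow the text, the class is hypothetical.

class ToyStudent:
    def __init__(self):
        self.state = "init"
        self.log = []

    def on_request(self, action, sender_role):
        """React to a request from a peer; return True if it was accepted."""
        if self.state == "init":
            # Only a teacher may engage an idle student.
            if action == "engage" and sender_role == "teacher":
                self.state = "teacher_engaged"
                return True
            return False
        if self.state == "teacher_engaged":
            if action in ("learn", "process"):
                self.log.append(action)   # do as told, then keep listening
                return True
            if action == "disengage":
                self.state = "init"       # loop home, ready for the next teacher
                return True
        return False


student = ToyStudent()
print(student.on_request("engage", "student"))     # False: wrong kind of peer
print(student.on_request("learn", "teacher"))      # False: not engaged yet
print(student.on_request("engage", "teacher"))     # True
print(student.on_request("learn", "teacher"))      # True
print(student.on_request("process", "teacher"))    # True
print(student.on_request("disengage", "teacher"))  # True
print(student.state, student.log)                  # init ['learn', 'process']
```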
The teacher leads the loop¶
The teacher's behavior is the interesting part. It has four moves, all
built-ins, and the real source assembles them
by snapping together three shipped templates (engage_by_role, the teach/eval loop,
and the recording step) with wildcards filled in. We unroll them here.
1 · Record the material¶
record snapshots a live stream into a buffer the teacher owns and can replay
as many times as it likes. The teacher cannot teach directly from the world's
ever-flowing cats stream; it needs a fixed, repeatable dataset, so it freezes
998 samples of it:
behav.add_transit("init", "book_prepared", action="record",
                  args={"streams": ["<world>:cats"],
                        "num_steps": "<eval_steps>",  # 998
                        "record_uuid": None})
Step by step: record reads num_steps samples from <world>:cats and copies them
into a brand-new owned stream. The framework auto-names that stream
recorded<N>, counting up from one, so this first record produces recorded1.
(That number matters in animal_school, where several record calls produce
recorded1, recorded2, …) record_uuid=None says "read the plain world stream",
which publishes under no per-interaction id, rather than a per-request channel. When
the last sample is in, the buffer is sealed read-only and announced to peers. The
teacher now owns the poem as a dataset it can hand out at will.
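The snapshot-and-name behavior just described can be sketched without the framework. Only the recorded<N> naming scheme comes from the text; ToyRecorder, its record method, and the cats generator are hypothetical stand-ins, and a tuple plays the part of the sealed, read-only buffer.

```python
import itertools

# Hypothetical recorder; only the naming scheme (recorded1, recorded2, ...) comes from the text.

class ToyRecorder:
    def __init__(self):
        self.owned = {}   # name -> frozen snapshot

    def record(self, live_stream, num_steps):
        """Copy num_steps samples from a live iterator into a new read-only buffer."""
        name = f"recorded{len(self.owned) + 1}"
        self.owned[name] = tuple(itertools.islice(live_stream, num_steps))
        return name


def cats():
    """Stand-in for an ever-flowing world stream: an endless cycle of tokens."""
    return itertools.cycle(["the", "cat", "sat"])


teacher = ToyRecorder()
first = teacher.record(cats(), num_steps=5)
second = teacher.record(cats(), num_steps=2)
print(first, teacher.owned[first])    # recorded1 ('the', 'cat', 'sat', 'the', 'cat')
print(second, teacher.owned[second])  # recorded2 ('the', 'cat')
```

Because the snapshot is a tuple, it can be replayed any number of times and always yields the same samples, which is the property the teacher needs.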
Focus · reading the stream references: owner:name and the <...> wildcards
Strings like "<world>:cats" are stream references, written as
owner:stream-name: the owner before the colon, the stream's name after it.
The owner is almost always a wildcard the framework fills in when the
behavior actually runs on an agent:
- <agent> is this agent itself: "<agent>:recorded1" (below) means "my own stream named recorded1", the buffer record just filled.
- <world> is the world: "<world>:cats" is "the world's cats stream".
- <partner> is the engaged peer, <role> the agent's own role.
Why wildcards? One behavior template is shipped to every agent in a role,
so it cannot hard-code ids. <agent> and <world> let it say "my stream" or
"the world's stream" and have each agent resolve it to the right concrete one
on join. <playlist> is different: not an owner but "whichever stream the
playlist pointer is on right now" (set by set_pref_streams), which is how a
single learn transit can walk an entire curriculum.
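As a rough picture of what "resolve on join" means, the sketch below substitutes concrete ids into one shared template for two different agents. The resolve function and the ids are invented for illustration; the framework's actual resolution logic is not shown in this chapter.

```python
# Hypothetical resolver; the wildcard names come from the text, the ids are made up.

def resolve(reference, bindings):
    """Replace every known <wildcard> in a stream reference with its concrete value."""
    for wildcard, value in bindings.items():
        reference = reference.replace(wildcard, value)
    return reference


# One template, shipped unchanged to every agent in the role.
template = ["<world>:cats", "<agent>:recorded1"]

alice = {"<agent>": "peer-alice", "<world>": "world-42"}
bob = {"<agent>": "peer-bob", "<world>": "world-42"}

print([resolve(r, alice) for r in template])  # ['world-42:cats', 'peer-alice:recorded1']
print([resolve(r, bob) for r in template])    # ['world-42:cats', 'peer-bob:recorded1']

# The owner:name split itself is a single partition on the first colon.
owner, name = resolve("<agent>:recorded1", alice).split(":", 1)
print(owner, name)                            # peer-alice recorded1
```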
2 · Find and engage a student¶
Next the teacher recruits. The world chains in the engage_by_role template
with <roles_to_engage> set to student, so the teacher searches the network for
students, connects, waits for the handshake, and offers engagement. When a student
accepts, the template lands in its engagement_complete state, the cue that a
paired student is ready to be taught. This is the recruiter side of the same
handshake the student accepts in step 1 of its own loop; Chapter 7
covers it in full.
3 · Teach¶
Now the teaching loop proper. Two things happen: the teacher sets a playlist of
streams, then sends a learn request for it:
behav.add_transit("engagement_complete", teach_eval_json,
                  action="set_pref_streams",
                  args={"net_hashes": ["<agent>:recorded1"], "repeat": 50})

"action": "send",
"action_kwargs": {
    "action_name": "learn",
    "streams": {"stdin": ["<playlist>"], "stdtar": ["<playlist>"]},
    "num_steps": "<learn_steps>",  # 998
    "wait_completion": true
}
Read it as the teacher's lesson plan:
- set_pref_streams(["<agent>:recorded1"], repeat=50) loads the playlist with one item, the recorded poem, laid down 50 times back to back. The playlist is just an ordered list with a pointer; repeat=50 is how "teach this poem 50 times" is expressed without writing the loop by hand.
- The learn send is the lesson itself. The teacher does not run the model, it asks the student to run learn for num_steps=998 samples, feeding the playlist item as both stdin (the input) and stdtar (the target). Binding the same stream to input and target is what "learn to reproduce this sequence" means: predict each next token, and the truth you are graded against is the next token. The <playlist> wildcard resolves to whatever the pointer is on right now, so the one transit serves every item and every repeat.
- wait_completion=True makes the teacher's send return True only once the student reports it has finished all 998 steps. The teacher blocks through the whole lecture instead of racing ahead, pacing the session to the learner.
Where does the data flow? The poem lives in the teacher's recorded1. The send
points the student at it; the student pulls those samples into its own stdin /
stdtar, runs learn (forward pass, then backward pass against stdtar, then an
optimizer step) on each one, and reports done. Nothing is copied to a shared place;
the student reads the teacher's stream directly.
Focus · how one transit walks a whole curriculum (the playlist loop)
The teach/eval template is not a straight line; it is a small loop. After each
learn lecture it visits a change_lecture state that calls
next_pref_stream to advance the playlist pointer, then check_pref_stream
decides where to go:
- if the pointer is not back at the first item, loop to begin_teaching and teach the next item;
- if it wrapped back to the first, the full curriculum (all items, all repeats) is done, move on to the exam.
With cat_library's single-item, 50-repeat playlist that loop simply teaches the
poem 50 times and then stops. The very same machine, given the three-class
playlist of animal_school, teaches each class in turn before examining. One
template, two curricula, that is the point of the playlist abstraction.
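The pointer-and-wrap logic is small enough to sketch in plain Python. The method names echo the built-ins mentioned above, but ToyPlaylist and run_curriculum are hypothetical. Expanding repeat into a flat list is an assumption made here to mirror "laid down 50 times back to back".

```python
# Toy playlist; method names echo the built-ins in the text but the class is hypothetical.

class ToyPlaylist:
    def __init__(self, streams, repeat=1):
        self.items = list(streams) * repeat   # assumption: repeats laid down back to back
        self.pointer = 0

    def current(self):
        return self.items[self.pointer]

    def next_pref_stream(self):
        """Advance the pointer, wrapping to the first item after the last."""
        self.pointer = (self.pointer + 1) % len(self.items)

    def wrapped(self):
        """True when the pointer is back on the first item."""
        return self.pointer == 0


def run_curriculum(playlist):
    """Teach the current item, advance, and stop once the pointer wraps."""
    taught = []
    while True:
        taught.append(playlist.current())   # the single learn transit
        playlist.next_pref_stream()         # change_lecture
        if playlist.wrapped():              # check_pref_stream
            return taught                   # curriculum done, move on to the exam


cat_library = run_curriculum(ToyPlaylist(["recorded1"], repeat=50))
animal_school = run_curriculum(ToyPlaylist(["recorded2", "recorded3", "recorded4"]))
print(len(cat_library))   # 50
print(animal_school)      # ['recorded2', 'recorded3', 'recorded4']
```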
4 · Exam, then grade¶
After the curriculum, the teacher examines and scores. Three built-ins do it:
# exam: ask the student to ANSWER (no learning)
"action": "send",
"action_kwargs": {"action_name": "process", "streams": ["<exam_data_ref>"],
                  "num_steps": "<eval_steps>", "wait_completion": true}

# grade: score the student's buffered answers...
"action": "evaluate",
"action_kwargs": {"stream_hash": "<exam_data_ref>", "how": "max", "steps": "<eval_steps>"}

# ...and branch on the result
"action": "compare_eval", "action_kwargs": {"cmp": "<=", "thres": "<cmp_thres>"}  # -> good
In cat_library the world sets <exam_data_ref> to <agent>:recorded1 (the same
poem) and <cmp_thres> to 0.2. Walk the three steps:
- Exam (process). The teacher sends a process request, inference only, no learning, over the exam stream for eval_steps samples, again blocking on wait_completion=True. The student runs its model forward and, because it joined with buffer_generated_by_others="all", keeps the outputs it produces.
- Grade (evaluate). The teacher now scores those buffered outputs against the reference stream. how="max" compares argmax predictions (label-style accuracy); steps bounds how many samples to score. The result is a single error number per student. The teacher can read those outputs at all because the student buffered them (its own buffer_generated_by_others="all"); grading is the teacher reaching into what the student kept.
- Branch (compare_eval). cmp="<=", thres=0.2 turns that number into a pass/fail: students with error ≤ 0.2 pass and are collected into the <valid_cmp> set (used later for badges and promotion); the rest fall through to the bad branch. That is the entire grade, one comparison.

A small honesty fix for text streams
cat_library sets re_offset=True on evaluate (the world patches it in after building). Text streams do not carry reliable time tags, so the first compared pair could be misaligned; re_offset re-aligns the two streams' origin before scoring so the comparison is fair.
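The grade and branch steps boil down to an argmax comparison followed by a single threshold test. The sketch below assumes the error is the fraction of wrong argmax predictions, which the text implies but does not spell out. The function names and the sample data are made up.

```python
# Toy grading; the function names are hypothetical, the "<=" threshold rule follows the text.

def argmax(scores):
    """Index of the largest score."""
    return max(range(len(scores)), key=lambda i: scores[i])


def error_rate(outputs, targets, steps):
    """Fraction of the first `steps` samples whose argmax prediction is wrong."""
    pairs = list(zip(outputs, targets))[:steps]
    wrong = sum(1 for scores, target in pairs if argmax(scores) != target)
    return wrong / len(pairs)


def passes(error, thres):
    """Mirror of cmp="<=": pass when the error is at or below the threshold."""
    return error <= thres


# Five buffered answers (one score vector per step) and the reference labels.
outputs = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4], [0.3, 0.7], [0.1, 0.9]]
targets = [0, 1, 1, 1, 1]

err = error_rate(outputs, targets, steps=5)
print(err)               # 0.2 (one wrong answer out of five)
print(passes(err, 0.2))  # True
print(passes(err, 0.1))  # False
```

Note that an error exactly on the threshold passes, because the comparison is "<=" rather than "<".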
learn vs process, the one distinction to hold
The same send shape, with action_name="learn", trains the student (a
forward pass and a backward pass against stdtar, then an optimizer step). With
action_name="process" it only runs the model (forward only, no update).
Teaching is learn; the exam is process. Hold that one distinction and the
whole loop reads cleanly.
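The distinction is easy to see on a model small enough to check by hand. The sketch below is a hypothetical one-weight model trained with plain SGD on a squared error. It is not the student's RNN, and ToyModel is an invented name.

```python
# Toy one-weight model; hypothetical, chosen so the update can be checked by hand.

class ToyModel:
    def __init__(self, w=0.0, lr=0.1):
        self.w = w
        self.lr = lr

    def forward(self, x):
        return self.w * x

    def process(self, x):
        """Inference only: forward pass, no update."""
        return self.forward(x)

    def learn(self, x, target):
        """Forward pass, gradient of the squared error, then one SGD step."""
        y = self.forward(x)
        grad = 2.0 * (y - target) * x   # d/dw of (w*x - target)**2
        self.w -= self.lr * grad
        return y


model = ToyModel()
before = model.w
model.process(1.0)
print(model.w == before)              # True: process left the weight untouched

for _ in range(50):
    model.learn(1.0, target=3.0)      # teach: the output for x=1 should be 3
print(round(model.process(1.0), 3))   # 3.0
```

Each learn call shrinks the gap to the target by a factor of 0.8 here, so fifty calls leave the prediction within a rounding error of 3, while any number of process calls would leave the weight where it started.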
The loop, as built¶
It is worth seeing the two roles side by side, because the teacher's sends and the
student's reactions are two halves of the same conversation. The student sits in a
blocking teacher_engaged state and has two ready=false transitions out of it,
one per request the teacher might send:
TEACHER (leads)                     STUDENT (reacts, in listening_to_teacher)
record -> recorded1
engage_by_role -> student paired
set_pref_streams (poem x50)
send learn -----------------------> teacher_engaged --learn--> finished_learning
  (blocks on wait_completion)       (trains 998 steps, reports done)
...loop the playlist...
send process ---------------------> teacher_engaged --process--> finished_exam
  (blocks on wait_completion)       (answers 998 steps, buffers output)
evaluate + compare_eval -> good/bad
A ready=false transition means "this fires only when another agent triggers that
action on me". So the student does not poll or choose; each send from the teacher
is the event that moves the student's state machine. The teacher's
wait_completion=True and the student's blocking state are the two ends of the same
rendezvous: the teacher waits for "done", the student waits for "what next".
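The two-sided wait can be imitated with a queue and an event from Python's standard library. This is only an analogy for the blocking behavior, under the assumption that each request carries its own completion signal. The send and student functions below are hypothetical and say nothing about how the framework transports requests between peers.

```python
import queue
import threading

# Toy rendezvous between two threads; all names are hypothetical.

requests = queue.Queue()
handled = []


def student():
    """Block until a request arrives, handle it, then signal completion."""
    while True:
        action, done = requests.get()   # waits for "what next"
        if action == "disengage":
            done.set()
            return
        handled.append(action)
        done.set()                      # report "done" to the sender


def send(action, wait_completion=True):
    """Teacher side: post a request and optionally block until it is handled."""
    done = threading.Event()
    requests.put((action, done))
    if wait_completion:
        done.wait()
    return done.is_set()


worker = threading.Thread(target=student)
worker.start()
print(send("learn"))     # True, returned only after the student finished
print(send("process"))   # True
send("disengage")
worker.join()
print(handled)           # ['learn', 'process']
```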
Adding consequences: animal_school¶
animal_school keeps the exact teacher/student skeleton, same templates, same
four moves, and changes only two things, both inside create_behav_files. The
student even reuses the identical engage + listening_to_teacher behavior; the
only thing that differs is the content the teacher pours through it. Here the
student is an image classifier (a CNNCNU with three outputs labelled albatross,
cheetah, giraffe), so the streams carry pictures and labels instead of text, but
the loop is byte-for-byte the same.
Where the data comes from: the world's environmental streams¶
Before the teacher can record anything, the world itself has to supply the raw
material. cat_library glossed over this (its poem is tiny); animal_school makes
it explicit. In its __init__, the world publishes its own environmental
streams, streams whose data comes from a source in the environment (here, image
files and labels on disk, standing in for cameras and sensors), not from any model's
forward():
self.add_streams([
    DataStream.create(group="albatross", stream=ImageFileStream(image_dir=data_path, ...)),
    DataStream.create(group="albatross", stream=LabelStream(label_dir=data_path, ...)),
])
# ...same for "cheetah", "giraffe", and a mixed "all" set...
Read it this way:
- The world owns these streams. self.add_streams(...) makes the world a provider of data, not just a host of roles. Every member can then read them.
- Pictures and labels travel together, by group. Each pair shares a group ("albatross"), so an image and its correct label arrive as one unit, that is what makes supervised teaching possible.
- "all" is a group name, not the wildcard. The mixed set of all three classes is published under the group "all"; do not confuse it with the data_type="all" wildcard from the type table.
- They are addressed as <world>:<group>. That is precisely why the next section records from <world>:all, <world>:albatross, and so on, those references point at the world's environmental streams. An agent that owned an environmental source of its own (a real camera) would read it through the self.stdext proxy; here the world is the environment, so the teacher reads it by reference.
The full model of proc_* slots versus environmental streams and stdext lives in
An agent's own streams.
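Grouping is what keeps each picture attached to its label. As a loose illustration only, the sketch below stores two parallel lists per group and reads them in lockstep through a <world>:<group> reference. The dictionary layout, the file names, and read_group are invented for this example and are not how the framework stores streams.

```python
# Toy grouped streams; file names and the helper are made up for illustration.

world_streams = {
    "albatross": {"images": ["a1.png", "a2.png"], "labels": ["albatross", "albatross"]},
    "cheetah": {"images": ["c1.png"], "labels": ["cheetah"]},
}


def read_group(streams, reference):
    """Resolve '<world>:<group>' and return (image, label) pairs in lockstep."""
    owner, group = reference.split(":", 1)
    if owner != "<world>":
        raise ValueError(f"not a world stream: {reference}")
    members = streams[group]
    return list(zip(members["images"], members["labels"]))


print(read_group(world_streams, "<world>:albatross"))
# [('a1.png', 'albatross'), ('a2.png', 'albatross')]
```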
A curriculum, not a single lesson¶
Instead of one record, the teacher records four times, and the order is the
whole trick:
behav.add_transit("init", "snapshotting_albatross", action="record",
                  args={"streams": ["<world>:all"], ...})        # -> recorded1 (exam)
behav.add_transit("snapshotting_albatross", "snapshotting_cheetah", action="record",
                  args={"streams": ["<world>:albatross"], ...})  # -> recorded2
behav.add_transit("snapshotting_cheetah", "snapshotting_giraffe", action="record",
                  args={"streams": ["<world>:cheetah"], ...})    # -> recorded3
behav.add_transit("snapshotting_giraffe", "exam_prepared", action="record",
                  args={"streams": ["<world>:giraffe"], ...})    # -> recorded4
Because record auto-numbers its snapshots in call order, the first call (the
mixed all set) becomes recorded1, and the three single-class sets become
recorded2, recorded3, recorded4. That numbering is then used deliberately:
behav.add_wildcards({"<exam_data_ref>": Custom.AGENT_WILDCARD + ":recorded1"})  # mixed exam
behav.add_transit("engagement_complete", teach_eval_json,
                  action="set_pref_streams",
                  args={"net_hashes": ["<agent>:recorded2",    # albatross lecture
                                       "<agent>:recorded3",    # cheetah lecture
                                       "<agent>:recorded4"]})  # giraffe lecture
So the playlist is the three single-class lectures, taught one after another,
while the exam is the mixed recorded1 set that contains all three classes. The
teach/eval loop you met in cat_library now does real work: it teaches albatross,
then cheetah, then giraffe (the playlist loop advancing through each item), and only
when the pointer wraps does it examine the student on everything at once.
Teaching classes one after another, then testing on all of them, is exactly the
setup that reveals catastrophic forgetting: a plain model's accuracy on
albatross collapses once it has been trained on cheetah and giraffe. Run a
memory-augmented student against a plain one through the same teacher and the
exam score shows the difference, a measurable research result, expressed as a
world rather than a script. (The pass bar is looser too: <cmp_thres> is 0.65
here, and lectures and exams are short, <learn_steps>=40, <eval_steps>=30,
because images are heavier than tokens.)
Promotion: a skill spreads through the society¶
The second addition is consequences. In cat_library passing led nowhere; here,
the good branch of compare_eval is wired to reward and promote the students in
<valid_cmp> (the set that passed):
behav.add_state("good", action="suggest_badges_to_world",
                args={"agent": "<valid_cmp>", "score": 1.0, "badge_type": "completed",
                      "badge_description": "Completed the Animal School ..."})
behav.add_transit("good", "promote",
                  action="suggest_role_to_world", args={"agent": "<valid_cmp>", "role": "teacher"})
behav.add_transit("promote", "habilitate", action="send_disengage")
Three steps, all aimed at <valid_cmp>:
- suggest_badges_to_world asks the world to award a "completed" badge to each passing student. The teacher cannot mint badges itself, only the world master can, so it suggests, and the world decides. The badge then shows on the student's profile.
- suggest_role_to_world(role="teacher") asks the world to promote the student to teacher. Again it is a suggestion to the only authority that can change a role.
- send_disengage releases the now-graduated student, ending the session.
The world applies the role change (set_role), and the
fresh teacher can now record its own lessons and teach the next cohort. A skill
spreads through the society of peers, no central script, no leader handing out
tasks, just roles changing as agents succeed and former students becoming the
teachers of the next ones.
graph LR
T[Teacher<br/>proc = None] -->|record| M[lessons + exam set]
T -->|learn x rounds| S[Student<br/>owns the model]
T -->|process exam| S
T -->|evaluate + compare_eval| G{pass?}
G -->|yes| P[badge + promote to teacher]
G -->|no| T
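The "suggest, and the world decides" split can be sketched as two small pieces: a teacher-side function that only proposes, and a world object that alone mutates roles and badges. ToyWorld, its handler names, and promote_passing are hypothetical. The 0.65 threshold is the animal_school value quoted earlier, and the error numbers are made up.

```python
# Toy world authority; class and method names are hypothetical.

class ToyWorld:
    def __init__(self, roles):
        self.roles = dict(roles)   # agent id -> role
        self.badges = {}           # agent id -> list of badge types

    def on_badge_suggestion(self, agent, badge_type):
        """Only the world mints badges; here it simply accepts known agents."""
        if agent in self.roles:
            self.badges.setdefault(agent, []).append(badge_type)

    def on_role_suggestion(self, agent, role):
        """Only the world changes roles."""
        if agent in self.roles:
            self.roles[agent] = role


def promote_passing(world, errors, thres):
    """Teacher side: suggest a badge and a promotion for every passing student."""
    valid_cmp = [agent for agent, err in errors.items() if err <= thres]
    for agent in valid_cmp:
        world.on_badge_suggestion(agent, "completed")
        world.on_role_suggestion(agent, "teacher")
    return valid_cmp


world = ToyWorld({"t1": "teacher", "s1": "student", "s2": "student"})
passed = promote_passing(world, {"s1": 0.40, "s2": 0.90}, thres=0.65)
print(passed)        # ['s1']
print(world.roles)   # {'t1': 'teacher', 's1': 'teacher', 's2': 'student'}
print(world.badges)  # {'s1': ['completed']}
```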
What just happened¶
You built a world where an agent learns and is graded, from the minimal
cat_library to animal_school with a curriculum, forgetting, and promotion. The
moves, record, set_pref_streams, learn vs process, evaluate /
compare_eval, suggest_badges_to_world / suggest_role_to_world, are the
teacher's whole toolkit. Next, the same teacher leads many students at once.
Where next¶
- One leader, many students: fan-out, rounds, and peer teaching.
- process vs learn, proc_opts, and the teacher/student pattern at concept level. (Models covers the brains that learn online.)
- record, set_pref_streams, evaluate, compare_eval.