8 · A teaching world: teach, exam, grade

Here is where worlds get genuinely new. A teaching world is one where an agent learns over time from another, and the world measures whether it did. This is not something a workflow tool can express; it is the Collectionless AI idea made concrete. We build it from two roles, teacher and student, and a loop you will recognize forever after: teach, exam, grade.

The cast

  • The teacher has no model (proc=None). It leads: it streams lessons, sets the exam, and grades. All of its work is built-in actions.
  • The student owns the model. It just learns what it is taught and answers the exam. Its behavior is a short "listen to the teacher" loop.

Read those two roles together and the whole chapter falls out: the teacher is a game master that runs a session, the student is a peer that does the work, and the rest is plumbing. Nobody is "calling functions" on anybody; the teacher sends requests and the student reacts to them, exactly the way two peers talk in Chapter 6. The novelty is only what they exchange: lessons and a graded exam, instead of a single answer.

The smallest one: cat_library

cat_library is the minimal teaching world: a teacher recites a fixed token sequence (a "poem"), a tiny RNN student learns to predict it, then sits an exam where it must reproduce it. Everything bigger is this skeleton with more in it.

The student is the model

The student role is an empty Agent subclass (class WAgent(Agent): pass): it adds no custom actions, because it does not need any. Everything a student does, learn and answer, is already a built-in. The actual network is supplied when the node joins, with an optimizer and a loss so it can learn online, sample by sample:

New to proc_inputs / proc_outputs?

These are an agent's fixed input/output slots. If the names are unfamiliar, read An agent's own streams first; it explains the whole convention in one place.

cat_library/run_2.py (trimmed)
net = RNNTokenLM(num_emb=voc_size, emb_dim=16, y_dim=voc_size, h_dim=100, seed=42)
agent = Agent(
    proc=net,
    proc_inputs=[StreamType(data_type="text", ...vocab...)],
    proc_outputs=[StreamType(data_type="text", ...vocab...)],
    proc_opts={"optimizer": torch.optim.SGD(net.parameters(), lr=0.01),
               "losses": [torch.nn.functional.cross_entropy]},
    buffer_generated_by_others="all",
)

Read it line by line:

  1. proc=net is the student's brain: an RNN language model over the poem's vocabulary. This is the one piece the student owns; the teacher has none.
  2. proc_inputs / proc_outputs declare the two ends of that brain as streams. Both are text over the same vocabulary, because the task is next-token prediction: feed a token, predict the next. The StreamType also carries the token↔index maps, so a word arriving on the wire becomes an integer the network can read, and back again on the way out. Declaring proc_inputs is not optional here: the framework wires stdin by walking proc_inputs, so an empty list would leave the model with no input and learn / process would silently do nothing.
  3. proc_opts is what turns a network into a student: an optimizer (SGD) and a loss (cross_entropy). The built-in learn action needs exactly these to run a backward pass; without them, learn would have nothing to update.
  4. buffer_generated_by_others="all" tells the agent to keep the outputs it produces while answering, so that after the exam the teacher can fetch them and grade them. Grading reads buffered output; this is the switch that makes it available.
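
The token↔index round trip mentioned in item 2 can be pictured with a minimal sketch. Vocab below is a hypothetical helper written for this page, not the framework's StreamType; it only shows how a word on the wire becomes an integer the network can read, and how an integer becomes a word again on the way out.

```python
# Hypothetical helper: illustrates the token<->index maps a text stream carries.
# This is NOT the framework's StreamType API, only a sketch of the idea.
class Vocab:
    def __init__(self, tokens):
        # Keep first-seen order so indices are stable across runs.
        self.index_to_token = list(dict.fromkeys(tokens))
        self.token_to_index = {t: i for i, t in enumerate(self.index_to_token)}

    def encode(self, token):
        # Word arriving on the wire -> integer for the network.
        return self.token_to_index[token]

    def decode(self, index):
        # Integer predicted by the network -> word for the wire.
        return self.index_to_token[index]


poem = "the cat sat on the mat".split()
vocab = Vocab(poem)
ids = [vocab.encode(t) for t in poem]
roundtrip = [vocab.decode(i) for i in ids]
```

The number of distinct tokens (five here) plays the part of voc_size in the excerpt above: it sizes both the embedding table and the output layer.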

The student's behavior is a tiny reactive loop, built in the world's create_behav_files. It plugs in one shipped template, listening_to_teacher, wrapped by two transits:

cat_library/src/world.py (trimmed)
behav.add_transit("init", listening_json, action="engage",
                  args={"acceptable_role": "teacher"})
behav.add_transit("teacher_engaged", "init", action="disengage")

What the student does, in order:

  1. engage with acceptable_role="teacher". The student sits idle until a teacher offers to pair with it, and it accepts only a teacher: it refuses to be driven by the wrong kind of peer. This is the worker side of the engagement handshake from Chapter 7.
  2. React to whatever the teacher sends. Inside the template (next section) the student waits in a blocking state and reacts to incoming requests: a learn request trains it, a process request tests it. The student never decides when to learn or be examined; the teacher does, by sending.
  3. disengage back to init. When the teacher releases it, the student loops home and is ready for the next teacher.

That is the entire student: pair with a teacher, do as told, go home. All the intelligence about what to teach lives in the teacher.
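
As a rough mental model, the student's three steps can be written as a toy state machine. Everything here (ToyStudent and its method names) is illustrative and assumed for this sketch; the real behavior is assembled by the framework from the shipped template, and the real student passes through more states than this one does.

```python
# Toy state machine mirroring the student's loop. All names are illustrative;
# the real behavior is built by the framework from the shipped template.
class ToyStudent:
    def __init__(self):
        self.state = "init"
        self.log = []

    def on_engage(self, peer_role, acceptable_role="teacher"):
        # Accept the pairing only from the role we are willing to follow.
        if self.state == "init" and peer_role == acceptable_role:
            self.state = "teacher_engaged"
            return True
        return False

    def on_request(self, action_name):
        # Requests are honoured only while paired; the student never self-starts.
        if self.state != "teacher_engaged" or action_name not in ("learn", "process"):
            return False
        self.log.append(action_name)
        return True  # stands in for reporting "done" and going back to listening

    def on_disengage(self):
        # Released by the teacher: loop home, ready for the next teacher.
        if self.state == "teacher_engaged":
            self.state = "init"


student = ToyStudent()
refused = student.on_engage("student")   # wrong kind of peer, pairing refused
accepted = student.on_engage("teacher")
student.on_request("learn")
student.on_request("process")
student.on_disengage()
```

Note that nothing in the sketch decides when to learn or sit the exam; every method is a reaction to something the other side did.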

The teacher leads the loop

The teacher's behavior is the interesting part. It has four moves, all built-ins, and the real source assembles them by snapping together three shipped templates (engage_by_role, the teach/eval loop, and the recording step) with wildcards filled in. We unroll them here.

1 · Record the material

record snapshots a live stream into a buffer the teacher owns and can replay as many times as it likes. The teacher cannot teach directly from the world's ever-flowing cats stream; it needs a fixed, repeatable dataset, so it freezes 998 samples of it:

cat_library/src/world.py (trimmed)
behav.add_transit("init", "book_prepared", action="record",
                  args={"streams": ["<world>:cats"],
                        "num_steps": "<eval_steps>",   # 998
                        "record_uuid": None})

Step by step: record reads num_steps samples from <world>:cats and copies them into a brand-new owned stream. The framework auto-names that stream recorded<N>, counting up from one, so this first record produces recorded1. (That number matters in animal_school, where several record calls produce recorded1, recorded2, …) record_uuid=None says "read the plain world stream", the one published under no per-interaction id, rather than a per-request channel. When the last sample is in, the buffer is sealed read-only and announced to peers. The teacher now owns the poem as a dataset it can hand out at will.
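
The snapshot-and-name behavior described above can be sketched in a few lines. ToyRecorder and the cats generator are hypothetical stand-ins, not the framework's record action; the sketch assumes only what the text states: a fixed number of samples is copied out of a live stream, and buffers are named recorded<N> in call order, starting from one.

```python
from itertools import count, islice


class ToyRecorder:
    """Hypothetical stand-in for the record action: snapshot a live stream."""

    def __init__(self):
        self.buffers = {}
        self._next = 1  # names count up from one

    def record(self, live_stream, num_steps):
        name = f"recorded{self._next}"
        self._next += 1
        # Copy exactly num_steps samples; a tuple stands in for "sealed read-only".
        self.buffers[name] = tuple(islice(live_stream, num_steps))
        return name


def cats():
    # An endless stream, like the world's ever-flowing source.
    poem = ["the", "cat", "sat"]
    for i in count():
        yield poem[i % len(poem)]


teacher = ToyRecorder()
first = teacher.record(cats(), num_steps=998)
second = teacher.record(cats(), num_steps=5)
```

Once recorded, the buffer can be replayed any number of times, which the live generator cannot offer.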

Focus · reading the stream references: owner:name and the <...> wildcards

Strings like "<world>:cats" are stream references, written as owner:stream-name: the stream's owner comes before the colon, its name after. The owner is almost always a wildcard the framework fills in when the behavior actually runs on an agent:

  • <agent> is this agent itself: "<agent>:recorded1" (below) means "my own stream named recorded1", the buffer record just filled.
  • <world> is the world: "<world>:cats" is "the world's cats stream".
  • <partner> is the engaged peer, <role> the agent's own role.

Why wildcards? One behavior template is shipped to every agent in a role, so it cannot hard-code ids. <agent> and <world> let it say "my stream" or "the world's stream" and have each agent resolve it to the right concrete one on join. <playlist> is different: not an owner but "whichever stream the playlist pointer is on right now" (set by set_pref_streams), which is how a single learn transit can walk an entire curriculum.
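
To make the substitution concrete, here is a minimal sketch of owner-wildcard resolution. resolve_ref and the ids in bindings are invented for illustration; the framework's real resolver and id format are not shown in this chapter and may differ.

```python
def resolve_ref(ref, bindings):
    """Split 'owner:name' and swap a wildcard owner for a concrete id."""
    owner, _, name = ref.partition(":")
    # Owners that are not wildcards pass through unchanged.
    return f"{bindings.get(owner, owner)}:{name}"


# Illustrative ids only; real peer and world ids look different.
bindings = {"<agent>": "peer-teacher-01", "<world>": "world-cat-library"}

mine = resolve_ref("<agent>:recorded1", bindings)
worlds = resolve_ref("<world>:cats", bindings)
literal = resolve_ref("peer-x:notes", bindings)
```

The same template string yields a different concrete reference on each agent, simply because each agent supplies its own bindings.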

2 · Find and engage a student

Next the teacher recruits. The world chains in the engage_by_role template with <roles_to_engage> set to student, so the teacher searches the network for students, connects, waits for the handshake, and offers engagement. When a student accepts, the template lands in its engagement_complete state, the cue that a paired student is ready to be taught. This is the recruiter side of the same handshake the student accepts in step 1; Chapter 7 covers it in full.

3 · Teach

Now the teaching loop proper. Two things happen: the teacher sets a playlist of streams, then sends a learn request for it:

cat_library/src/world.py (trimmed)
behav.add_transit("engagement_complete", teach_eval_json,
                  action="set_pref_streams",
                  args={"net_hashes": ["<agent>:recorded1"], "repeat": 50})

behaviors/teach-playlist_eval-recorded1.json (the learn transit)
"action": "send",
"action_kwargs": {
    "action_name": "learn",
    "streams": {"stdin": ["<playlist>"], "stdtar": ["<playlist>"]},
    "num_steps": "<learn_steps>",     # 998
    "wait_completion": true
}

Read it as the teacher's lesson plan:

  1. set_pref_streams(["<agent>:recorded1"], repeat=50) loads the playlist with one item, the recorded poem, laid down 50 times back to back. The playlist is just an ordered list with a pointer; repeat=50 is how "teach this poem 50 times" is expressed without writing the loop by hand.
  2. The learn send is the lesson itself. The teacher does not run the model; it asks the student to run learn for num_steps=998 samples, feeding the playlist item as both stdin (the input) and stdtar (the target). Binding the same stream to input and target is what "learn to reproduce this sequence" means: predict each next token, and the truth you are graded against is the next token. The <playlist> wildcard resolves to whatever the pointer is on right now, so the one transit serves every item and every repeat.
  3. wait_completion=True makes the teacher's send return True only once the student reports it has finished all 998 steps. The teacher blocks through the whole lecture instead of racing ahead, pacing the session to the learner.

Where does the data flow? The poem lives in the teacher's recorded1. The send points the student at it; the student pulls those samples into its own stdin / stdtar, runs learn (forward pass, then backward pass against stdtar, then an optimizer step) on each one, and reports done. Nothing is copied to a shared place; the student reads the teacher's stream directly.

Focus · how one transit walks a whole curriculum (the playlist loop)

The teach/eval template is not a straight line; it is a small loop. After each learn lecture it visits a change_lecture state that calls next_pref_stream to advance the playlist pointer, then check_pref_stream decides where to go:

  • if the pointer is not back at the first item, loop to begin_teaching and teach the next item;
  • if it wrapped back to the first, the full curriculum (all items, all repeats) is done, so move on to the exam.

With cat_library's single-item, 50-repeat playlist that loop simply teaches the poem 50 times and then stops. The very same machine, given the three-class playlist of animal_school, teaches each class in turn before examining. One template, two curricula: that is the point of the playlist abstraction.
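
The loop can be simulated in a few lines. ToyPlaylist and run_curriculum are assumptions made for this sketch: advance plays the part of next_pref_stream, wrapped plays the part of check_pref_stream, and repeat is modelled as the item list laid down back to back, as described above. The framework's own playlist may be implemented differently.

```python
class ToyPlaylist:
    """Illustrative playlist: an ordered list with a pointer (not the framework's class)."""

    def __init__(self, items, repeat=1):
        self.items = list(items) * repeat  # each pass laid down back to back
        self.pos = 0

    def current(self):
        return self.items[self.pos]

    def advance(self):
        # Plays the role of next_pref_stream: move the pointer, wrapping at the end.
        self.pos = (self.pos + 1) % len(self.items)

    def wrapped(self):
        # Plays the role of check_pref_stream: are we back on the first item?
        return self.pos == 0


def run_curriculum(playlist):
    taught = []
    while True:
        taught.append(playlist.current())  # begin_teaching: one learn lecture
        playlist.advance()                 # change_lecture
        if playlist.wrapped():
            return taught                  # curriculum done, move on to the exam


cat_lessons = run_curriculum(ToyPlaylist(["recorded1"], repeat=50))
school_lessons = run_curriculum(ToyPlaylist(["recorded2", "recorded3", "recorded4"]))
```

The same run_curriculum function serves both playlists unchanged, which is the property the template relies on.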

4 · Exam, then grade

After the curriculum, the teacher examines and scores. Three built-ins do it:

behaviors/teach-playlist_eval-recorded1.json (exam + grade)
# exam: ask the student to ANSWER (no learning)
"action": "send",
"action_kwargs": {"action_name": "process", "streams": ["<exam_data_ref>"],
                  "num_steps": "<eval_steps>", "wait_completion": true}
# grade: score the student's buffered answers...
"action": "evaluate",
"action_kwargs": {"stream_hash": "<exam_data_ref>", "how": "max", "steps": "<eval_steps>"}
# ...and branch on the result
"action": "compare_eval", "action_kwargs": {"cmp": "<=", "thres": "<cmp_thres>"}   # -> good

In cat_library the world sets <exam_data_ref> to <agent>:recorded1 (the same poem) and <cmp_thres> to 0.2. Walk the three steps:

  1. Exam (process). The teacher sends a process request, inference only, no learning, over the exam stream for eval_steps samples, again blocking on wait_completion=True. The student runs its model forward and, because it joined with buffer_generated_by_others="all", keeps the outputs it produces.
  2. Grade (evaluate). The teacher now scores those buffered outputs against the reference stream. how="max" compares argmax predictions (label-style accuracy); steps bounds how many samples to score. The result is a single error number per student. The teacher can read those outputs at all because the student buffered them (its own buffer_generated_by_others="all"); grading is the teacher reaching into what the student kept.
  3. Branch (compare_eval). cmp="<=", thres=0.2 turns that number into a pass/fail: students with error ≤ 0.2 pass and are collected into the <valid_cmp> set (used later for badges and promotion); the rest fall through to the bad branch. That is the entire grade, one comparison.

    A small honesty fix for text streams

    cat_library sets re_offset=True on evaluate (the world patches it in after building). Text streams do not carry reliable time tags, so the first compared pair could be misaligned; re_offset re-aligns the two streams' origin before scoring so the comparison is fair.
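
The grade and branch steps reduce to very little arithmetic. The sketch below assumes that how="max" means "compare the argmax of each output with the reference label" and that the score is the fraction of mismatches; evaluate_max, compare_eval, the student names and the score vectors are all made up for illustration and are not the framework's implementation.

```python
def argmax(scores):
    return max(range(len(scores)), key=scores.__getitem__)


def evaluate_max(outputs, targets, steps):
    """Fraction of the first `steps` samples whose argmax differs from the target."""
    pairs = list(zip(outputs, targets))[:steps]
    wrong = sum(1 for scores, target in pairs if argmax(scores) != target)
    return wrong / len(pairs)


def compare_eval(errors, thres):
    """Split students into pass/fail with a single <= comparison."""
    valid = {name for name, err in errors.items() if err <= thres}
    return valid, set(errors) - valid


targets = [0, 1, 2, 1, 0]  # reference labels from the exam stream
answers = {                # buffered per-sample scores, one row per exam step
    "student_a": [[0.9, 0.1, 0.0], [0.2, 0.7, 0.1], [0.1, 0.1, 0.8],
                  [0.3, 0.6, 0.1], [0.8, 0.1, 0.1]],
    "student_b": [[0.1, 0.8, 0.1], [0.2, 0.7, 0.1], [0.5, 0.4, 0.1],
                  [0.6, 0.3, 0.1], [0.8, 0.1, 0.1]],
}
errors = {name: evaluate_max(out, targets, steps=5) for name, out in answers.items()}
valid_cmp, failed = compare_eval(errors, thres=0.2)
```

Here student_a answers every step correctly and lands in valid_cmp, while student_b misses three of five and falls through to the failing branch.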

learn vs process, the one distinction to hold

The same send shape, with action_name="learn", trains the student (a forward pass and a backward pass against stdtar, then an optimizer step). With action_name="process" it only runs the model (forward only, no update). Teaching is learn; the exam is process. Hold that one distinction and the whole loop reads cleanly.
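
The distinction is easy to see on a model small enough to read in full. ToyBigram below is a stand-in invented for this page (a table of next-token logits trained with a hand-written cross-entropy gradient), not the RNNTokenLM used in cat_library and not the framework's learn / process actions. It only illustrates "forward plus update" versus "forward only".

```python
import math


class ToyBigram:
    """Tiny next-token model: one row of logits per previous token (illustrative only)."""

    def __init__(self, vocab_size, lr=0.5):
        self.logits = [[0.0] * vocab_size for _ in range(vocab_size)]
        self.lr = lr

    def _softmax(self, row):
        m = max(row)
        exps = [math.exp(v - m) for v in row]
        total = sum(exps)
        return [e / total for e in exps]

    def process(self, token):
        # Forward only: predict the next token, change nothing.
        row = self.logits[token]
        return max(range(len(row)), key=row.__getitem__)

    def learn(self, token, target):
        # Forward, cross-entropy gradient (probs - one_hot), then an SGD step.
        probs = self._softmax(self.logits[token])
        for j, p in enumerate(probs):
            grad = p - (1.0 if j == target else 0.0)
            self.logits[token][j] -= self.lr * grad


poem = [0, 1, 2, 3, 4, 5]               # token ids of a six-word poem
pairs = list(zip(poem[:-1], poem[1:]))  # (input token, next-token target)
targets = [t for _x, t in pairs]

student = ToyBigram(vocab_size=6)
before = [row[:] for row in student.logits]
_ = [student.process(x) for x, _t in pairs]  # an exam leaves the weights alone
unchanged = student.logits == before

for _epoch in range(50):                     # 50 lectures over the same poem
    for x, t in pairs:
        student.learn(x, t)

answers = [student.process(x) for x, _t in pairs]
error = sum(a != t for a, t in zip(answers, targets)) / len(pairs)
```

After the lectures the toy student reproduces the poem exactly, so its error is 0.0 and it would clear a 0.2 pass bar; the exam itself never touched a weight.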

The loop, as built

It is worth seeing the two roles side by side, because the teacher's sends and the student's reactions are two halves of the same conversation. The student sits in a blocking teacher_engaged state and has two ready=false transitions out of it, one per request the teacher might send:

TEACHER (leads)                         STUDENT (reacts, in listening_to_teacher)
  record  -> recorded1
  engage_by_role -> student paired
  set_pref_streams (poem x50)
  send learn  ----------------------->  teacher_engaged --learn--> finished_learning
       (blocks on wait_completion)       (trains 998 steps, reports done)
  ...loop the playlist...
  send process ---------------------->  teacher_engaged --process--> finished_exam
       (blocks on wait_completion)       (answers 998 steps, buffers output)
  evaluate + compare_eval -> good/bad

A ready=false transition means "this fires only when another agent triggers that action on me". So the student does not poll or choose; each send from the teacher is the event that moves the student's state machine. The teacher's wait_completion=True and the student's blocking state are the two ends of the same rendezvous: the teacher waits for "done", the student waits for "what next".

Adding consequences: animal_school

animal_school keeps the exact teacher/student skeleton (same templates, same four moves) and changes only two things, both inside create_behav_files. The student even reuses the identical engage + listening_to_teacher behavior; the only thing that differs is the content the teacher pours through it. Here the student is an image classifier (a CNNCNU with three outputs labelled albatross, cheetah, giraffe), so the streams carry pictures and labels instead of text, but the loop is byte-for-byte the same.

Where the data comes from: the world's environmental streams

Before the teacher can record anything, the world itself has to supply the raw material. cat_library glossed over this (its poem is tiny); animal_school makes it explicit. In its __init__, the world publishes its own environmental streams, streams whose data comes from a source in the environment (here, image files and labels on disk, standing in for cameras and sensors), not from any model's forward():

animal_school/src/world.py (trimmed)
self.add_streams([
    DataStream.create(group="albatross", stream=ImageFileStream(image_dir=data_path, ...)),
    DataStream.create(group="albatross", stream=LabelStream(label_dir=data_path, ...)),
])
# ...same for "cheetah", "giraffe", and a mixed "all" set...

Read it this way:

  1. The world owns these streams. self.add_streams(...) makes the world a provider of data, not just a host of roles. Every member can then read them.
  2. Pictures and labels travel together, by group. Each pair shares a group ("albatross"), so an image and its correct label arrive as one unit; that is what makes supervised teaching possible.
  3. "all" is a group name, not the wildcard. The mixed set of all three classes is published under the group "all"; do not confuse it with the data_type="all" wildcard from the type table.
  4. They are addressed as <world>:<group>. That is precisely why the next section records from <world>:all, <world>:albatross, and so on: those references point at the world's environmental streams. An agent that owned an environmental source of its own (a real camera) would read it through the self.stdext proxy; here the world is the environment, so the teacher reads it by reference.

The full model of proc_* slots versus environmental streams and stdext lives in An agent's own streams.
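
The "travel together, by group" idea can be pictured with plain data structures. This is only a sketch: pair_by_group, the tuple layout, and the file names are invented for illustration and say nothing about how DataStream stores or synchronizes its samples.

```python
from collections import defaultdict


def pair_by_group(streams):
    """Collect (group, kind, samples) entries and zip images with labels per group."""
    by_group = defaultdict(dict)
    for group, kind, samples in streams:
        by_group[group][kind] = samples
    return {g: list(zip(s["image"], s["label"])) for g, s in by_group.items()}


# File names and labels are made up for illustration.
world_streams = [
    ("albatross", "image", ["alb_0.png", "alb_1.png"]),
    ("albatross", "label", ["albatross", "albatross"]),
    ("cheetah", "image", ["che_0.png"]),
    ("cheetah", "label", ["cheetah"]),
]
paired = pair_by_group(world_streams)
```

Each entry of paired["albatross"] is one supervised sample: an input and the label it should be graded against.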

A curriculum, not a single lesson

Instead of one record, the teacher records four times, and the order is the whole trick:

animal_school/src/world.py (trimmed)
behav.add_transit("init",                   "snapshotting_albatross", action="record",
                  args={"streams": ["<world>:all"],       ...})   # -> recorded1 (exam)
behav.add_transit("snapshotting_albatross", "snapshotting_cheetah",  action="record",
                  args={"streams": ["<world>:albatross"], ...})   # -> recorded2
behav.add_transit("snapshotting_cheetah",   "snapshotting_giraffe",  action="record",
                  args={"streams": ["<world>:cheetah"],   ...})   # -> recorded3
behav.add_transit("snapshotting_giraffe",   "exam_prepared",         action="record",
                  args={"streams": ["<world>:giraffe"],   ...})   # -> recorded4

Because record auto-numbers its snapshots in call order, the first call (the mixed all set) becomes recorded1, and the three single-class sets become recorded2, recorded3, recorded4. That numbering is then used deliberately:

animal_school/src/world.py (trimmed)
behav.add_wildcards({"<exam_data_ref>": Custom.AGENT_WILDCARD + ":recorded1"})  # mixed exam
behav.add_transit("engagement_complete", teach_eval_json,
                  action="set_pref_streams",
                  args={"net_hashes": ["<agent>:recorded2",      # albatross lecture
                                       "<agent>:recorded3",      # cheetah lecture
                                       "<agent>:recorded4"]})    # giraffe lecture

So the playlist is the three single-class lectures, taught one after another, while the exam is the mixed recorded1 set that contains all three classes. The teach/eval loop you met in cat_library now does real work: it teaches albatross, then cheetah, then giraffe (the playlist loop advancing through each item), and only when the pointer wraps does it examine the student on everything at once.

Teaching classes one after another, then testing on all of them, is exactly the setup that reveals catastrophic forgetting: a plain model's accuracy on albatross collapses once it has been trained on cheetah and giraffe. Run a memory-augmented student against a plain one through the same teacher and the exam score shows the difference: a measurable research result, expressed as a world rather than a script. (The pass bar is looser too: <cmp_thres> is 0.65 here, and lectures and exams are short, with <learn_steps>=40 and <eval_steps>=30, because images are heavier than tokens.)

Promotion: a skill spreads through the society

The second addition is consequences. In cat_library passing led nowhere; here, the good branch of compare_eval is wired to reward and promote the students in <valid_cmp> (the set that passed):

animal_school/src/world.py (trimmed)
behav.add_state("good", action="suggest_badges_to_world",
                args={"agent": "<valid_cmp>", "score": 1.0, "badge_type": "completed",
                      "badge_description": "Completed the Animal School ..."})
behav.add_transit("good", "promote",
                  action="suggest_role_to_world", args={"agent": "<valid_cmp>", "role": "teacher"})
behav.add_transit("promote", "habilitate", action="send_disengage")

Three steps, all aimed at <valid_cmp>:

  1. suggest_badges_to_world asks the world to award a "completed" badge to each passing student. The teacher cannot mint badges itself, only the world master can, so it suggests, and the world decides. The badge then shows on the student's profile.
  2. suggest_role_to_world(role="teacher") asks the world to promote the student to teacher. Again it is a suggestion to the only authority that can change a role.
  3. send_disengage releases the now-graduated student, ending the session.

The world applies the role change (set_role), and the fresh teacher can now record its own lessons and teach the next cohort. A skill spreads through the society of peers: no central script, no leader handing out tasks, just roles changing as agents succeed and former students becoming the teachers of the next ones.
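
The "teacher suggests, world decides" split can be sketched as follows. ToyWorld, its method names, the agent names, and in particular the rule that only a current teacher's suggestion is honoured are all assumptions made for this illustration; the chapter does not spell out the world's actual acceptance policy.

```python
class ToyWorld:
    """Illustrative world master: the only party that changes roles or mints badges."""

    def __init__(self, roles):
        self.roles = dict(roles)
        self.badges = {name: [] for name in roles}

    def suggest_badge(self, suggested_by, agent, badge_type):
        # Assumed policy: only a current teacher's suggestion is honoured.
        if self.roles.get(suggested_by) == "teacher" and agent in self.roles:
            self.badges[agent].append(badge_type)

    def suggest_role(self, suggested_by, agent, role):
        if self.roles.get(suggested_by) == "teacher" and agent in self.roles:
            self.roles[agent] = role  # stands in for the world's set_role


world = ToyWorld({"tina": "teacher", "sam": "student", "sue": "student"})
valid_cmp = {"sam"}  # students that passed compare_eval

for name in sorted(valid_cmp):
    world.suggest_badge("tina", name, "completed")
    world.suggest_role("tina", name, "teacher")

world.suggest_role("sue", "sue", "teacher")  # a student cannot promote itself here
```

After the loop, sam holds the badge and the teacher role, and could in turn make suggestions the world would accept.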

graph LR
    T[Teacher<br/>proc = None] -->|record| M[lessons + exam set]
    T -->|learn x rounds| S[Student<br/>owns the model]
    T -->|process exam| S
    T -->|evaluate + compare_eval| G{pass?}
    G -->|yes| P[badge + promote to teacher]
    G -->|no| T

What just happened

You built a world where an agent learns and is graded, from the minimal cat_library to animal_school with a curriculum, forgetting, and promotion. The moves (record, set_pref_streams, learn vs process, evaluate / compare_eval, suggest_badges_to_world / suggest_role_to_world) are the teacher's whole toolkit. Next, the same teacher leads many students at once.