Skip to content

9 · The world master at work

Build worlds · Chapter 9 of 12 · Path home

A teacher leading one student (Chapter 8) is the skeleton. A master leading many is where worlds become powerful, and where UNaIVERSE pulls clearly ahead of anything that just wires steps together. The master's toolkit is small: fan out to a group, run rounds, wait with callbacks, and change roles on the fly. We will see all of it in two worlds.

The two flows behind this chapter

Both moves here, one leader to many followers and publish once, many receive, are diagrammed and shown from both sides in the communication scenarios; the completion guards a master waits on are in the action catalogue.

Leading a group: class_incremental_learning

This world teaches many students the MNIST digits one class at a time, then tests all classes seen so far after each lesson, so you watch forgetting happen live. The master's teacher.py is one Agent subclass; its behavior (teacher.json) loops through four states, welcoming, brainstorm, lesson, exam, and the actions below are the body of each. Everything the master does to lead the group fits in two moves: fan out a lesson, then examine the whole class.

Keeping the roster current

Before it can lead anyone, the master has to know who is in the room. It does not ask once and assume; it refreshes the set on every clock tick from the built-in on_tick:

class_incremental_learning/src/teacher.py
@action
async def on_tick(self):
    await super().on_tick()
    connected = set(self.get_agents_by_role("student", handshake_completed=True))
    joined = connected - set(self._students.keys())     # who is new this tick
    gone   = set(self._students.keys()) - connected      # who dropped out
    for s in joined:
        self._students[s] = {"exams": [], "lessons": [], "evaluations": []}
    for s in gone:
        del self._students[s]
    return True

Step by step: get_agents_by_role("student", handshake_completed=True) returns the peers that currently hold the student role and have finished the handshake (so the master can actually talk to them). The two set differences tell it who joined and who left since last tick; it opens a fresh history dict for each newcomer and forgets anyone who disconnected. Because this runs every tick, self._students is always the live group, students can come and go mid-session and the next fan-out simply addresses whoever is present. The welcoming action then just blocks until len(self._students) >= 1 (or a timer expires), so the master never starts teaching an empty room.

Fan out a lesson

When the brainstorm state decides it is time to teach (it sets _current_intent to "teach" while there are still classes left), init_lesson sends one request to the whole group at once, there is no per-student loop:

New to stdin / stdtar?

These are an agent's fixed input/target slots. If the names are unfamiliar, read An agent's own streams first, it explains the whole convention in one place.

class_incremental_learning/src/teacher.py
@action
async def init_lesson(self):
    if not self._students:
        return False
    self._current_lesson += 1
    cls = self._current_lesson
    _, prv_id = self.get_peer_ids()
    ok = await self.send(
        action_name="learn",
        target=list(self._students.keys()),         # the whole group
        streams={"stdin":  [f"{prv_id}:images@teach_{cls}"],
                 "stdtar": [f"{prv_id}:labels@teach_{cls}"],
                 "stdext": []},
        num_steps=self.TEACH_PER_CLASS,
    )
    if not ok:
        self._current_lesson -= 1                   # roll back, retry next tick
        return False
    return True

What happens in order:

  1. Pick the next class. _current_lesson advances from -1 to 0, then 1, and so on; cls is the digit this lesson teaches. State lives on the instance, so the master remembers where it is in the curriculum across ticks (the same "keep state on self" idiom from Chapter 5).
  2. Address everyone. target=list(self._students.keys()) is the whole roster. A single send to a list of targets reaches them all, this is the "lead a whole group" move, not a loop.
  3. Point each student at the master's own data. The streams map wires the students' inputs to the master's streams: stdin to this teacher's images of group teach_{cls}, and stdtar (the supervision target) to its matching labels. The master built one (images, labels) stream pair per class up front in accept_new_role, so "teach class 3" is just naming teach_3. prv_id is the master's own in-world peer id, the streams belong to it.
  4. Bound the lesson. num_steps=TEACH_PER_CLASS tells every student how many samples to train on before the request is considered finished.
  5. Fail safe. If the send could not go out, the master un-does the lesson counter and returns False, so the behavior retries the very same class next tick instead of skipping it.

The send returns immediately; it does not wait for the students to finish. The waiting is the next state's job. The behavior leaves init_lesson and parks on a completion guard, the built-in all_sent_completed(action_name="learn"), which returns True only once every student has reported its learn request done. That is the fan-out-then-wait shape: one send, then one guard that gates the group.

Examine cumulatively

After a lesson the brainstorm state flips to the exam branch and send_exam runs. It is the same fan-out, but now the master also has to collect answers back:

class_incremental_learning/src/teacher.py
@action
async def send_exam(self):
    if self._current_exam is not None:
        return True
    students = list(self._students.keys())
    seen = self._classes_taught_so_far()            # e.g. [0, 1, 2]
    if not students or not seen:
        return False

    self._current_exam = {
        "id": f"exam_{int(datetime.datetime.now().timestamp())}",
        "seen_classes": seen,
        "predictions_by_student": {s: [] for s in students},
        "scores": {},
    }
    _, prv_id = self.get_peer_ids()
    ok = await self.send(
        action_name="process",
        target=students,
        streams={"stdin": [f"{prv_id}:images@eval"], "stdtar": [], "stdext": []},
        num_steps=self.EVAL_PER_CLASS * len(seen),  # all classes seen so far
        timeout=self.EXAM_TIMEOUT_SEC,              # per-student deadline
    )
    if not ok:
        self._current_exam = None
        return False
    return True

Reading it as a round:

  1. Open an exam record. _current_exam is a scratchpad for this exam: the classes being tested, an empty prediction list per student, and a place for scores. Guarding on _current_exam is not None makes the action idempotent, if the behavior re-enters the state, it does not start a second exam.
  2. Test everything taught so far. seen is [0..current_lesson]. The exam streams the combined eval set (all classes seen, in order) to every student and asks them to process it, run their model and emit a prediction per sample. num_steps is sized to cover every class seen, which is what makes forgetting visible: a student that learned class 0 but later overwrote it will now get class 0 wrong.
  3. Give each student its own deadline. timeout=EXAM_TIMEOUT_SEC is per-student, so one slow or dead learner auto-completes after the deadline instead of stalling the whole exam. The group guard fires on partial results and the master scores whatever arrived.

Predictions do not come back all at once, they trickle in as each student works. The master drains them with the built-in received_some_asked_data(processing_fcn="collect_predictions"), which calls one method of the master's once per sample received:

class_incremental_learning/src/teacher.py
def collect_predictions(self, agent, props, data, data_tag):
    if self._current_exam is None or agent not in self._current_exam["predictions_by_student"]:
        return
    prediction = int(data.item()) if isinstance(data, torch.Tensor) else int(data)
    self._current_exam["predictions_by_student"][agent].append(prediction)

collect_predictions is a plain method, not an action, the guard invokes it as fcn(agent, props, data, data_tag) for each incoming sample. agent is the student that sent it, data is its predicted class index (the student's processor already applied argmax, so it arrives as a bare class index, not raw logits). The master just files each prediction under the right student. The behavior stays on this state, looping the guard, until all_sent_completed(action_name="process") confirms the fan-out is over; then score_exam runs _compute_accuracy per student and prints the table, green where a class is remembered, red where it is forgotten. That table is the whole point: a continual-learning result you can see.

Fan-out, then wait

The pattern to take away: one send to a list of targets, then a completion guard (all_sent_completed to wait for the group, or received_some_asked_data while results stream in and you process each one). The master never loops over students and never holds a list of futures; it sends once, parks on a guard, and the guard tells it when the whole group is done. The guards themselves are catalogued in the action catalogue.

Coordinating peers: social_learning

social_learning is the most advanced teaching world, and it shows the master's two remaining tools: callbacks and pub/sub peer teaching. Its behavior (teacher.json) runs a full round per lecture: teach the group, exam it, find the best student, and then, instead of teaching the next round itself, hand the teaching over to that best student. The states that matter here are begin_teaching to exam_time to compare_time to best_found to best_teaching.

Picking who leads the round

The master does not nominate a best student by hand, it scores the class and lets the result choose. After the exam (exam_time sends process with wait_completion=True and parks until the group is done), the compare_time state runs the built-in compare_eval(cmp="min", thres=0.5). That action looks at the exam errors the master collected and keeps only the agents that pass the threshold, leaving them in self._valid_cmp_agents. In this world the comparison is tuned to yield a single winner, so _valid_cmp_agents ends up holding exactly one peer id: the best student. If nobody clears the bar, compare_time falls through to best_not_found and the round skips the social phase entirely. Selection is just scoring plus a threshold, no manual choice.

Two requests, back to back

With a winner in hand, the best_found state runs social_round. This is the heart of the chapter: the master sets up the best student to teach the others, peer to peer, then steps back and waits. Walk it in order:

social_learning/src/teacher.py (social_round, trimmed)
await self.find_agents("student")
all_students          = copy.deepcopy(self._engaged_agents)
not_isolated_students = copy.deepcopy(self._found_agents)
teacher               = self.get_peer_id()
best_student          = next(iter(self._valid_cmp_agents))     # the single winner
other_not_isolated    = not_isolated_students - {best_student}

# 1) Engage exactly the audience for this round
await self.set_engaged_partner(other_not_isolated)

# 2) Find the best student's pub/sub stream and subscribe the others to it
streams = self.find_streams(best_student, "best_student_stream")
best_stream_hash = next(iter(streams), None)
await self.send_subscribe(stream_hashes=[best_stream_hash])

# 3) Ask the OTHER students to learn from the best student's stream
learn_interaction = await self._send(
    action_name="learn_from_student", target=list(other_not_isolated),
    streams={"stdin": [f"{best_student}:images"], "stdtar": [f"{best_student}:labels"]},
    num_steps=self.get_unlabeled_steps(), callback="on_asked_done")
self._pending_asks.update(learn_interaction.target)
relay_uuid = learn_interaction.uuid

# 4) Ask the BEST student to label the master's unlabeled data, publishing under that uuid
gen_interaction = await self._send(
    action_name="teach", target=best_student,
    action_kwargs={"relay_uuid": relay_uuid},
    streams={"stdin": [f"{teacher}:images@unlabeled"]},
    num_steps=self.get_unlabeled_steps(), callback="on_asked_done")
self._pending_asks.update(gen_interaction.target)
  1. Sort the cast for this round. find_agents("student") refreshes the discovered set. _engaged_agents is everyone in the room; _found_agents is the non-isolated students (isolated ones never join peer teaching). The master subtracts the best student to get the audience, other_not_isolated, the students who will learn this round.
  2. Engage just that audience. set_engaged_partner(other_not_isolated) narrows the master's active partners to exactly the learners for this round, so the two sends that follow act on the right set and the gossip mesh links the right peers.
  3. Wire the pub/sub link. The best student owns a best_student_stream it only fills when nominated. find_streams(best_student, "best_student_stream") returns a {net-hash: stream} dict; that net-hash is the stream's unambiguous global address. send_subscribe(stream_hashes=[best_stream_hash]) then tells the engaged audience to subscribe to it, so the best student's output reaches them over a standing channel, no per-sample request, no direct link (Chapter 7).
  4. Ask the audience to learn (request A). learn_from_student points each listener's stdin/stdtar at the best student's images and labels and runs for get_unlabeled_steps() steps. It is a dedicated action name (a thin wrapper over the built-in learn) so that the next round's ordinary learn cannot accidentally re-trigger this one. The send returns an Interaction; the master stashes its targets in _pending_asks and grabs its uuid as relay_uuid.
  5. Ask the best student to teach (request B). teach streams the master's own unlabeled images into the best student, which runs its model and re-publishes each image plus its predicted label into best_student_stream, under relay_uuid, the same id the audience is reading from. That shared uuid is what lines the two requests up: the labels the best student publishes land exactly where the audience's learn_from_student is listening.

Two _send calls, two Interactions, two sets of targets dropped into _pending_asks. The master has now set the whole round in motion without doing any teaching itself.

Waiting on callbacks

The master must not start the next lecture until both requests above have finished. It does that with callbacks, not a wait loop. Each _send named callback="on_asked_done"; the interaction manager fires that callback once per interaction, when all of that interaction's targets have reported (whether they finished cleanly, timed out, or dropped):

social_learning/src/teacher.py
@action
async def on_asked_done(self, interaction=None) -> bool:
    if interaction is not None:
        self._pending_asks.difference_update(interaction.target)   # cross these off
    return True

@action
async def all_asks_done(self) -> bool:
    return len(self._pending_asks) == 0                            # the guard

on_asked_done removes that interaction's targets from _pending_asks; the best_teaching state loops on all_asks_done(), which is True only once _pending_asks is empty, that is, once both the audience's learning and the best student's teaching have completed. Only then does the behavior move on to the next lecture. The two callbacks plus one guard replace any manual joining of the two asynchronous requests.

No agent changed role here, the best student is simply asked to run a different action this round. Learning spreads sideways, between peers, with the master only selecting the winner, wiring the channel, issuing two requests, and waiting on a guard.

graph TD
    M[Master] -->|compare_eval picks best| Best[Best student]
    M -->|send_subscribe others to best| Sub[(pub/sub link)]
    M -->|teach + relay_uuid| Best
    M -->|learn_from_student| O[Other students]
    Best -->|labels via pub/sub| O
    O -->|callback: on_asked_done| M
    Best -->|callback: on_asked_done| M

Changing roles mid-session

In social_learning the best student takes over the teaching without changing role, the master just asks it to run a different action (teach) for one round. That is the lighter-weight option, and often the right one: if the change is only "do this other thing now", a different send is enough.

When a master does want to change who is what, it uses set_role (or applies an agent's suggest_role_to_world). Promotion in animal_school (Chapter 8) is the simplest case; in larger worlds a master can reshape the whole cast between rounds. The new role's behavior is shipped automatically, the agent just starts running it. The distinction is worth keeping: ask for a one-off action when the change is momentary, change the role when the agent should permanently behave differently.

Rewarding the round's winner

Leading a session is not only sending work out; the master also records and rewards what happened. After the social round, social_learning's manage_best_of_class turns the best student's exam result into a stat and a badge:

social_learning/src/teacher.py (trimmed)
@action
async def manage_best_of_class(self):
    if len(self._valid_cmp_agents) > 0:
        best_student_peer_id = next(iter(self._valid_cmp_agents))     # same winner
        best_student_result  = self._eval_results[best_student_peer_id]
        if best_student_result >= 0:
            self.stats.store_stat("best_exam_err_history", best_student_result,
                                  group_key=best_student_peer_id, timestamp=...)
            return await super().suggest_badges_to_world(
                agent=best_student_peer_id, score=best_student_result,
                badge_type="intermediate",
                badge_description="Best student of a class, MNIST ...")
    return True

It reuses the same _valid_cmp_agents winner the social round used, reads that student's score from _eval_results (filled by the master's evaluate action), stores a per-student stat keyed by peer id, and then suggests a badge to the world. The master proposes; the world awards. Scoring, stat-keeping, and badges are how a session leaves a durable record, covered in full in Chapter 10.

When one master is not enough: tiers

Everything above assumes a single master leading every peer. That works up to a point, but a world with hundreds of participants, or several independent activities at once, needs more than one leader. Nothing limits a world to one master, so you give it a small hierarchy of leading roles.

The flagship turing world does exactly this. Its assign_role reads a list of manager names from a file and hands out three tiers:

# turing/src/world.py (shape)
if unaid in self.hotel_managers:  return "hotel_manager"   # top: load-balances guests onto floors
if unaid in self.floor_managers:  return "floor_manager"   # mid: runs one floor's activity
return "guest"                                              # everyone else

The pattern is general:

  • Several leading roles, not one. Each is just a role with its own behavior; a world can define hotel_manager, floor_manager, and more.
  • Leaders come from a list. turing keeps the manager names in a managers.txt it can reload while the world runs, so you add or remove leaders without restarting.
  • Each tier owns a slice. The top leader load-balances newcomers onto mid leaders; each mid leader then runs the single-master loop from this chapter, scoped to its own slice.

Reach for tiers when one master would be a bottleneck: many simultaneous participants, or independent sub-activities that should run in parallel.

The master's toolkit, in one place

Move How
Know who is in the room get_agents_by_role / find_agents, refreshed on on_tick
Engage the right peers send_engage / set_engaged_partner
Lead a whole group one send with target=[many]
Wait for the group all_sent_completed, received_some_asked_data
React on completion callback= + a guard like all_asks_done()
Pick who leads next evaluate then compare_eval (fills _valid_cmp_agents)
Teach peer to peer subscribe / send_subscribe + a shared relay_uuid
Reshape the cast set_role / suggest_role_to_world
Score and reward evaluate, compare_eval, store_stat, suggest_badges_to_world

That is the whole of leading a session. There is no pipeline and no central script: just a master sending, waiting, and adjusting, while a society of agents acts and learns. For the concept-level reference behind this chapter, see Learning and teaching and Roles and the world master.

Where next