Skip to content

4 · Behaviors as state machines

Build worlds · Chapter 4 of 12 · Path home

A role is a name; a behavior is what that name actually does, step by step. In UNaIVERSE a behavior is a hybrid state machine (HSM): a small graph of states connected by transitions, where each transition runs an action. You build one per role inside create_behav_files, and the world ships it to whoever takes that role.

Why a state machine

Because behavior over time is naturally "I am waiting, then I connect, then I am ready, then I respond, then back to ready." States and transitions capture exactly that, and the framework runs the loop for you so you never write an event loop by hand.

States

A state is a named condition the agent can be in. You add one with add_state:

behav.add_state("init", blocking=True)
behav.add_state("ready", action="check_messages",
                args={"max_silence_seconds": 25.0})

Read this as two declarations:

  1. The first line registers a state called init and marks it blocking. The first state you add also becomes the machine's initial state automatically, so init is where every agent that takes this role starts.
  2. The second registers a state called ready that carries an in-state action: while the agent sits in ready, the framework keeps calling check_messages (with the args you gave) on every clock tick. The state action is the "what I do while I wait here", distinct from the transition actions that move the agent out of the state.

Three things are happening, and each matters at runtime:

  • The name is yours (init, ready, waiting). It is only a label; nothing about the name is magic.
  • The optional in-state action (action=..., args=...) runs while the agent sits in that state. Here check_messages keeps running while the agent is ready, polling the room. The args dict is passed to it as keyword arguments, the same channel a transition uses (see Chapter 5).
  • blocking controls pacing, and it is the one flag people misread. A blocking state (blocking=True, the default) makes the act loop stop once the agent lands there: the machine runs its inner action and then yields, waiting for the next tick or event before it tries to move on. A non-blocking state (blocking=False) does the opposite, the machine flows straight through it in the same cycle, immediately attempting the next transition. You mix the two to get the rhythm you want: blocking where the agent should pause and react to the outside world, non-blocking for bookkeeping states it should pass through without pausing.

What blocking is not

blocking=True does not mean "freeze the agent" or "run synchronously". The agent's clock keeps ticking; a blocking state simply means the per-cycle transition loop ends there instead of cascading onward. A long synchronous call inside an action is what actually freezes an agent, see the "Do not block the tick" warning in Chapter 5.

Exactly one state is active at a time.

Transitions

A transition is a rule: from one state, run an action, and to another state if that action succeeds. You add one with add_transit:

behav.add_transit("init", "waiting_handshake",
                  action="connect_to_broadcaster", args={"role": "broadcaster"},
                  msg="🔗 Connecting to the room...")

Read the call argument by argument:

  1. "init" is the from-state and "waiting_handshake" is the to-state. If either state has not been declared yet, add_transit creates it for you, so you can sketch a behavior purely in transitions and let the states appear implicitly.
  2. action="connect_to_broadcaster" names the method that guards the edge. On each tick while the agent is in init, the framework calls it.
  3. args={"role": "broadcaster"} is passed to that method as keyword arguments.
  4. The action's return value decides the edge: if connect_to_broadcaster returns True, the transition fires and the agent moves to waiting_handshake; if it returns False, nothing moves and the action is simply retried next tick (this is the success/retry contract from Chapter 5).
  5. msg="🔗 Connecting to the room..." is an optional human-readable label printed when the transition fires, purely for following along in the logs.

The takeaway: a transition is "from here, keep calling this action until it succeeds, then go there." You never write the polling loop; returning False from the action is the wait.

Two flags shape when a transition is eligible, and both are worth getting exactly right because they decide who drives the edge:

  • ready (default True) sets the action's inner-readiness. A ready=True action is inner-ready: the policy may dispatch it autonomously, on the agent's own initiative, every tick. A transition with ready=False is outer-only: it is never fired proactively, it sits dormant until an external interaction request arrives for it, and only then does it run, serving that request. This is how an agent exposes a capability for others to invoke rather than something it does on its own. The broadcaster's relay loop is exactly this: a single ready=False self-loop that only acts when a user pushes a message to be broadcast.
  • avoid_changing_ready: keep the readiness state as-is across this transition. By default, an action whose signature declares an interaction/requester parameter is automatically forced to outer-only (because such actions are meant to answer requests). Passing avoid_changing_ready=True suppresses that override, you use it when an action stages work for the next step and you want it to stay proactively dispatchable rather than wait for a request.

A state can have several outgoing transitions. Which one fires is decided by the policy.

The policy

When more than one transition out of the current state could fire, a policy picks exactly one. The default policy walks the feasible actions in a fixed order and returns the first match:

  1. High-priority actions first. Any transition flagged high_priority=True wins over everything else (rarely needed; off by default).
  2. Then actions with a pending request. If an outer-only (ready=False) transition has an external interaction waiting, it is served next, oldest request first. This is why an agent answers what it was asked before doing its own thing.
  3. Then the first inner-ready transition. Among the remaining ready=True transitions, the first one (in the order you added them) that is currently ready fires.

So ordering matters: when a state has several proactive transitions, the one you add_transit first is tried first. You rarely need to change any of this; when you do, you pass your own policy to the HSM or set a policy_filter on the agent. For everything in this path, the default is what you want.

Pacing actions with a policy_filter

A policy_filter is a callable on the Agent (the policy_filter constructor argument) that can veto or delay the action the policy just chose. The most common one ships with UNaIVERSE, PolicyFilterDelayAction, which enforces a minimum wait before a named action fires. The flagship turing world uses it so its chat bots feel human and never reply faster than about a second:

from unaiverse.utils.misc import PolicyFilterDelayAction

agent = Agent(proc=my_model, proc_inputs=["text"], proc_outputs=["text"],
              policy_filter=PolicyFilterDelayAction({"process"}, wait=1.,
                                                    add_random_up_to=1.))

{"process"} is the set of action names to pace; wait is the minimum seconds before each fires again; add_random_up_to adds up to that many seconds of random jitter. Unlike a transition's per-edge delay (which belongs to one edge), a policy_filter is cross-cutting: it applies to an action by name wherever it appears.

Teleports: rules that apply from anywhere

Sometimes you need "no matter what state I'm in, if X happens, go to Y", a timeout, a disconnect, a reset. That is a teleport: a transition that can fire from any state.

# If the partner vanished, no matter where we are, go home and wait.
behav.add_global_teleport("init", action="disconnected", args={"delay": 5.0})

Notice the argument order: the first argument is the destination (add_global_teleport(to_state, action, ...)), because there is no single source, the call wires the same transition from every state currently registered, all at once. So this one line means: "from anywhere, if disconnected returns True (waiting at least delay=5.0 seconds first), go to init." Under the hood a teleport is an ordinary transition flagged teleport=True; it is only hidden in the diagram to keep the visual graph readable, but it is selected by the policy and runs exactly like any other edge. If you want a teleport out of one specific state instead of all of them, pass teleport=True to a normal add_transit.

This is how the teaching worlds recover when a student drops mid-lesson: a single global teleport sends agents back to init on disconnected from anywhere, so you do not have to wire a "what if they leave" edge onto every state by hand.

The chat world spells it out per state instead

A global teleport is the concise way to express "from anywhere, on X, go to Y". The chat world you will read below makes the same recovery explicit with two ordinary transitions, waiting_handshake to init and ready to init, both on disconnected, rather than one teleport. Both styles are correct; the teleport just collapses the repetition.

Timing controls

Transitions accept three timing arguments so you can pace and bound them. Each is in seconds, each defaults to 0. (meaning "no constraint"), and each does something different:

  • delay: a minimum dwell time. The transition is not even considered by the policy until at least this many seconds have elapsed since the agent entered the source state. Use it to hold an agent in a state for a beat, for example, waiting 5.0 seconds before treating a partner as truly disconnected.
  • timeout: a patience limit on retrying. If the action keeps returning False, the machine keeps retrying it tick after tick, until timeout seconds have accumulated, at which point it gives up on this transition. Use it so a stuck handshake does not wait forever.
  • total_time: a hard wall-clock budget for the whole transition once it starts, useful for bounding a multi-step action.
behav.add_transit("waiting", "ready", action="connected",
                  args={}, timeout=30.0, delay=1.0)

Read this as: in waiting, wait at least 1 second before even trying connected; then retry it until it returns True, but if 30 seconds pass without success, abandon the edge. So delay gates the start, timeout bounds the retrying, and total_time caps the total run. These three names are consumed by the transition itself, they are stripped before your action sees them, so do not add parameters of the same name to your method.

Wildcards: one behavior, many agents

A behavior file is a template. Placeholders written in angle brackets, like <agent>, <world>, or <stream_name>, are filled in at runtime so the same file adapts to each agent and each session:

# In the teaching worlds, a teacher's playlist references the student's streams:
behav.add_transit("engagement_complete", "begin_teaching",
                  action="set_pref_streams", args={"net_hashes": ["<agent>:recorded1"]})

When you write this transition you do not yet know which student the teacher will pair with, so you cannot hard-code a peer. Instead you leave the placeholder <agent> inside the argument. The transition is saved with the placeholder intact; at runtime, once the teacher has actually engaged a student, the machine substitutes the real partner for <agent>, so args becomes the concrete "<real-peer>:recorded1" before the action runs. The behavior file ships generic and specialises itself per session.

You fill placeholders with the HSM's set_wildcards({...}) / update_wildcard(...), or, on an agent, add_behav_wildcard("<stream_name>", "animal_stream"). This is what lets one teacher.json teach any student, and one user.json connect to whichever broadcaster happens to be present.

Reusing templates

You do not have to build every behavior from scratch. Common patterns ship as reusable behavior JSON you load and fill with wildcards:

  • engage_by_role: find and engage agents of a given role.
  • service_requester / service_provider: the ask/answer pair the info_extraction world uses.
  • listening_to_teacher: the student side of the teaching worlds.

You snap these together and override the parts that differ, which is why a teaching world's create_behav_files is short despite a rich behavior.

Saving, and the world ships it

Each behavior ends with a save, writing the <role>.json the world hands out to whoever takes that role:

behav.save(os.path.join(self.world_folder, "user.json"), only_if_changed=dummy_agent)

save serialises the whole machine, states, transitions, wildcards, the welcome message, into one JSON file. The only_if_changed=dummy_agent argument is a small convenience: it rewrites the file only if the behavior actually differs from what is already on disk (checked against the dummy agent the machine was built with), so re-running create_behav_files does not churn unchanged files.

Read one for real: the chat user

Putting it together, here is the user role from the chat world, the whole loop a chat member runs, taken verbatim from create_behav_files:

chat/src/world.py (the user behavior)
dummy_agent = UserAgent(proc=None)
behav = HybridStateMachine(dummy_agent)
behav.set_welcome_message(welcome_msg)
behav.set_role("user")

behav.add_state("init", blocking=True)
behav.add_state("waiting_handshake", blocking=False)
behav.add_state("message_sent", blocking=False)
behav.add_state("ready", action="check_messages",
                args={"max_silence_seconds": 25.0, "talk_probability": 0.01, "history_len": 3},
                msg="👍 Ready!")

behav.add_transit("init", "waiting_handshake",
                  action="connect_to_broadcaster", args={"role": "broadcaster"},
                  msg="🔗 Connecting to the room...")
behav.add_transit("waiting_handshake", "ready",
                  action="connected", args={"handshake_completed": True})
behav.add_transit("waiting_handshake", "init", action="disconnected", args={"delay": 5.0})
behav.add_transit("ready", "init", action="disconnected", args={"delay": 5.0})
behav.add_transit("ready", "message_sent",
                  action="generate_and_send", args={"samples": 1},
                  ready=True, avoid_changing_ready=True)
behav.add_transit("message_sent", "ready", action="nop")

behav.save(os.path.join(self.world_folder, "user.json"), only_if_changed=dummy_agent)

The first three lines set up the machine: a HybridStateMachine is built around a dummy UserAgent (created with proc=None, no model, no network) whose only job is to let the machine verify at build time that every action name below really exists on the user class. set_welcome_message is the greeting logged the first time the agent reaches its initial state; set_role("user") stamps the file. Now walk the four states in the order a chat member lives them.

init (blocking). The starting state, the first one added, so the machine begins here. It is blocking, so the agent pauses here and reacts rather than racing onward. Its single outgoing transition runs connect_to_broadcaster(role="broadcaster"), which locates the local processor stream and dials the room's broadcaster. While that is still in progress the action returns False and is retried; when the connection is made it returns True and the agent advances to waiting_handshake, logging "🔗 Connecting to the room...".

waiting_handshake (non-blocking). A short staging state with two outgoing edges, and this is where ordering and the policy meet:

  1. connected(handshake_completed=True) to ready. This is the happy path: it returns True once the connection has fully settled, sending the agent to ready.
  2. disconnected(delay=5.0) to init. The recovery path: delay=5.0 means it is not even considered for the first 5 seconds, and if the partner has actually dropped it sends the agent back to init to start over.

Both are proactive (ready=True), so the policy takes the first ready one, connected is added first and wins whenever the handshake is succeeding; only if it keeps failing does the disconnected edge get its chance. Because the state is non-blocking, the agent does not linger here once an edge fires.

ready (in-state action check_messages). The heart of the loop, and the only state with a long-running in-state action: while the agent sits here, the framework calls check_messages every tick with max_silence_seconds=25.0, talk_probability=0.01, history_len=3. That action polls the broadcaster's stream and the agent's own messages, keeps a short rolling history, and decides whether there is anything worth saying, if so it stages a prompt in stdin for the next step. The msg="👍 Ready!" is printed once on entry. ready has three outgoing edges:

  1. generate_and_send(samples=1) to message_sent. The proactive reply path. Note its two flags: ready=True keeps it inner-ready (the agent fires it on its own), and avoid_changing_ready=True prevents the automatic "force outer-only" override, so the agent keeps the initiative to speak rather than waiting to be asked. When check_messages has staged something, this action runs the model on it and forwards the result to the broadcaster.
  2. disconnected(delay=5.0) to init. The same recovery edge as before: if the room vanishes, go home and reconnect.
  3. (The check_messages state action keeps running underneath all of this, it is not a transition, it is the work the agent does while in ready.)

message_sent (non-blocking) to ready via nop. After a message goes out the agent passes through message_sent and immediately runs nop, a built-in no-op that simply returns True, to loop straight back to ready. Because message_sent is non-blocking, this is a clean "reset the cycle" hop with no pause, leaving the agent back in ready, watching the room again.

So the life of a chat member is: init dials the broadcaster, waiting_handshake confirms the connection (or bails to init), ready watches the room and, when moved to speak, runs generate_and_send, then message_sent to ready closes the loop, with a disconnected escape hatch on the two live states. In Chapter 7 we follow the actual data through this exact loop end to end.

Where next