👷Worker, a worker creates a task to run an experiment in background. It periodically sends out transitions between agent and environment, and fetches latest parameter. 📢WorkerProxy, a worker proxy collects messages from/to workers on the same node so that some message data (model params) can be shared accross different workers. 💿TrajectoryManager, a trajectory manager is a wrapper around an
AbstractTrajectory. It takes in a bulk of transitions and samples a batch of training data in respond to rquests.
💡Trainer, a trainer is a wrapper around an
AbstractPolicy, it does nothing but to update its internal parameters when received a batch of training data and periodically broadcast its latest parameters.
⏱️Orchestrator, an orchestrator is in charge of controlling the start, stop and the speed of communications between the above components.
- We adopt the actor model here. Each instance of the above components is an actor. Only messages are passing between them.
- A node is a process in Julia. Different nodes can be on one machine or accross different machines.
- Tasks in different workers are initiated with
Threads.@spawn. There's no direct communication between them by design.
- In single node environment (
MainNodeare the same one), the WorkerProxy can be removed and workers communicate with Orchestrator directly.
1️⃣( 👷→ 📢)
InsertTransitionMsg, contains the local transitions between agent and environment in an experiment.
2️⃣( 📢→ ⏱️)
InsertTransitionMsgfrom different workers.
3️⃣( ⏱️→ 💿)
SampleBatchMsg(which contains the address of Trainer).
4️⃣( 💿→ 💡)
5️⃣( 💡→ 💿)
UpdatePriorityMsg, only necessary in prioritized experience replay related algorithms.
6️⃣( 💡→ ⏱️)
LoadParamsMsg, contains the latest parameters of the policy.
7️⃣( ⏱️→ 📢)
8️⃣( 📢→ 👷)