This package is discontinued. Please check ReinforcementLearning.jl, POMDPs.jl or AlphaZero.jl instead.
Reinforce.jl is an interface for Reinforcement Learning. It is intended to connect modular environments, policies, and solvers with a simple interface.
Packages which build on Reinforce:
- AtariAlgos: Environment which wraps Atari games using ArcadeLearningEnvironment
- OpenAIGym: Wrapper for OpenAI's python package: gym
New environments are created by subtyping AbstractEnvironment and implementing
a few methods:
```julia
reset!(env) -> env
actions(env, s) -> A
step!(env, s, a) -> (r, s′)
finished(env, s′) -> Bool
```
and optional overrides:
```julia
state(env) -> s
reward(env) -> r
```
which default to env.state and env.reward respectively when not overridden.
```julia
ismdp(env) -> Bool
```
An environment may be fully observable (MDP) or partially observable (POMDP).
In the case of a partially observable environment, the state s is really
an observation o. To maintain consistency, we call everything a state,
and assume that an environment is free to maintain additional (unobserved)
internal state. The ismdp query returns true when the environment is an MDP,
and false otherwise.
```julia
maxsteps(env) -> Int
```
An episode terminates when either maxsteps(env) is reached or finished(env, s′)
returns true. The default maxsteps value is 0, which indicates no step limit.
A minimal example for testing purposes is test/foo.jl.
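For illustration, here is a self-contained sketch of a hypothetical environment. The RandomWalkEnv name, its dynamics, and its reward scheme are invented for this example and are not part of the package; only the interface methods come from Reinforce. Keeping state and reward as fields lets the default state(env) and reward(env) fallbacks work unchanged.

```julia
using Reinforce
import Reinforce: reset!, actions, step!, finished, maxsteps

# Hypothetical 1-D random-walk environment: the agent starts at 0 and moves
# left or right; the episode ends when it reaches either boundary at ±5.
mutable struct RandomWalkEnv <: AbstractEnvironment
    state::Int        # read by the default state(env)
    reward::Float64   # read by the default reward(env)
end
RandomWalkEnv() = RandomWalkEnv(0, 0.0)

function reset!(env::RandomWalkEnv)
    env.state = 0
    env.reward = 0.0
    env
end

actions(env::RandomWalkEnv, s) = [-1, 1]    # move left or move right

function step!(env::RandomWalkEnv, s, a)
    env.state = s + a
    env.reward = env.state == 5 ? 1.0 : 0.0  # reward only at the right boundary
    env.reward, env.state
end

finished(env::RandomWalkEnv, s′) = abs(s′) >= 5

maxsteps(env::RandomWalkEnv) = 100          # give up after 100 steps
```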
TODO: more details and examples
Agents/policies are created by subtyping AbstractPolicy and implementing action.
The built-in random policy is a short example:
```julia
struct RandomPolicy <: AbstractPolicy end
action(π::RandomPolicy, r, s, A) = rand(A)
```
where A is the action space.
The action method maps the last reward and current state to the next chosen action:
(r, s) -> a.
Policies may also implement:
```julia
reset!(π::AbstractPolicy) -> π
```
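As another sketch, a hypothetical ε-greedy policy shows a policy with its own state. The EpsilonGreedyPolicy name and its fixed preferred action are assumptions made for this example, not part of the package.

```julia
using Reinforce
import Reinforce: action, reset!

# Hypothetical ε-greedy policy: explore with probability ε, otherwise exploit
# a fixed preferred action. A real agent would update `preferred` from the
# rewards it observes.
mutable struct EpsilonGreedyPolicy <: AbstractPolicy
    ε::Float64
    preferred::Int
end

action(π::EpsilonGreedyPolicy, r, s, A) = rand() < π.ε ? rand(A) : π.preferred

reset!(π::EpsilonGreedyPolicy) = π   # nothing to clear in this simple sketch
```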
Iterate through episodes using the Episode iterator.
A 4-tuple (s,a,r,s′) is returned from each step of the episode:
```julia
ep = Episode(env, π)
for (s, a, r, s′) in ep
    # do some custom processing of the sars-tuple
end
R = ep.total_reward
T = ep.niter
```
There is also a convenience method run_episode.
The following is equivalent to the previous example:
```julia
R = run_episode(env, π) do
    # anything you want... this section is called after each step
end
```
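Putting the pieces together, a hypothetical end-to-end run (assuming the RandomWalkEnv and EpsilonGreedyPolicy sketches above are in scope) might look like:

```julia
env = RandomWalkEnv()            # hypothetical environment from the sketch above
π = EpsilonGreedyPolicy(0.1, 1)  # hypothetical policy from the sketch above

for i in 1:5
    reset!(env)                  # start each episode from the initial state
    ep = Episode(env, π)
    for (s, a, r, s′) in ep
        # per-step processing, e.g. logging or learning updates
    end
    println("episode $i: reward = $(ep.total_reward) over $(ep.niter) steps")
end
```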
