Env: The agent acts in an environment
- State & Actions: The agent occupies one of many states (s) of the environment and can take one of many actions (a) to move from one state to another.
- Model: How the environment reacts to actions is defined by a model, which we may or may not know. The model defines the reward function and the transition probabilities.
- State Transition (Model): Which state the agent will arrive in is decided by transition probabilities between states (P).
- Reward: Once an action is taken, the environment delivers a reward (r) as feedback.
- Model-Based: The model is known: planning with perfect information. When we fully know the environment, we can find the optimal solution by Dynamic Programming (DP); see the value-iteration sketch after this list.
- Model-Free: The model is unknown: learning with incomplete information. We either do model-free RL or try to learn the model explicitly as part of the algorithm.
- Policy: The agent's policy π(s) tells it which action to take in a given state, with the goal of maximizing the total reward.
- Value Function: Each state is associated with a value function V(s) that predicts the expected sum of future rewards we can collect from that state by following the corresponding policy.
In RL, we are trying to learn the policy and the value function.
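To make the model-based case concrete, here is a minimal value-iteration sketch. The 3-state, 2-action MDP, its transition probabilities P, rewards R, and discount factor are all made up for illustration; only the Bellman backup itself is the standard algorithm.

```python
import numpy as np

# Value iteration on a tiny made-up MDP (3 states, 2 actions).
# P[s, a, s'] = transition probability, R[s, a] = expected reward.
# All numbers here are invented purely for illustration.
gamma = 0.9
P = np.array([
    [[0.8, 0.2, 0.0], [0.1, 0.9, 0.0]],  # from state 0
    [[0.0, 0.5, 0.5], [0.0, 0.1, 0.9]],  # from state 1
    [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]],  # from state 2
])
R = np.array([[0.0, 0.0],
              [0.0, 0.0],
              [1.0, 10.0]])

V = np.zeros(3)
for _ in range(1000):
    # Bellman optimality backup: Q(s,a) = R(s,a) + gamma * sum_s' P(s'|s,a) V(s')
    Q = R + gamma * (P @ V)      # shape (3, 2)
    V_new = Q.max(axis=1)
    if np.abs(V_new - V).max() < 1e-8:
        break
    V = V_new

policy = Q.argmax(axis=1)        # greedy policy w.r.t. the converged values
print("V:", V, "policy:", policy)
```

Each sweep backs up V(s) with the best one-step lookahead under the known model; once V converges, the greedy policy with respect to Q is optimal for this toy MDP.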
- On-policy: Training on transitions or episodes produced by the agent's own target policy.
- Agent can pick actions
- Agent always follows its own policy
- The most straightforward setup
- Off-policy: Training on a distribution of transitions or episodes produced by a behavior policy different from the target policy (see the SARSA vs. Q-learning sketch after this list).
- Agent can't pick actions
- Learning with exploration, playing without exploration
- Learning from expert (expert is imperfect)
- Learning from sessions (recorded data)
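To make the on-/off-policy split concrete, here is a minimal sketch contrasting the two classic tabular TD updates, SARSA (on-policy) and Q-learning (off-policy). The state/action counts, learning rate, and discount are made-up placeholders; only the two update rules are standard.

```python
import numpy as np

# Tabular TD updates on a made-up problem with 5 states and 2 actions.
# alpha (learning rate) and gamma (discount) are illustrative choices.
alpha, gamma = 0.1, 0.99
Q = np.zeros((5, 2))

def sarsa_update(s, a, r, s_next, a_next):
    # On-policy: the target uses a_next, the action the behavior policy
    # (which is also the target policy) actually takes in s_next.
    Q[s, a] += alpha * (r + gamma * Q[s_next, a_next] - Q[s, a])

def q_learning_update(s, a, r, s_next):
    # Off-policy: the target uses the greedy action in s_next, regardless
    # of which (possibly exploratory) action the behavior policy takes.
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
```

The only difference is the bootstrap target: SARSA uses the action the agent actually takes next, so exploration shapes its values; Q-learning maxes over actions, so it can learn a greedy target policy from exploratory or recorded data.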
...
Git Repo
Our main goal for this project is to attempt to replace traditional path-planning techniques or joystick operation with an optimal control policy that is learned via Reinforcement Learning.
To Do:
- Arm: Test the arm's IK with closed-loop feedback
- Robo-gym: Test the RL algorithm with real hardware
- Pose Estimation: Experiments with fixed keyboard position …
- Robohub (Brendan DeHart): Can always use the Gen3 arm for testing
Resources
- AlphaGo Documentary: for motivation …
- Sutton & Barto GOAT Textbook: from the GOAT University of Alberta (uni for my undergrad)
- CS285: Berkeley’s Deep Reinforcement Learning: really good online course for RL
- OpenAI Gym Tutorial: this is how we got started with OpenAI Gym
- Stable-Baselines3: we use standard RL algorithms from this repo (see the sketch at the end of this list)
- Robo-Gym: framework for training in Gazebo + ROS
- CS231n: Stanford’s Convolutional Neural Networks for Visual Recognition: really good online course for learning about NNs or CNNs
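Since these notes lean on OpenAI Gym and Stable-Baselines3, here is a minimal sketch of how the two fit together. The CartPole-v1 environment and PPO settings are placeholders rather than our actual robot setup, and the snippet assumes the classic Gym step/reset API that Stable-Baselines3 1.x targets.

```python
import gym
from stable_baselines3 import PPO

# Train a standard algorithm from Stable-Baselines3 on a placeholder
# environment. CartPole-v1 stands in for the real robot environment.
env = gym.make("CartPole-v1")
model = PPO("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=10_000)

# Roll out the learned policy for one episode (classic Gym API).
obs = env.reset()
done = False
while not done:
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, done, info = env.step(action)
env.close()
```

Swapping PPO for another algorithm (e.g. SAC or TD3) is a one-line change, which is part of the appeal of pulling standard algorithms from this repo instead of implementing them ourselves.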