Env: The agent acts in an environment
- State & Actions: The agent occupies one of many states (s) of the environment and can take one of many actions (a) to move from one state to another.
- Model: How the environment reacts to actions is defined by a model, which we may or may not know. The model defines the reward function and the transition probabilities.
- State Transition (Model): Which state the agent will arrive in is decided by transition probabilities between states (P).
- Reward: Once an action is taken, the environment delivers a reward (r) as feedback.
- Model-Based: The model is known: planning with perfect information. When we fully know the environment, we can find the optimal solution by Dynamic Programming (DP); see the value-iteration sketch after this list.
- Model-Free: The model is unknown: learning with incomplete information. We either do model-free RL or try to learn the model explicitly as part of the algorithm.
- Policy: The agent's policy π(s) tells it which action to take in a given state, with the goal of maximizing the total reward.
- Value Function: Each state is associated with a value function V(s) that predicts the expected sum of future rewards we can collect from that state by following the corresponding policy.
In RL, we are trying to learn the policy and the value function.
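To make the model-based case concrete, here is a minimal value-iteration sketch. The 3-state, 2-action MDP, its transition probabilities P, rewards R, and discount factor are all made up for illustration; only the Bellman backup itself is the standard algorithm.

```python
import numpy as np

# Value iteration on a tiny made-up MDP (3 states, 2 actions).
# P[s, a, s'] = transition probability, R[s, a] = expected reward.
# All numbers here are invented purely for illustration.
gamma = 0.9
P = np.array([
    [[0.8, 0.2, 0.0], [0.1, 0.9, 0.0]],  # from state 0
    [[0.0, 0.5, 0.5], [0.0, 0.1, 0.9]],  # from state 1
    [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]],  # from state 2
])
R = np.array([[0.0, 0.0],
              [0.0, 0.0],
              [1.0, 10.0]])

V = np.zeros(3)
for _ in range(1000):
    # Bellman optimality backup: Q(s,a) = R(s,a) + gamma * sum_s' P(s'|s,a) V(s')
    Q = R + gamma * (P @ V)      # shape (3, 2)
    V_new = Q.max(axis=1)
    if np.abs(V_new - V).max() < 1e-8:
        break
    V = V_new

policy = Q.argmax(axis=1)        # greedy policy w.r.t. the converged values
print("V:", V, "policy:", policy)
```

Each sweep backs up V(s) with the best one-step lookahead under the known model; once V converges, the greedy policy with respect to Q is optimal for this toy MDP.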
- On-policy: Training on transitions or episodes produced by the agent's own target policy.
- Agent can pick actions
- Agent always follows its own policy
- The most straightforward setup
- Off-policy: Training on a distribution of transitions or episodes produced by a behavior policy different from the target policy (see the SARSA vs. Q-learning sketch after this list).
- Agent can't pick actions
- Learning with exploration, playing without exploration
- Learning from expert (expert is imperfect)
- Learning from sessions (recorded data)
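To make the on-/off-policy split concrete, here is a minimal sketch contrasting the two classic tabular TD updates, SARSA (on-policy) and Q-learning (off-policy). The state/action counts, learning rate, and discount are made-up placeholders; only the two update rules are standard.

```python
import numpy as np

# Tabular TD updates on a made-up problem with 5 states and 2 actions.
# alpha (learning rate) and gamma (discount) are illustrative choices.
alpha, gamma = 0.1, 0.99
Q = np.zeros((5, 2))

def sarsa_update(s, a, r, s_next, a_next):
    # On-policy: the target uses a_next, the action the behavior policy
    # (which is also the target policy) actually takes in s_next.
    Q[s, a] += alpha * (r + gamma * Q[s_next, a_next] - Q[s, a])

def q_learning_update(s, a, r, s_next):
    # Off-policy: the target uses the greedy action in s_next, regardless
    # of which (possibly exploratory) action the behavior policy takes.
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
```

The only difference is the bootstrap target: SARSA uses the action the agent actually takes next, so exploration shapes its values; Q-learning maxes over actions, so it can learn a greedy target policy from exploratory or recorded data.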
...
Git Repo
Our main goal for this project is to attempt to replace traditional path-planning techniques or joystick operation with an optimal control policy that is learned via Reinforcement Learning.
To Do:
- Arm: Test the arm's IK with closed-loop feedback
- Robo-gym: Test the RL algorithm with real hardware
- Pose Estimation: Experiments with fixed keyboard position …
- Robohub (Brendan DeHart): Can always use the Gen3 arm for testing
Resources
- AlphaGo Documentary: for motivation …
- Sutton & Barto GOAT Textbook: from the GOAT University of Alberta (uni for my undergrad)
- CS285: Berkeley’s Deep Reinforcement Learning: really good online course for RL
- OpenAI Gym Tutorial: this is how we got started with OpenAI Gym
- Stable-Baselines3: we use standard RL algorithms from this repo (see the sketch at the end of this list)
- Robo-Gym: framework for training in Gazebo + ROS
- CS231n: Stanford’s Convolutional Neural Networks for Visual Recognition: really good online course for learning about NNs or CNNs
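Since these notes lean on OpenAI Gym and Stable-Baselines3, here is a minimal sketch of how the two fit together. The CartPole-v1 environment and PPO settings are placeholders rather than our actual robot setup, and the snippet assumes the classic Gym step/reset API that Stable-Baselines3 1.x targets.

```python
import gym
from stable_baselines3 import PPO

# Train a standard algorithm from Stable-Baselines3 on a placeholder
# environment. CartPole-v1 stands in for the real robot environment.
env = gym.make("CartPole-v1")
model = PPO("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=10_000)

# Roll out the learned policy for one episode (classic Gym API).
obs = env.reset()
done = False
while not done:
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, done, info = env.step(action)
env.close()
```

Swapping PPO for another algorithm (e.g. SAC or TD3) is a one-line change, which is part of the appeal of pulling standard algorithms from this repo instead of implementing them ourselves.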