k-Armed Bandit 1.0.0
A collection of k-armed bandits and associated agents for reinforcement learning

agent.epsilon_greedy.EpsilonGreedy Class Reference

A greedy agent that occasionally explores.
Public Member Functions

    None __init__(self, int k, float epsilon, float start_value=0.0)
        Construct the agent.
    int act(self)
        Determine which action to take.
    float epsilon(self)
    None epsilon(self, float value)
    None update(self, int action, float reward)
        Update the Q-table based on the last action.

Public Member Functions inherited from agent.base_agent.BaseAgent

    int exploit(self)
        Select the best action.
    int explore(self)
        Explore a new action.
    numpy.ndarray table(self)
        Return the Q-Table.

Public Attributes

    epsilon

Protected Attributes

    _n
    _rng
    _epsilon

Protected Attributes inherited from agent.base_agent.BaseAgent

    _table

Detailed Description
A greedy agent that occasionally explores.
This agent primarily exploits when deciding its actions. However, it occasionally chooses to explore, at a rate of epsilon provided at initialization. This gives it a chance to discover whether other actions are better options.
Definition at line 5 of file epsilon_greedy.py.
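For orientation, here is a minimal usage sketch. The import path is inferred from the definition file, and the 10-armed Gaussian testbed is a stand-in for whichever bandit classes this package actually provides:

    import numpy as np

    from agent.epsilon_greedy import EpsilonGreedy

    # Hypothetical 10-armed testbed: each arm pays out a unit-variance
    # Gaussian reward around a fixed, randomly drawn mean.
    rng = np.random.default_rng(0)
    true_means = rng.normal(size=10)

    agent = EpsilonGreedy(k=10, epsilon=0.1)
    for _ in range(1000):
        action = agent.act()                         # explore or exploit
        reward = rng.normal(loc=true_means[action])  # pull the chosen arm
        agent.update(action, reward)                 # fold reward into the Q-table

    print(agent.table())  # estimated action values after 1000 steps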
None agent.epsilon_greedy.EpsilonGreedy.__init__(self, int k, float epsilon, float start_value=0.0)
Construct the agent.
Parameters
    k: The number of actions to consider. This must be an int greater than zero.
    epsilon: The rate at which the agent should randomly explore. As this is a probability, it must be between 0 and 1.
    start_value: The initial value to use in the table. All actions start with the same value.

Raises
    ValueError: if epsilon is not a valid probability (between 0 and 1).
Reimplemented from agent.base_agent.BaseAgent.
Definition at line 14 of file epsilon_greedy.py.
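A construction sketch following the documented parameters (the exact error message raised for an invalid epsilon is not specified here):

    from agent.epsilon_greedy import EpsilonGreedy

    # Optimistic initial values encourage early exploration of every arm.
    agent = EpsilonGreedy(k=5, epsilon=0.05, start_value=1.0)

    # An epsilon outside [0, 1] is documented to be rejected.
    try:
        EpsilonGreedy(k=5, epsilon=1.5)
    except ValueError as err:
        print(err)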
int agent.epsilon_greedy.EpsilonGreedy.act(self)
Determine which action to take.
This will explore randomly over the actions at a rate of epsilon and will otherwise exploit based on table values, i.e. at a rate of (1.0 - epsilon).
Reimplemented from agent.base_agent.BaseAgent.
Definition at line 31 of file epsilon_greedy.py.
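A sketch of the documented selection rule, not necessarily the package's exact implementation (which presumably delegates to the inherited explore() and exploit()):

    import numpy as np

    def act_sketch(table: np.ndarray, epsilon: float,
                   rng: np.random.Generator) -> int:
        # With probability epsilon, explore a uniformly random arm;
        # otherwise exploit the arm with the highest estimated value.
        if rng.random() < epsilon:
            return int(rng.integers(len(table)))
        return int(np.argmax(table))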
float agent.epsilon_greedy.EpsilonGreedy.epsilon(self)
Definition at line 49 of file epsilon_greedy.py.
None agent.epsilon_greedy.EpsilonGreedy.epsilon(self, float value)
Definition at line 53 of file epsilon_greedy.py.
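Because epsilon is exposed as a property with a setter, the exploration rate can be annealed during training. A hypothetical schedule, not part of the package:

    from agent.epsilon_greedy import EpsilonGreedy

    agent = EpsilonGreedy(k=10, epsilon=1.0)

    # Decay from pure exploration toward mostly-greedy behavior,
    # clamped at a 1% exploration floor.
    for step in range(1, 1001):
        agent.epsilon = max(0.01, 1.0 / step)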
None agent.epsilon_greedy.EpsilonGreedy.update(self, int action, float reward)
Update the Q-table based on the last action.
This will use an incremental formulation of the mean of all rewards obtained so far as the values of the table.
Parameters
    action: An index representing which action on the table was selected. It must lie within [0, k).
    reward: The reward obtained from this action.
Reimplemented from agent.base_agent.BaseAgent.
Definition at line 59 of file epsilon_greedy.py.
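A sketch of the documented incremental sample-average update: the running mean Q_{n+1} = Q_n + (R_n - Q_n) / n avoids storing every past reward. The per-action counts array is an assumption about the internal state (likely what the protected _n attribute tracks):

    import numpy as np

    def update_sketch(table: np.ndarray, counts: np.ndarray,
                      action: int, reward: float) -> None:
        # Incremental mean: nudge the estimate toward the new reward
        # by a step size of 1/n, where n counts pulls of this arm.
        counts[action] += 1
        table[action] += (reward - table[action]) / counts[action]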
agent.epsilon_greedy.EpsilonGreedy._epsilon (protected)

Definition at line 57 of file epsilon_greedy.py.

agent.epsilon_greedy.EpsilonGreedy._n (protected)

Definition at line 27 of file epsilon_greedy.py.

agent.epsilon_greedy.EpsilonGreedy._rng (protected)

Definition at line 29 of file epsilon_greedy.py.

agent.epsilon_greedy.EpsilonGreedy.epsilon

Definition at line 25 of file epsilon_greedy.py.