k-Armed Bandit 1.0.0
A collection of k-armed bandits and associated agents for reinforcement learning.
agent.epsilon_greedy.EpsilonGreedy Class Reference

A greedy agent that occasionally explores.
Public Member Functions

    None __init__(self, int k, float epsilon, float start_value=0.0)
        Construct the agent.
    int act(self)
        Determine which action to take.
    float epsilon(self)
    None epsilon(self, float value)
    None update(self, int action, float reward)
        Update the Q-table based on the last action.

Public Member Functions inherited from agent.base_agent.BaseAgent

    int exploit(self)
        Select the best action.
    int explore(self)
        Explore a new action.
    numpy.ndarray table(self)
        Return the Q-Table.

Public Attributes

    epsilon

Protected Attributes

    _n
    _rng
    _epsilon

Protected Attributes inherited from agent.base_agent.BaseAgent

    _table
A greedy agent that occasionally explores.
This agent will primarily exploit when deciding its actions. However, it will occasionally choose to explore at a rate of epsilon, which is provided at initialization. This gives it a chance to discover whether other actions yield higher rewards.
Definition at line 5 of file epsilon_greedy.py.
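The documented interface can be summarized with a minimal sketch (an illustrative implementation, not the library's actual code; internal details such as the table layout and RNG choice are assumptions):

```python
import numpy as np

class EpsilonGreedySketch:
    """Minimal sketch of the documented EpsilonGreedy interface."""

    def __init__(self, k: int, epsilon: float, start_value: float = 0.0) -> None:
        if not 0.0 <= epsilon <= 1.0:
            raise ValueError("epsilon must be between 0 and 1")
        self._epsilon = epsilon
        self._table = np.full(k, start_value)  # Q-value estimates, one per action
        self._n = np.zeros(k, dtype=int)       # per-action pull counts
        self._rng = np.random.default_rng()

    def act(self) -> int:
        # Explore with probability epsilon, otherwise exploit the best estimate.
        if self._rng.random() < self._epsilon:
            return int(self._rng.integers(len(self._table)))
        return int(np.argmax(self._table))

    def update(self, action: int, reward: float) -> None:
        # Incremental mean of all rewards seen for this action.
        self._n[action] += 1
        self._table[action] += (reward - self._table[action]) / self._n[action]
```

A typical interaction loop calls act(), observes a reward from the bandit, and feeds it back via update(action, reward).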
Member Function Documentation

None agent.epsilon_greedy.EpsilonGreedy.__init__(self, int k, float epsilon, float start_value=0.0)
Construct the agent.
Parameters
    k            The number of actions to consider. This must be an int greater than zero.
    epsilon      The rate at which actions should randomly explore. As this is a probability, it should be between 0 and 1.
    start_value  The initial value to use in the table. All actions start with the same value.

Exceptions
    ValueError   if epsilon is not a valid probability (between 0 and 1).
Reimplemented from agent.base_agent.BaseAgent.
Definition at line 14 of file epsilon_greedy.py.
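The documented ValueError suggests a guard along these lines (a hypothetical sketch; the actual check in the library may differ):

```python
def check_epsilon(epsilon: float) -> float:
    """Reject values that are not valid probabilities, per the documented ValueError."""
    if not 0.0 <= epsilon <= 1.0:
        raise ValueError(f"epsilon must be in [0, 1], got {epsilon}")
    return epsilon
```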
int agent.epsilon_greedy.EpsilonGreedy.act(self)
Determine which action to take.
This will explore randomly over the actions at a rate of epsilon, and will otherwise exploit based on table values at a rate of (1.0 - epsilon).
Reimplemented from agent.base_agent.BaseAgent.
Definition at line 31 of file epsilon_greedy.py.
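The explore/exploit rule described above can be sketched as a standalone function (an assumed structure; the real method operates on the agent's internal table and RNG):

```python
import random

def act_sketch(epsilon: float, q_values: list, rng: random.Random) -> int:
    # With probability epsilon, pick a uniformly random action (explore);
    # otherwise pick the action with the highest estimated value (exploit).
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=q_values.__getitem__)
```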
float agent.epsilon_greedy.EpsilonGreedy.epsilon(self)
Definition at line 49 of file epsilon_greedy.py.
None agent.epsilon_greedy.EpsilonGreedy.epsilon(self, float value)
Definition at line 53 of file epsilon_greedy.py.
None agent.epsilon_greedy.EpsilonGreedy.update(self, int action, float reward)
Update the Q-table based on the last action.
The table values are maintained as an incremental formulation of the mean of all rewards obtained so far for each action.
Parameters
    action   An index representing which action on the table was selected. It must be in the range [0, k).
    reward   The reward obtained from this action.
Reimplemented from agent.base_agent.BaseAgent.
Definition at line 59 of file epsilon_greedy.py.
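The incremental mean follows the standard update Q_{n+1} = Q_n + (R_n - Q_n) / n, which avoids storing every past reward. A sketch (variable names assumed):

```python
def incremental_update(table, counts, action, reward):
    # Equivalent to recomputing the mean of all rewards seen for this action,
    # but in O(1) time and memory.
    counts[action] += 1
    table[action] += (reward - table[action]) / counts[action]
```

After any sequence of rewards, the table entry equals the arithmetic mean of those rewards.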
Member Data Documentation

agent.epsilon_greedy.EpsilonGreedy._epsilon
    protected

Definition at line 57 of file epsilon_greedy.py.

agent.epsilon_greedy.EpsilonGreedy._n
    protected

Definition at line 27 of file epsilon_greedy.py.

agent.epsilon_greedy.EpsilonGreedy._rng
    protected

Definition at line 29 of file epsilon_greedy.py.

agent.epsilon_greedy.EpsilonGreedy.epsilon

Definition at line 25 of file epsilon_greedy.py.