k-Armed Bandit 1.0.0
A collection of k-armed bandits and associated agents for reinforcement learning.
An agent that always exploits, never explores. More...
Public Member Functions
  None __init__(self, int k, float start_value=0.0)
    Construct the agent.
  int act(self)
    Select an action to take from the available ones.
  None update(self, int action, float reward)
    Update the table values based on the last action.

Public Member Functions inherited from agent.base_agent.BaseAgent
  int exploit(self)
    Select the best action.
  int explore(self)
    Explore a new action.
  numpy.ndarray table(self)
    Return the Q-table.

Protected Attributes
  _n

Protected Attributes inherited from agent.base_agent.BaseAgent
  _table
An agent that always exploits, never explores.
It always picks the action with the highest value from the Q-table. Those values are still updated after each reward, but because the agent never explores, it will typically converge quickly on a single action.
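The behaviour described above can be sketched in a few lines. The class below mirrors the documented API (`__init__`, `act`, `update`, and the `_table`/`_n` attributes) but is an illustrative stand-in, not the library's actual implementation:

```python
import numpy as np

class GreedySketch:
    """Illustrative stand-in for agent.greedy.Greedy (not the library's code)."""

    def __init__(self, k: int, start_value: float = 0.0) -> None:
        self._table = np.full(k, start_value, dtype=float)  # Q-table, one entry per arm
        self._n = np.zeros(k, dtype=int)                    # per-arm pull counts

    def act(self) -> int:
        # Always exploit: return an action with the highest table value.
        return int(np.argmax(self._table))

    def update(self, action: int, reward: float) -> None:
        # Incremental running average: Q <- Q + (R - Q) / n
        self._n[action] += 1
        self._table[action] += (reward - self._table[action]) / self._n[action]

agent = GreedySketch(k=3)
agent.update(0, 1.0)   # arm 0 now has the highest value
print(agent.act())     # 0 — greedy keeps picking it until another arm looks better
```

Note how exploitation is self-reinforcing: once one arm's estimate pulls ahead, only that arm's estimate keeps being refined.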
None agent.greedy.Greedy.__init__(self, int k, float start_value=0.0)
Construct the agent.
Parameters
  k            The number of arms to select from. Should be an int greater than zero.
  start_value  The starting reward to use for each arm. All arms assume the same value at the start.
Reimplemented from agent.base_agent.BaseAgent.
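One consequence of `start_value` worth noting: since every arm starts at the same value, setting it above the realistic reward range (so-called optimistic initialization, an assumed use case rather than anything this class mandates) nudges even a pure-exploit agent to try each arm once, because a sampled reward pulls that arm's estimate back down below the untried ones:

```python
import numpy as np

k, start_value = 4, 5.0          # optimistic start; assume real rewards lie in [0, 1]
table = np.full(k, start_value)  # every arm begins with the same value

# With a running-average update, the first pull of an arm replaces its
# optimistic value with the observed reward, so greedy selection moves on.
table[0] = 0.8                   # arm 0 pulled once: its value drops below the others
print(int(np.argmax(table)))     # 1 — the agent now prefers an untried arm
```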
int agent.greedy.Greedy.act(self)
Select an action to take from the available ones.
Greedy always exploits, so this will always be one of the actions with the highest table value.
Reimplemented from agent.base_agent.BaseAgent.
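"One of the actions with the highest table value" matters when several arms are tied. How this class breaks ties is not documented here; the snippet below shows the two common conventions (first-index, as `numpy.argmax` does, versus uniform random among the maxima) purely for illustration:

```python
import numpy as np

table = np.array([0.5, 0.9, 0.9])

# numpy.argmax returns the FIRST maximal index, so a deterministic
# exploit over this table would always yield action 1.
first = int(np.argmax(table))

# Breaking ties uniformly at random is a common alternative:
candidates = np.flatnonzero(table == table.max())  # indices of all maxima
choice = int(np.random.default_rng().choice(candidates))

print(first, candidates.tolist())  # 1 [1, 2]
```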
None agent.greedy.Greedy.update(self, int action, float reward)
Update the table values based on the last action.
This uses an iterative version of a running average to update table values.
Parameters
  action  The index corresponding to the action that was taken.
  reward  The resulting reward that was earned.
Reimplemented from agent.base_agent.BaseAgent.
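The "iterative version of a running average" is the standard incremental mean, Q ← Q + (R − Q)/n, which tracks the plain average of the rewards seen so far without storing the full reward history:

```python
# Incremental running average: after n rewards, q equals their plain mean.
rewards = [1.0, 0.0, 2.0, 1.0]
q, n = 0.0, 0
for r in rewards:
    n += 1
    q += (r - q) / n  # Q <- Q + (R - Q) / n
print(q)              # 1.0, identical to sum(rewards) / len(rewards)
```

This is why only the count `_n` needs to be kept alongside the table: each update folds the new reward into the existing estimate in O(1) time and memory.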