k-Armed Bandit 1.0.0
A collection of k-armed bandits and associated agents for reinforcement learning
An agent that always exploits, never explores.
Public Member Functions

  None __init__(self, int k, float start_value=0.0)
      Construct the agent.
  int act(self)
      Select an action to take from the available ones.
  None update(self, int action, float reward)
      Update the table values based on the last action.

Public Member Functions inherited from agent.base_agent.BaseAgent

  int exploit(self)
      Select the best action.
  int explore(self)
      Explore a new action.
  numpy.ndarray table(self)
      Return the Q-Table.
Protected Attributes

  _n

Protected Attributes inherited from agent.base_agent.BaseAgent

  _table
An agent that always exploits, never explores.
It always picks the action with the highest value in the Q-table. Although the table values are updated after every step, the agent never explores, so it is likely to converge quickly on a single action.
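A minimal usage sketch, assuming a stationary Gaussian bandit as the environment; only the members documented on this page (__init__, act, update, and the inherited table) are taken from the library, the rest is illustrative.

    import numpy as np
    from agent.greedy import Greedy   # module path as documented above

    rng = np.random.default_rng(0)
    true_means = rng.normal(size=10)            # hidden mean reward of each arm (assumed environment)

    agent = Greedy(k=10, start_value=0.0)
    for _ in range(1000):
        action = agent.act()                    # always exploits the current Q-table
        reward = rng.normal(loc=true_means[action])
        agent.update(action, reward)            # running-average update of the chosen arm

    print(agent.table())                        # estimated value of each arm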
None agent.greedy.Greedy.__init__(self, int k, float start_value=0.0)
Construct the agent.
Parameters
  k            The number of arms to select from. Should be an int greater than zero.
  start_value  The starting reward to use for each arm. All arms assume the same value at the start.
Reimplemented from agent.base_agent.BaseAgent.
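A construction sketch; the optimistic start_value shown here is only one way the parameter can be used and is not behaviour documented on this page.

    from agent.greedy import Greedy

    # Ten arms. A high start_value makes every untried arm look attractive at first,
    # a common way to coax some early exploration out of a purely greedy agent.
    agent = Greedy(k=10, start_value=5.0)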
int agent.greedy.Greedy.act(self)
Select an action to take from the available ones.
Greedy always exploits, so this will always be one of the actions with the highest table value.
Reimplemented from agent.base_agent.BaseAgent.
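A sketch of what greedy selection amounts to, assuming the Q-table is the numpy array returned by table(); how Greedy itself breaks ties is not specified on this page.

    import numpy as np

    def greedy_action(q_table: np.ndarray) -> int:
        # Return an index with the highest estimated value.
        # np.argmax picks the first of any tied maxima.
        return int(np.argmax(q_table))

    assert greedy_action(np.array([0.1, 0.7, 0.7, 0.3])) == 1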
None agent.greedy.Greedy.update(self, int action, float reward)
Update the table values based on the last action.
This uses an incremental running-average update for the table values.
Parameters
  action  The index corresponding to the action that was taken.
  reward  The resulting reward that was earned.
Reimplemented from agent.base_agent.BaseAgent.
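A sketch of the incremental running-average update described above, assuming _n holds per-arm action counts and the Q-table holds per-arm value estimates (the attribute layout is an assumption).

    # Incremental mean: Q_new = Q_old + (reward - Q_old) / n,
    # where n is the number of times this action has now been taken.
    def update(q_table, counts, action, reward):
        counts[action] += 1
        q_table[action] += (reward - q_table[action]) / counts[action]

    q, n = [0.0, 0.0], [0, 0]
    update(q, n, action=0, reward=1.0)   # q[0] becomes 1.0 after the first observation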