k-Armed Bandit 1.0.0
A collection of k-armed bandits and associated agents for reinforcement learning
A base class used to create a variety of bandit-solving agents.
Public Member Functions

    None __init__(self, int k, float start_value=0.0)
        Construct the agent.
    int act(self)
        Use a specific algorithm to determine which action to take.
    int exploit(self)
        Select the best action.
    int explore(self)
        Explore a new action.
    numpy.ndarray table(self)
        Return the Q-Table.
    None update(self, int action, float reward)
        Update the Q-Table.

Protected Attributes

    _table
A base class used to create a variety of bandit-solving agents.
This class provides a table that can be used to store reward estimates. It also defines the interface that any concrete agent must implement, ensuring a consistent API across all agent types.
Definition at line 5 of file base_agent.py.
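To show how the pieces documented below fit together, here is a small usage sketch. The GreedyishAgent subclass, its update rule, and the normally distributed stand-in reward are illustrative assumptions only; they are not taken from this package.

    import numpy as np
    from agent.base_agent import BaseAgent

    class GreedyishAgent(BaseAgent):            # hypothetical subclass, for illustration
        def act(self) -> int:
            return self.exploit()               # always take the current best arm

        def update(self, action: int, reward: float) -> None:
            # One possible rule: nudge the estimate toward the observed reward.
            self._table[action] += 0.1 * (reward - self._table[action])

    agent = GreedyishAgent(k=10, start_value=1.0)
    for _ in range(1000):
        action = agent.act()
        reward = float(np.random.normal(loc=action / 10.0))   # stand-in for a real bandit pull
        agent.update(action, reward)
    print(agent.table())                        # learned reward estimates per arm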
None agent.base_agent.BaseAgent.__init__(self, int k, float start_value=0.0)
Construct the agent.
Parameters
    k            The number of possible actions the agent can pick from at any given time. Must be an int greater than zero.
    start_value  An initial value to use for each possible action. This assumes each action is equally likely at the start, so all values in the Q-table are set to this value.
Exceptions
    ValueError   Raised if k is not an integer greater than 0.
Reimplemented in agent.epsilon_greedy.EpsilonGreedy, agent.greedy.Greedy, and agent.tests.test_base_agent.FakeAgent.
Definition at line 13 of file base_agent.py.
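A short construction sketch, assuming BaseAgent can be instantiated on its own (subclasses reimplement __init__ as noted above); the printed values follow from the documented behaviour rather than from captured program output.

    from agent.base_agent import BaseAgent

    # Optimistic initial values: every untried arm starts out looking attractive.
    agent = BaseAgent(k=5, start_value=2.0)
    print(agent.table())       # expected: [2. 2. 2. 2. 2.]

    try:
        BaseAgent(k=0)         # violates the "k > 0" requirement
    except ValueError as err:
        print(err)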
int agent.base_agent.BaseAgent.act(self)
Use a specific algorithm to determine which action to take.
This method should define how exactly the agent selects an action. It is free to use explore and exploit as needed.
Reimplemented in agent.epsilon_greedy.EpsilonGreedy, agent.greedy.Greedy, and agent.tests.test_base_agent.FakeAgent.
Definition at line 30 of file base_agent.py.
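As one illustration of how a subclass might combine explore and exploit inside act, here is an epsilon-greedy style sketch; the MyEpsilonAgent class and its epsilon parameter are assumptions made for this example, not the package's EpsilonGreedy implementation.

    import numpy as np
    from agent.base_agent import BaseAgent

    class MyEpsilonAgent(BaseAgent):            # hypothetical subclass
        def __init__(self, k: int, epsilon: float = 0.1, start_value: float = 0.0):
            super().__init__(k, start_value)
            self._epsilon = epsilon             # probability of exploring

        def act(self) -> int:
            # Explore with probability epsilon, otherwise exploit the best estimate.
            if np.random.random() < self._epsilon:
                return self.explore()
            return self.exploit()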
int agent.base_agent.BaseAgent.exploit(self)
Select the best action.
This uses the Q-table to select the action with the highest estimated reward. Ties are broken arbitrarily.
Definition at line 39 of file base_agent.py.
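The description above only promises that ties are broken arbitrarily; a plain numpy argmax always returns the first maximal arm, so an implementation that instead breaks ties uniformly at random might look like this sketch (shown as a standalone function for brevity).

    import numpy as np

    def exploit_with_random_ties(q_table: np.ndarray) -> int:
        best = np.flatnonzero(q_table == q_table.max())   # all arms tied for the best estimate
        return int(np.random.choice(best))                # pick one of them at random

    # e.g. exploit_with_random_ties(agent.table())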
int agent.base_agent.BaseAgent.explore(self)
Explore a new action.
This selects a random action from the Q-table in order to explore the decision space.
Definition at line 56 of file base_agent.py.
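A minimal sketch of the behaviour described, assuming a uniform choice over all k arms.

    import numpy as np

    def explore_uniform(q_table: np.ndarray) -> int:
        # Any index into the Q-table is a valid action; pick one uniformly at random.
        return int(np.random.randint(q_table.size))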
numpy.ndarray agent.base_agent.BaseAgent.table(self)
Return the Q-Table.
Definition at line 68 of file base_agent.py.
None agent.base_agent.BaseAgent.update(self, int action, float reward)
Update the Q-Table.
This takes the previous action and the reward it produced, and should update the Q-table accordingly. How the update is performed depends on the specific implementation.
Parameters
    action   An int representing which action was taken. This should be in the range [0, k), i.e. one of the k arm indices.
    reward   A float representing the reward obtained from the selected action.
Reimplemented in agent.epsilon_greedy.EpsilonGreedy, agent.greedy.Greedy, and agent.tests.test_base_agent.FakeAgent.
Definition at line 76 of file base_agent.py.
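One update rule that fits this interface is the incremental sample average, Q_{n+1} = Q_n + (R_n - Q_n) / n. The SampleAverageAgent class and its _counts array below are assumptions added for this sketch; the base class itself only stores the estimates.

    import numpy as np
    from agent.base_agent import BaseAgent

    class SampleAverageAgent(BaseAgent):        # hypothetical subclass
        def __init__(self, k: int, start_value: float = 0.0):
            super().__init__(k, start_value)
            self._counts = np.zeros(k, dtype=int)   # pulls per arm (not part of BaseAgent)

        def act(self) -> int:
            return self.exploit()

        def update(self, action: int, reward: float) -> None:
            self._counts[action] += 1
            n = self._counts[action]
            # Move the estimate toward the new reward by 1/n of the error.
            self._table[action] += (reward - self._table[action]) / n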
agent.base_agent.BaseAgent._table
protected
The Q-table used to store the reward estimate for each action.
Definition at line 27 of file base_agent.py.