from agent import BaseAgent


class Greedy(BaseAgent):
    """An agent that always exploits, never explores.

    It will always pick the action with the highest value from the Q-table.
    While these values will be updated, it never explores, so it will likely
    quickly converge on a single action.
    """

    def __init__(self, k: int, start_value: float = 0.0) -> None:
        """Construct the agent.

        @param k The number of arms to select from. Should be an int greater than zero.
        @param start_value The starting reward to use for each arm. All arms assume the same value at the start.
        """
        super().__init__(k, start_value=start_value)

    def act(self) -> int:
        """Select an action to take from the available ones.

        Greedy always exploits, so this will always be one of the actions with
        the highest table value.

        @return An int representing the selected action. It will be on the interval [0, k).
        """
        # BaseAgent supplies the Q-table (self.table, a numpy.ndarray) and an
        # exploit() helper that selects the best action.
        return self.exploit()

    def update(self, action: int, reward: float) -> None:
        """Update the table values based on the last action.

        This uses an iterative version of a running average to update table values.

        @param action The index corresponding to the action that was taken.
        @param reward The resulting reward that was earned.
        """
        # self._n (the step count) is assumed to be maintained by BaseAgent.
        self.table[action] += (reward - self.table[action]) / self._n
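For context, here is a minimal runnable sketch of how Greedy behaves, using a hypothetical `_StubBaseAgent` in place of the real `agent.BaseAgent` (which supplies `exploit()` and the numpy `table`). The step-count bookkeeping is folded into `update` here for self-containment; in the real class it presumably lives in BaseAgent:

```python
import numpy as np


class _StubBaseAgent:
    """Hypothetical stand-in for agent.BaseAgent: a k-armed Q-table plus a counter."""

    def __init__(self, k: int, start_value: float = 0.0) -> None:
        self.table = np.full(k, start_value)  # one Q-value per arm
        self._n = 0                           # number of updates so far

    def exploit(self) -> int:
        # Best-valued arm; np.argmax breaks ties toward the lowest index.
        return int(np.argmax(self.table))


class Greedy(_StubBaseAgent):
    def act(self) -> int:
        return self.exploit()

    def update(self, action: int, reward: float) -> None:
        # Iterative running average: Q <- Q + (R - Q) / n.
        self._n += 1
        self.table[action] += (reward - self.table[action]) / self._n


agent = Greedy(k=3)
first = agent.act()              # all arms tie at 0.0, so argmax picks arm 0
agent.update(first, reward=1.0)  # arm 0's value becomes 0.0 + (1.0 - 0.0)/1 = 1.0
print(agent.act())               # → 0: greedy now locks onto arm 0
```

Because the other arms keep their start value forever, the first arm to earn a positive reward wins permanently, which is exactly the fast convergence the class docstring warns about.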