k-Armed Bandit 1.0.0
A collection of k-armed bandits and assoicated agents for reinforcement learning
Loading...
Searching...
No Matches
greedy.py
Go to the documentation of this file.
1from agent import BaseAgent
2
3
5 """
6 An agent that always exploits, never explores.
7
8 It will always pick the action with the highest value from the Q-table. While these values will be updated, it
9 never explores, so will likely quickly converge on a single action.
10 """
11
12 def __init__(self, k: int, start_value: float = 0.0) -> None:
13 """
14 Construct the agent.
15
16 @param k The number of arms to select from. Should be an int greater than zero.
17 @param start_value The starting reward to use for each arm. All arms assume the same value at the start.
18 """
19 super().__init__(k, start_value=start_value)
20 # Track how many selections have been made to use in the update formula.
21 self._n = 0
22
23 def act(self) -> int:
24 """
25 Select an action to take from the available ones.
26
27 Greedy always exploits, so this will always be one of the actions with the highest table value.
28 @return An int representing the selected action. It will be on the interval [0, k).
29 """
30 return self.exploit()
31
32 def update(self, action: int, reward: float) -> None:
33 """
34 Update the table values based on the last action.
35
36 This uses an iterative version of a running average to update table values.
37 @param action The index corresponding to the action that was taken.
38 @param reward The resulting reward that was earned.
39 """
40 self._n += 1
41 self.table[action] += (reward - self.table[action]) / self._n
A base class used to create a variety of bandit solving agents.
Definition base_agent.py:5
int exploit(self)
Select the best action.
Definition base_agent.py:39
numpy.ndarray table(self)
Return the Q-Table.
Definition base_agent.py:68
An agent that always exploits, never explores.
Definition greedy.py:4
int act(self)
Select an action to take from the available ones.
Definition greedy.py:23
None __init__(self, int k, float start_value=0.0)
Construct the agent.
Definition greedy.py:12
None update(self, int action, float reward)
Update the table values based on the last action.
Definition greedy.py:32