import numpy
from agent import BaseAgent
 
    A greedy agent that occasionally explores.

    This agent will primarily exploit when deciding its actions. However, it will occasionally choose to explore at a
    rate of epsilon, which is provided at initialization. This gives it a chance to see if other actions are better.
    def __init__(self, k: int, epsilon: float, start_value: float = 0.0) -> None:
 
        @param k The number of actions to consider. This must be an int greater than zero.
        @param epsilon The rate at which the agent should explore randomly. As this is a probability, it should be between
        0 and 1 (inclusive).
        @param start_value The initial value to use in the table. All actions start with the same value.
        @exception ValueError if epsilon is not a valid probability (between 0 and 1, inclusive).
        super().__init__(k, start_value=start_value)
 
        self._rng = numpy.random.default_rng()
 
 
        Determine which action to take.

        This will explore randomly over the actions at a rate of epsilon and otherwise exploit the table values,
        at a rate of (1.0 - epsilon).
        @return The index of the selected action to take. Guaranteed to be an int in the range [0, k).
        should_explore = (samples[0] == 1)
 
 
        if value < 0.0 or value > 1.0:
            raise ValueError(
                'Epsilon must be a valid probability, so between 0 and 1 (inclusive)!')
 
 
    def update(self, action: int, reward: float) -> None:
 
        Update the Q-table based on the last action.

        Each table value is kept as the mean of the rewards obtained so far, using an incremental formulation.
        @param action An index representing which action on the table was selected. It must be in the range [0, k).
        @param reward The reward obtained from this action.
        self.table[action] += (reward - self.table[action]) / self._n
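The `update` line is the incremental form of a running mean. Assuming `self._n` holds n, the number of rewards averaged into the updated entry (the listing does not show where `_n` is maintained), and writing $Q_n = \frac{1}{n-1}\sum_{i=1}^{n-1} R_i$ for the estimate before the n-th reward, the step follows from the batch mean:

```latex
\begin{aligned}
Q_{n+1} &= \frac{1}{n}\sum_{i=1}^{n} R_i
         = \frac{1}{n}\Big(R_n + (n-1)\,Q_n\Big) \\
        &= Q_n + \frac{1}{n}\big(R_n - Q_n\big)
\end{aligned}
```

Matching terms with the code: `table[action]` is $Q_n$, `reward` is $R_n$, and `self._n` is $n$. For each entry to be the mean of its own action's rewards, $n$ has to count updates of that action only.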
 
 
A base class used to create a variety of bandit-solving agents.
 
int explore(self)
Explore a new action.
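`explore` is documented only as "Explore a new action." A minimal sketch, assuming exploration means a uniform draw over all k actions using a NumPy `Generator` (the greedy subclass creates one with `numpy.random.default_rng()`). The function name and parameters here are hypothetical, not the `BaseAgent` method itself:

```python
import numpy


def explore(rng: numpy.random.Generator, k: int) -> int:
    """Pick one of k actions uniformly at random (hypothetical helper)."""
    # Generator.integers(k) draws from [0, k), so every valid index is possible.
    return int(rng.integers(k))


rng = numpy.random.default_rng(0)
picks = [explore(rng, 4) for _ in range(1000)]
print(sorted(set(picks)))
```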
 
int exploit(self)
Select the best action.
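Selecting "the best action" has a subtlety when several entries tie, which is the normal state right after construction because every action starts at `start_value`: `numpy.argmax` always returns the first maximal index. A sketch that breaks ties at random, assuming the table is a 1-D NumPy array; whether `BaseAgent.exploit` does this is not shown, and the function below is hypothetical:

```python
import numpy


def exploit(rng: numpy.random.Generator, table: numpy.ndarray) -> int:
    """Return the index of a highest-valued action, breaking ties at random."""
    # Indices of every entry equal to the maximum value.
    best = numpy.flatnonzero(table == table.max())
    # Choose uniformly among the tied best actions.
    return int(rng.choice(best))


rng = numpy.random.default_rng(0)
print(exploit(rng, numpy.array([0.2, 0.9, 0.1])))  # 1 (unique maximum)
ties = {exploit(rng, numpy.zeros(3)) for _ in range(200)}
print(sorted(ties))
```

Without random tie-breaking, a purely greedy agent with a flat table would keep choosing index 0 until its estimate moves.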
 
numpy.ndarray table(self)
Return the Q-Table.
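The constructor documentation says every action starts at `start_value`, and the subclass updates entries with `self.table[action] += ...`. That in-place update works if `table` is exposed as a property returning the underlying array rather than a copy. A minimal sketch under those assumptions; `TableAgent` is a hypothetical stand-in, not the project's `BaseAgent`:

```python
import numpy


class TableAgent:
    """Hypothetical stand-in for BaseAgent's table handling (not the original)."""

    def __init__(self, k: int, start_value: float = 0.0) -> None:
        # The docs require k to be an int greater than zero.
        if not isinstance(k, int) or k <= 0:
            raise ValueError('k must be an int greater than zero')
        # Every action starts with the same estimated value.
        self._table = numpy.full(k, start_value, dtype=float)

    @property
    def table(self) -> numpy.ndarray:
        """Return the Q-table itself, so callers can update entries in place."""
        return self._table


agent = TableAgent(3, start_value=5.0)
print(agent.table)  # [5. 5. 5.]
agent.table[1] += 1.0
print(agent.table)  # [5. 6. 5.]
```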
 
A greedy agent that occasionally explores.
 
int act(self)
Determine which action to take.
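The listing shows only `should_explore = (samples[0] == 1)`; the line that produces `samples` is missing. A single Bernoulli(epsilon) draw such as `rng.binomial(1, epsilon, size=1)` would fit that test, but that is an assumption. The sketch below is a standalone function with hypothetical parameters, not the class method, and it exploits with a plain `numpy.argmax`:

```python
import numpy


def act(rng: numpy.random.Generator, table: numpy.ndarray, epsilon: float) -> int:
    """Epsilon-greedy choice: explore with probability epsilon, else exploit."""
    # One Bernoulli(epsilon) draw; a 1 means "explore".
    samples = rng.binomial(1, epsilon, size=1)
    should_explore = (samples[0] == 1)
    if should_explore:
        # Uniform over all k actions (this can also pick the greedy one).
        return int(rng.integers(len(table)))
    # Greedy: first index of the largest estimate.
    return int(numpy.argmax(table))


rng = numpy.random.default_rng(1)
table = numpy.array([0.0, 1.0, 0.0, 0.0])
choices = numpy.array([act(rng, table, 0.1) for _ in range(10_000)])
# Greedy action 1 is expected with probability (1 - 0.1) + 0.1 / 4 = 0.925.
print((choices == 1).mean())
```

Because a uniform exploratory draw can land on the greedy action, the chance of actually taking a non-greedy action is epsilon * (k - 1) / k rather than epsilon.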
 
None __init__(self, int k, float epsilon, float start_value=0.0)
Construct the agent.
 
None update(self, int action, float reward)
Update the Q-table based on the last action.
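The listing divides by `self._n`, which is defined outside the excerpt. For each table entry to be the mean of that action's own rewards, the divisor must be the number of times that action has been updated. A sketch that keeps an explicit per-action count array (the `counts` name and the free-function form are hypothetical):

```python
import numpy


def update(table: numpy.ndarray, counts: numpy.ndarray, action: int, reward: float) -> None:
    """Incremental sample-average update with per-action counts (hypothetical helper)."""
    counts[action] += 1
    # Q <- Q + (R - Q) / N(a): the running mean of this action's rewards.
    table[action] += (reward - table[action]) / counts[action]


table = numpy.zeros(2)
counts = numpy.zeros(2, dtype=int)
rewards = [(0, 1.0), (1, 4.0), (0, 3.0), (0, 5.0)]
for action, reward in rewards:
    update(table, counts, action, reward)
print(table)  # [3. 4.]
```

Because the first update for an action uses a step size of 1/1, it replaces `start_value` outright; the initial value only influences which actions look best before they have been tried.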
 
None epsilon(self, float value)
Set the exploration rate, raising ValueError if value is not between 0 and 1 (inclusive).
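The range check in the listing reads like the body of an `epsilon` setter, and the constructor documents a `ValueError` for an invalid epsilon. One way to get both behaviors is a property whose setter validates and a constructor that assigns through it. `EpsilonConfig` is a hypothetical class used only for illustration; the original class's structure may differ:

```python
class EpsilonConfig:
    """Hypothetical example of a validated epsilon property (not the original class)."""

    def __init__(self, epsilon: float) -> None:
        # Assigning through the property runs the same validation as later updates.
        self.epsilon = epsilon

    @property
    def epsilon(self) -> float:
        return self._epsilon

    @epsilon.setter
    def epsilon(self, value: float) -> None:
        if value < 0.0 or value > 1.0:
            raise ValueError(
                'Epsilon must be a valid probability, so between 0 and 1 (inclusive)!')
        self._epsilon = value


config = EpsilonConfig(0.1)
config.epsilon = 0.05  # accepted
try:
    config.epsilon = 1.5  # rejected; the stored value is unchanged
except ValueError as err:
    print(err)
```

Routing the constructor through the setter keeps a single copy of the validation, so construction and later updates reject the same values.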