k-Armed Bandit 1.0.0
A collection of k-armed bandits and associated agents for reinforcement learning
agent.base_agent.BaseAgent Class Reference

A base class used to create a variety of bandit solving agents. More...

Inheritance diagram for agent.base_agent.BaseAgent: derived classes are agent.epsilon_greedy.EpsilonGreedy, agent.greedy.Greedy, and agent.tests.test_base_agent.FakeAgent.

Public Member Functions

None __init__ (self, int k, float start_value=0.0)
 Construct the agent.
 
int act (self)
 Use a specific algorithm to determine which action to take.
 
int exploit (self)
 Select the best action.
 
int explore (self)
 Explore a new action.
 
numpy.ndarray table (self)
 Return the Q-Table.
 
None update (self, int action, float reward)
 Update the Q-Table.
 

Protected Attributes

 _table
 

Detailed Description

A base class used to create a variety of bandit solving agents.

This class provides a table that can be used to store reward estimates. It also defines the interface that every concrete agent must implement, ensuring a consistent API across agent types.

Definition at line 5 of file base_agent.py.
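
As a sketch of how this interface is meant to be used (the subclass name and its update rule below are hypothetical, not part of the library), a concrete agent overrides act() and update():

    from agent.base_agent import BaseAgent

    class AlwaysExploit(BaseAgent):
        """Hypothetical agent that never explores (illustrative only)."""

        def act(self) -> int:
            # Always pick the action with the highest current estimate.
            return self.exploit()

        def update(self, action: int, reward: float) -> None:
            # Deliberately naive rule for illustration: overwrite the
            # stored estimate with the most recent reward.
            self._table[action] = reward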

Constructor & Destructor Documentation

◆ __init__()

None agent.base_agent.BaseAgent.__init__ (   self,
int  k,
float   start_value = 0.0 
)

Construct the agent.

Parameters
k	The number of possible actions the agent can pick from at any given time. Must be an int greater than zero.
start_value	An initial value to use for each possible action. Each action is assumed equally likely at the start, so every entry in the Q-table is set to this value.
Exceptions
ValueError	if k is not an integer greater than 0.

Reimplemented in agent.epsilon_greedy.EpsilonGreedy, agent.greedy.Greedy, and agent.tests.test_base_agent.FakeAgent.

Definition at line 13 of file base_agent.py.
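
A usage sketch, assuming the agent package is importable as named above (whether you construct BaseAgent directly or a subclass is an implementation detail):

    from agent.base_agent import BaseAgent

    bandit_agent = BaseAgent(k=10, start_value=5.0)  # optimistic initial estimates
    print(bandit_agent.table())                      # ten entries, all 5.0

    BaseAgent(k=0)                                   # raises ValueError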

Member Function Documentation

◆ act()

int agent.base_agent.BaseAgent.act (   self)

Use a specific algorithm to determine which action to take.

This method should define exactly how the agent selects an action. It is free to use explore and exploit as needed.

Returns
An int representing which arm action to take. This int should be in the range [0, k).

Reimplemented in agent.epsilon_greedy.EpsilonGreedy, agent.greedy.Greedy, and agent.tests.test_base_agent.FakeAgent.

Definition at line 30 of file base_agent.py.
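
For example, an epsilon-greedy-style implementation could combine the two helpers. This is a sketch only (see agent.epsilon_greedy.EpsilonGreedy for the library's actual version); self.epsilon is a hypothetical attribute the subclass would set:

    import numpy as np

    def act(self) -> int:
        # Explore with probability epsilon, otherwise exploit.
        if np.random.random() < self.epsilon:
            return self.explore()
        return self.exploit()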

◆ exploit()

int agent.base_agent.BaseAgent.exploit (   self)

Select the best action.

This will use the Q-table to select the action with the highest estimated value. Ties are broken arbitrarily.

Returns
An int representing which arm action to take. This int will be in the range [0, k).

Definition at line 39 of file base_agent.py.
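
A minimal sketch of one way to do this with NumPy; random tie-breaking is one choice of "arbitrary", and the actual implementation may differ:

    import numpy as np

    def exploit(self) -> int:
        # All actions tied for the highest estimated value.
        best = np.flatnonzero(self._table == self._table.max())
        # Break ties uniformly at random.
        return int(np.random.choice(best))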

◆ explore()

int agent.base_agent.BaseAgent.explore (   self)

Explore a new action.

This will select a random action from the k available arms, to explore more of the decision space.

Returns
An int representing which arm action to take. This int will be in the range [0, k).

Definition at line 56 of file base_agent.py.
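
A minimal sketch, assuming uniform random selection over the k arms:

    import numpy as np

    def explore(self) -> int:
        # Any arm is equally likely, regardless of current estimates.
        return int(np.random.randint(self._table.size))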

◆ table()

numpy.ndarray agent.base_agent.BaseAgent.table (   self)

Return the Q-Table.

Returns
A NumPy array of k elements; the i-th element holds the estimated value for the i-th action/arm.

Definition at line 68 of file base_agent.py.
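
For example, a caller can inspect the estimates directly (bandit_agent as constructed in the sketch above):

    import numpy as np

    estimates = bandit_agent.table()
    best_arm = int(np.argmax(estimates))  # arm with the highest estimated value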

◆ update()

None agent.base_agent.BaseAgent.update (   self,
int  action,
float  reward 
)

Update the Q-Table.

This takes the previously selected action and the resulting reward and should update the Q-Table accordingly. How the update is performed depends on the specific implementation.

Parameters
action	An int representing which arm action was taken. This should be in the range [0, k).
reward	A float representing the reward obtained from the selected action.

Reimplemented in agent.epsilon_greedy.EpsilonGreedy, agent.greedy.Greedy, and agent.tests.test_base_agent.FakeAgent.

Definition at line 76 of file base_agent.py.
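
A common concrete choice is the incremental sample-average rule, Q(a) <- Q(a) + (1/n)(r - Q(a)). The sketch below is not necessarily what any subclass here uses, and assumes a hypothetical per-action counter self._counts that the subclass initializes to zeros:

    def update(self, action: int, reward: float) -> None:
        # n: how many times this arm has been pulled so far.
        self._counts[action] += 1
        step = 1.0 / self._counts[action]
        # Move the estimate toward the new reward: Q <- Q + (1/n) * (r - Q)
        self._table[action] += step * (reward - self._table[action])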

Member Data Documentation

◆ _table

agent.base_agent.BaseAgent._table
protected

Definition at line 27 of file base_agent.py.


The documentation for this class was generated from the following file:
base_agent.py