k-Armed Bandit 1.0.0
A collection of k-armed bandits and associated agents for reinforcement learning
analysis.py
"""
Compete various agents against each other and display the results.

This script analyzes the performance of different agents. In general, it simulates each agent on a given bandit M
times, then repeats this process on N different bandits. The rewards obtained are tracked over the whole simulation.
Afterwards, summary statistics are calculated and plotted for consideration.

The main statistic under consideration is the total reward earned by each agent. A better agent should have better
performance in the long run. This is tracked at each time step and plotted to show how each agent performs over time.
"""
import agent
import bandit
import numpy
import matplotlib.pyplot
# Set the simulation parameters.
# How many arms each bandit has.
K = 10
# How many bandits to test on.
N = 2000
# How many times to select an arm on each bandit.
M = 1000

# Create the bandits and agents. Use several different epsilon values.
bandits = []
for i in range(N):
    single_bandit = bandit.Normal(k=K)
    bandits.append(single_bandit)
agents = [
    agent.Greedy(k=K),
    agent.EpsilonGreedy(k=K, epsilon=0.01),
    agent.EpsilonGreedy(k=K, epsilon=0.1),
]
# Labels for the plot legend: the epsilon value used by each agent.
agent_names = [
    '0.0',
    '0.01',
    '0.1',
]

rewards = numpy.zeros(shape=(len(agents), N, M), dtype=float)
for i, test_agent in enumerate(agents):
    # Iterate through each sample trial.
    for n in range(N):
        # Reset the Q-table. Typically, this is a private attribute and shouldn't be modified this way, but a reset
        # feature is not available.
        test_agent._table = numpy.zeros_like(a=test_agent.table)
        cumulative_mean_reward = 0.0
        # Select actions the appropriate number of times.
        for m in range(M):
            action = test_agent.act()
            reward = bandits[n].select(index=action)
            test_agent.update(action=action, reward=reward)
            # Incremental mean update: avoids storing every reward.
            cumulative_mean_reward += (reward - cumulative_mean_reward) / (m + 1)
            rewards[i, n, m] = cumulative_mean_reward

# Once all trials are complete, average across the N bandits to get the average performance for each agent at each
# time step.
mean_rewards = numpy.mean(a=rewards, axis=1)
for i, agent_name in enumerate(agent_names):
    matplotlib.pyplot.plot(mean_rewards[i], label=agent_name)
matplotlib.pyplot.legend()
matplotlib.pyplot.show()
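The running average inside the inner loop uses the incremental mean update Q_{m+1} = Q_m + (r - Q_m) / (m + 1), which tracks the mean of all rewards seen so far without storing them. A quick standalone check (independent of the agent and bandit modules) that the incremental form agrees with the batch mean:

```python
import numpy

rng = numpy.random.default_rng(seed=0)
sample_rewards = rng.normal(size=100)

# Incremental mean: Q_{m+1} = Q_m + (r_m - Q_m) / (m + 1)
running_mean = 0.0
for m, reward in enumerate(sample_rewards):
    running_mean += (reward - running_mean) / (m + 1)

# The incremental form matches numpy's batch mean.
assert numpy.isclose(running_mean, numpy.mean(sample_rewards))
```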
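The `agent` and `bandit` modules are not shown on this page. A minimal sketch of the interfaces the script relies on — `Normal.select`, and the agent's `act`, `update`, and `table` — could look like the following; the real classes may differ, and `Greedy` in the script presumably behaves like `EpsilonGreedy` with epsilon fixed at zero:

```python
import numpy


class Normal:
    """k-armed bandit; each arm pays a reward drawn from a fixed normal distribution."""

    def __init__(self, k):
        # One true mean per arm; rewards have unit variance around it.
        self._means = numpy.random.normal(size=k)

    def select(self, index):
        return numpy.random.normal(loc=self._means[index])


class EpsilonGreedy:
    """Sample-average agent that explores a random arm with probability epsilon."""

    def __init__(self, k, epsilon=0.0):
        self._epsilon = epsilon
        self._table = numpy.zeros(k)   # Q-value estimate per arm
        self._counts = numpy.zeros(k)  # times each arm was selected

    @property
    def table(self):
        return self._table

    def act(self):
        if numpy.random.random() < self._epsilon:
            return int(numpy.random.randint(len(self._table)))
        return int(numpy.argmax(self._table))

    def update(self, action, reward):
        # Sample-average update of the chosen arm's Q-value.
        self._counts[action] += 1
        self._table[action] += (reward - self._table[action]) / self._counts[action]
```

This sketch also explains the reset caveat in the script: assigning `zeros_like` to `_table` clears the value estimates, which is why the comment there notes that a proper reset method would be preferable.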