import matplotlib.pyplot
import numpy
Compete various agents against each other and display the results.

This script analyzes the performance of different agents. Each agent is simulated on a given bandit for M steps,
and the simulation is repeated on N different bandits. The rewards obtained are tracked over the whole simulation.
Afterwards, some statistics are calculated and plotted for consideration.

The main statistic under consideration is the reward earned by each agent. A better agent should earn more in the
long run. The reward is tracked at each time step as a running mean and plotted to show how each agent performs over
time.
    bandits.append(single_bandit)
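# rewards[i, n, m]: running mean reward of agent i on bandit n after m + 1 steps.
# The numpy.float alias was removed in NumPy 1.24; the builtin float is equivalent.
rewards = numpy.zeros(shape=(len(agents), N, M), dtype=float)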
for i, test_agent in enumerate(agents):
    for n in range(N):
        # Reset the agent's value estimates before it meets a new bandit.
        test_agent._table = numpy.zeros_like(a=test_agent.table)
        cumulative_mean_reward = 0.0
        for m in range(M):
            action = test_agent.act()
            reward = bandits[n].select(index=action)
            test_agent.update(action=action, reward=reward)
            # Incremental mean: no need to store every individual reward.
            cumulative_mean_reward += (reward - cumulative_mean_reward) / (m + 1)
            rewards[i, n, m] = cumulative_mean_reward
# Average over the bandits (axis 1), leaving one curve per agent.
mean_rewards = numpy.mean(a=rewards, axis=1)
for i, agent_name in enumerate(agents):
    matplotlib.pyplot.plot(mean_rewards[i])
matplotlib.pyplot.legend(agent_names)
matplotlib.pyplot.show()
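
The running-mean update in the simulation loop can be checked in isolation. The following is a minimal sketch; `running_means` is a hypothetical helper that is not part of the original script, and the sample rewards are made up for illustration. It compares the incremental update against a direct cumulative-sum computation.

```python
import numpy


def running_means(rewards):
    """Return the mean of rewards[:m + 1] for every step m, computed incrementally."""
    means = numpy.zeros(len(rewards), dtype=float)
    mean = 0.0
    for m, reward in enumerate(rewards):
        # Same update as the script: move the mean toward the new reward by 1 / (m + 1).
        mean += (reward - mean) / (m + 1)
        means[m] = mean
    return means


sample = [1.0, 0.0, 2.0, 5.0]  # illustrative rewards, not real results
incremental = running_means(sample)
# Reference: cumulative sum divided by the number of steps so far.
direct = numpy.cumsum(sample) / numpy.arange(1, len(sample) + 1)
```

Both arrays should agree up to floating-point error, which is why the script can record a mean at every step without keeping the full reward history.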
A greedy agent that occasionally explores.
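
One way such an agent might look is sketched below. This is an assumption-laden illustration, not the class the script uses: the name `EpsilonGreedySketch`, the `epsilon`, `k`, and `seed` parameters, and the `counts` attribute are all hypothetical. Only the `act`, `update`, and `table` names mirror what the script calls.

```python
import numpy


class EpsilonGreedySketch:
    """Hypothetical epsilon-greedy agent; not the class used by the script."""

    def __init__(self, k, epsilon=0.1, seed=0):
        self.epsilon = epsilon
        self.table = numpy.zeros(k)              # estimated value of each arm
        self.counts = numpy.zeros(k, dtype=int)  # number of pulls per arm
        self.rng = numpy.random.default_rng(seed)

    def act(self):
        # Explore with probability epsilon, otherwise exploit the best estimate.
        if self.rng.random() < self.epsilon:
            return int(self.rng.integers(len(self.table)))
        return int(numpy.argmax(self.table))

    def update(self, action, reward):
        # Sample-average update of the chosen arm's estimate.
        self.counts[action] += 1
        self.table[action] += (reward - self.table[action]) / self.counts[action]


agent = EpsilonGreedySketch(k=3, epsilon=0.0)
agent.update(action=2, reward=1.0)
chosen = agent.act()  # with epsilon=0.0 the agent never explores
```

Setting `epsilon` to a small positive value makes the agent pick a uniformly random arm on that fraction of steps.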
An agent that always exploits, never explores.
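
The cost of never exploring can be seen in a small deterministic sketch. Everything here is assumed for illustration: the two fixed payouts in `true_rewards`, the zero-initialized estimates, and the sample-average update are not taken from the original agent.

```python
import numpy

true_rewards = numpy.array([0.2, 1.0])  # assumed deterministic payouts, illustration only
table = numpy.zeros(2)                  # value estimates, starting at zero
counts = numpy.zeros(2, dtype=int)
choices = []
for _ in range(10):
    action = int(numpy.argmax(table))   # always exploit; ties go to the lowest index
    reward = true_rewards[action]
    counts[action] += 1
    table[action] += (reward - table[action]) / counts[action]
    choices.append(action)
```

Under these assumptions the agent pulls arm 0 first, sees a positive reward, and never tries arm 1, even though arm 1 pays more.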
This bandit draws a reward from a set normal distribution each time an arm is chosen.
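
A bandit of this kind could be sketched as follows. The class name `NormalBanditSketch` and its `means`, `scale`, and `seed` parameters are hypothetical; only the `select(index=...)` call shape is taken from the script.

```python
import numpy


class NormalBanditSketch:
    """Hypothetical bandit whose arms pay normally distributed rewards."""

    def __init__(self, means, scale=1.0, seed=0):
        self.means = numpy.asarray(means, dtype=float)  # fixed mean reward per arm
        self.scale = scale                              # shared standard deviation
        self.rng = numpy.random.default_rng(seed)

    def select(self, index):
        # Draw one reward from the chosen arm's fixed normal distribution.
        return float(self.rng.normal(loc=self.means[index], scale=self.scale))


bandit = NormalBanditSketch(means=[0.0, 2.0], scale=1.0, seed=0)
draws = numpy.array([bandit.select(index=1) for _ in range(2000)])
```

The average of `draws` should land close to the chosen arm's mean, while individual rewards vary from pull to pull.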