from bandit import BaseBandit
This bandit draws a reward from a set normal distribution each time
an arm is chosen. Each arm has its own distribution that is fixed upon
construction. Each distribution has a standard deviation of 1 and a mean
randomly drawn from the uniform range [-1, 1).
This includes defining the normal distribution parameters for each
arm. There is a different distribution for each arm. The means are
sampled from the uniform range [-1, 1). The standard deviations are
all set to 1.
@param k The number of arms this bandit should have. This must be an
integer.
# Every arm has a standard deviation of 1.
self._std = numpy.ones(shape=(k,), dtype=float)
# Each arm's mean is drawn once from the uniform range [-1, 1).
self._mean = numpy.random.uniform(low=-1.0, high=1.0, size=(k,))
Select one or several arms to obtain a reward from.
@param index Any numpy valid indexing method to select which arms
a reward should be drawn from. None can also be passed, but will only
return a reward of None.
@return The rewards. The size of this will depend on the type of index.
If a single integer is passed in, a single float will be returned.
Otherwise, a numpy array will be returned. If None is passed in, this
will be None.
means = self._mean[index]
stds = self._std[index]
return numpy.random.normal(loc=means, scale=stds)
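How the shape of the reward follows the index can be checked with plain NumPy. The sketch below is illustrative only: `mean` and `std` are stand-in arrays in place of the private attributes, `select` is a hypothetical free function rather than the method itself, and the early return for None is an assumption taken from the docstring, since that part of the source is not shown here.

```python
import numpy

mean = numpy.array([-0.5, 0.0, 0.5])       # stand-in for self._mean
std = numpy.ones(shape=(3,), dtype=float)  # stand-in for self._std


def select(index):
    """Hypothetical free-function version of the select method."""
    if index is None:
        # Assumed from the docstring: None yields a reward of None.
        return None
    return numpy.random.normal(loc=mean[index], scale=std[index])


single = select(1)           # integer index: one scalar reward
several = select([0, 2])     # list index: one reward per listed arm
masked = select(mean > 0.0)  # boolean mask: one reward per matching arm
print(numpy.ndim(single), several.shape, masked.shape)  # 0 (2,) (1,)
```

Because the index is applied to the mean and standard deviation arrays before sampling, any index NumPy accepts for a one-dimensional array should work the same way.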
Return the distribution parameters for the arms.
@return A tuple containing the parameters for each arm's distribution.
The first element of the tuple will be a numpy array holding the means
for each arm. The second element will also be a numpy array with the
standard deviations for each arm.
A base class for the various bandit implementations.
This bandit draws a reward from a set normal distribution each time an arm is chosen.
None __init__(self, int k)
Construct the class.
select(self, index)
Select one or several arms to obtain a reward from.
trueValues(self)
Return the distribution parameters for the arms.
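To see the three members working together, here is a minimal sketch that pulls every arm many times and compares the sample means with the values reported by `trueValues`. It rests on several assumptions: `NormalBanditSketch` is a hypothetical stand-in name that does not derive from `BaseBandit`, the None handling in `select` is inferred from the docstring, and the `(means, standard deviations)` tuple order follows the `@return` description rather than the actual source.

```python
import numpy


class NormalBanditSketch:
    """Hypothetical stand-in; the original class derives from BaseBandit."""

    def __init__(self, k):
        self._std = numpy.ones(shape=(k,), dtype=float)
        self._mean = numpy.random.uniform(low=-1.0, high=1.0, size=(k,))

    def select(self, index):
        if index is None:
            return None  # assumed from the docstring
        return numpy.random.normal(loc=self._mean[index],
                                   scale=self._std[index])

    def trueValues(self):
        # Assumed order, per the docstring: (means, standard deviations).
        return (self._mean, self._std)


numpy.random.seed(0)  # fixed seed so the check is repeatable
bandit = NormalBanditSketch(4)
all_arms = numpy.arange(4)

# Pull every arm 20000 times; each call returns one reward per arm.
rewards = numpy.array([bandit.select(all_arms) for _ in range(20000)])
estimates = rewards.mean(axis=0)

true_means, true_stds = bandit.trueValues()
# The largest gap between estimated and true means should be small.
print(numpy.abs(estimates - true_means).max())
```

With a standard deviation of 1 and 20000 pulls per arm, the standard error of each estimate is about 0.007, so the printed gap should be close to zero.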