Group Project Two: Simulation

Assignment:

For this project, you will run a simulation based on the multi-armed bandit problem. Pretend you are doing a study abroad program somewhere new and amazing. You will be there for 200 days. You get a meal at one of four cafeterias (C1, C2, C3, and C4) each of the days that you are there. You want to maximize the happiness you get eating at these cafeterias. There is a normal distribution describing the happiness that you will derive from each cafeteria. This is given as the average happiness value (H1, H2, H3, H4) and the standard deviation (D1, D2, D3, D4).

You don’t know yet how much you might like each cafeteria. You want to strike a good balance between two concepts:

1. Exploration: going to the cafeterias enough to understand which one you will like the best.

2. Exploitation: once you know which cafeteria makes you the happiest, go there as much as possible.

Values to Use:

.   You will visit one of the cafeterias each day for 200 days.

.   There are four cafeterias: C1, C2, C3, and C4

.   The cafeterias have happiness values that are normally distributed with the following mean and standard deviation values:

o C1: average happiness = 7, standard deviation = 3

o C2: average happiness = 4, standard deviation = 10

o C3: average happiness = 10, standard deviation = 6

o C4: average happiness = 5, standard deviation = 2

Create Four Functions:

* Note: You should use these function names with the given punctuation.

1. exploitOnly(): For this function, you will first visit each cafeteria once and generate happiness values for those cafeterias based on a normal distribution with the mean and standard deviation for the cafeteria. After generating those values for the first four days, you will pick the one that returned the best happiness value and visit only that cafeteria for the next 196 days. This function will return the sum of all the happiness values that were generated over 200 days.

2. exploreOnly(): For this function, you will visit each cafeteria the same number of times. You will go to cafeteria 1 for 50 days, cafeteria 2 for 50 days, cafeteria 3 for 50 days, and cafeteria 4 for 50 days. Happiness values for each visit are generated each day as a normally distributed random number based on the mean and standard deviation of happiness given for the visited cafeteria. This function will return the sum of all the happiness values that were generated over 200 days.

3. eGreedy(e=10): This function will need an optional variable, e, that defaults to 10. * Note: the variable e is expected to be a number between 0 and 100. For this function, e% of the time you will pick a random cafeteria to go to and generate a happiness value based on the normal distribution of that cafeteria. The other (100 - e)% of the time you will go to the current best cafeteria. You MAY NOT write code that goes to random cafeterias for the first e% of the days and then goes to the best cafeteria the rest of the time. Just like with exploitOnly, you need to visit each cafeteria once during the first four days. Then, for the next 196 days, you will decide each day if you will go to a random cafeteria or the current best cafeteria. Be careful to keep track of the average happiness values for each cafeteria as you visit them. You may want to keep a list of happiness values for each cafeteria. This function will return the sum of all the happiness values that were generated over 200 days.

4. simulation(t, e=10): Run a simulation of the three strategies (exploitOnly, exploreOnly, and eGreedy). The function should take as input the number of trials (t) to run and the e value to use for eGreedy. The e value should default to 10 if nothing is entered. Run each strategy t times and calculate the average happiness for each strategy. Print the optimum happiness followed by a new line. Then for each strategy print the strategy name followed by:

.   Expected happiness

.   Expected regret

.   Simulated happiness

.   Simulated regret

The following is an example of what this printout should look like.

Example Simulation Printout:

* Note: This is an example of what your printout should look like and doesn’t necessarily represent the values used in project 2.

Generating Happiness for Each Meal:

Every time you visit a cafeteria, you need to generate a happiness value for that visit. The happiness for each cafeteria is normally distributed and you are given the average and standard deviation. When you visit a cafeteria, you cannot assume that you will get the average happiness value. You must calculate the happiness for that visit based on the normal distribution. There is a standard library in Python called random that allows you to generate different types of random numbers. You should import the library at the top of your code. Sample code is given for calculating the happiness obtained from visiting cafeteria 1:

import random

# Mean and standard deviation for cafeteria 1 (see "Values to Use").
H1, D1 = 7, 3

# Get a random number from a normal distribution with
# mean of H1 and standard deviation of D1.
happiness = random.normalvariate(H1, D1)

* Note that generating a number based on a normal distribution could possibly return a negative number for happiness. That is OK. Keep that value as a negative number and assume you had a truly horrible experience. Maybe you got food poisoning or had an allergic reaction.
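As a rough illustration of how per-visit sampling feeds the first two strategies described under "Create Four Functions", here is a minimal sketch. The names CAFETERIAS, DAYS, and visit() are assumptions made for this example and are not required by the assignment. Only the function names exploitOnly() and exploreOnly() come from the handout, and your own design may reasonably differ.

```python
import random

# (mean, standard deviation) per cafeteria, from "Values to Use".
# CAFETERIAS, DAYS, and visit() are illustrative names, not part of the assignment.
CAFETERIAS = {1: (7, 3), 2: (4, 10), 3: (10, 6), 4: (5, 2)}
DAYS = 200


def visit(cafeteria):
    """Return one happiness sample for cafeteria 1, 2, 3, or 4."""
    mean, sd = CAFETERIAS[cafeteria]
    # No clamping: a negative sample is kept as a negative number.
    return random.normalvariate(mean, sd)


def exploitOnly():
    """Sample each cafeteria once, then always return to the best first visit."""
    first_visits = {c: visit(c) for c in CAFETERIAS}
    best = max(first_visits, key=first_visits.get)
    total = sum(first_visits.values())
    for _ in range(DAYS - len(CAFETERIAS)):  # the remaining 196 days
        total += visit(best)
    return total


def exploreOnly():
    """Visit every cafeteria the same number of times (50 days each)."""
    visits_each = DAYS // len(CAFETERIAS)
    return sum(visit(c) for c in CAFETERIAS for _ in range(visits_each))
```

Because exploitOnly() commits after a single sample per cafeteria, one lucky or unlucky first visit decides the remaining 196 days, so its totals should vary far more from run to run than those of exploreOnly().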

Determining Which Cafeteria to Use for eGreedy:

When it comes to deciding at which cafeteria to dine on a given day, you will use another random function. You will use the standard library called random again. You only need to import the library once at the top of your code. Use random.random() to generate r, a random floating-point number between 0 and 1. If r is less than e/100, pick a random cafeteria; if r is greater than or equal to e/100, pick the best cafeteria that you’ve eaten at so far. Here is some sample code:

# Assumes random is already imported and e is a number between 0 and 100.
# Generate a random float 0 <= r < 1
r = random.random()

if r < e / 100:  # Pick a random cafeteria
    # Generate a random integer of 1, 2, 3, or 4
    i = random.randint(1, 4)
else:  # Pick best so far...
    pass

The above code has another call to a random number generator. The function random.randint() will return a random integer in the given range, including both endpoints. Above I have (1, 4) so that the function will return a random number of 1, 2, 3, or 4 to determine which cafeteria to use.
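Putting the daily decision together with the running averages, one possible shape for eGreedy is sketched below. The CAFETERIAS and DAYS names and the history dictionary are assumptions made for this example. The handout only requires the function name, the default e=10, one visit to each cafeteria in the first four days, and a fresh random decision on each of the remaining 196 days.

```python
import random

# (mean, standard deviation) per cafeteria, from "Values to Use".
# CAFETERIAS, DAYS, and history are illustrative names, not part of the assignment.
CAFETERIAS = {1: (7, 3), 2: (4, 10), 3: (10, 6), 4: (5, 2)}
DAYS = 200


def eGreedy(e=10):
    """Explore a random cafeteria e% of the time, otherwise go to the current best."""
    # Days 1-4: visit each cafeteria once.
    # history[c] lists every happiness value seen so far at cafeteria c.
    history = {c: [random.normalvariate(*CAFETERIAS[c])] for c in CAFETERIAS}

    for _ in range(DAYS - len(CAFETERIAS)):  # the remaining 196 days
        if random.random() < e / 100:
            # Explore: pick cafeteria 1, 2, 3, or 4 at random.
            choice = random.randint(1, 4)
        else:
            # Exploit: pick the cafeteria with the highest average so far.
            choice = max(history, key=lambda c: sum(history[c]) / len(history[c]))
        history[choice].append(random.normalvariate(*CAFETERIAS[choice]))

    return sum(sum(values) for values in history.values())
```

Recomputing the averages inside the loop is what makes "current best" change as new visits come in; computing the best cafeteria once after day four would reduce this to exploitOnly with occasional random visits.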

Calculating Optimum Happiness, Expected Values, and Regret:

. Optimum happiness is defined as the highest average happiness value multiplied by the total number of days. For the optimum happiness, you don’t need to generate random numbers. You just use the highest average value.

. Expected happiness is estimated without generating random numbers: for each cafeteria, multiply its average happiness value by the total number of days that cafeteria is visited, then add the results together.

. Simulated happiness is calculated using the functions which generate happiness values for each visit to a cafeteria.

. Regret is calculated as the optimum happiness value minus the total happiness value (expected or simulated).
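The definitions above can be checked with a small worked calculation. The sketch below covers only the cases that follow directly from the given values: the optimum, and the expected happiness and regret for exploreOnly, where each cafeteria is visited exactly 50 times. The names CAFETERIAS, DAYS, regret(), and simulated_average() are assumptions made for this example, and the expected values for the other two strategies depend on how you model their visit counts, so they are left out.

```python
# (mean, standard deviation) per cafeteria, from "Values to Use".
# All names below are illustrative, not part of the assignment.
CAFETERIAS = [(7, 3), (4, 10), (10, 6), (5, 2)]
DAYS = 200

means = [mean for mean, _ in CAFETERIAS]

# Optimum: the highest average happiness on every one of the 200 days.
optimum = max(means) * DAYS

# Expected happiness for exploreOnly: 50 visits to each cafeteria.
visits_each = DAYS // len(CAFETERIAS)
expected_explore = sum(mean * visits_each for mean in means)


def regret(total_happiness):
    """Optimum happiness minus the happiness actually obtained."""
    return optimum - total_happiness


def simulated_average(strategy, t):
    """Run a zero-argument strategy function t times and average the totals."""
    return sum(strategy() for _ in range(t)) / t


print(optimum)                   # 2000
print(expected_explore)          # 1300
print(regret(expected_explore))  # 700
```

Simulated regret follows the same pattern: pass a strategy function such as exploreOnly to simulated_average() and hand the result to regret().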

Directions:

. Work Together! Make sure everyone on the team gets some time working on the code. If one person is an experienced coder and another is an inexperienced coder, then work together to help the less experienced learn how to code. You learn quite a bit when you try to teach others! It will benefit you both. I would rather see some mistakes in your code because you all worked together than a perfect masterpiece that only one person worked on. With that said, don’t do extra work that is not related to the stated problem. If you have experience coding, don’t use that experience to create GUIs or other things. I won’t see them, so you are wasting your time! You should be working WITH your other teammates to help them understand the problem and how to create the code.

. Be Careful! Make sure that you turn in everything that you are supposed to turn in. Make sure that your code runs correctly. I don’t care how cool and “smart” your code is. I want it to do what was asked. And I want to be able to run it myself without considerable research or debugging on my part.

. Turn in Your Work! The tables that you generated through the simulation should be turned in on Blackboard, along with a short write-up about what you found. The write-up should be written carefully. Write in complete sentences and spell check your work!

Project Schedule:

Thursday, October 19: (5% of project grade) Project 2 Introduction Attendance: Each student should be in class to meet their team members.

Sunday, October 22, 11:59 pm: (5% of project grade) Initial Group Information: As a group, fill in the Initial Team Assessment file and submit it on Blackboard. *Only one submission per group is needed.

Thursday, November 2, 11:59 pm: (15% of project grade) Each student submits pseudocode for the task they have been assigned. This can basically be Python code, but there should be comments about what should happen.

Sunday, November 12, 11:59 pm: (40% of project grade) Group Project 2: Each group turns in the final project. This is your final Python code. I should be able to download everything you turned in and test your project.

Monday, November 13, 11:59 pm: (15% of project grade) Individual Write-up: Each student in the group will turn in a write-up about how their project works and some conclusions based on the results.

Tuesday, November 14, in class: (10% of project grade) Come to class with your computer that has your code and demonstrate how your code works.

Wednesday, November 15, 11:59 pm: (10% of project grade) Each student turns in “Team Member Assessment.” This should be thoughtfully filled in detailing who worked on what tasks and what type of work each group member did.