Group Project Two: Simulation
Assignment:
For this project, you will run a simulation based on the multi-armed bandit problem. Pretend you are doing a study abroad program somewhere new and amazing. You will be there for 200 days. Each day, you get a meal at one of four cafeterias (C1, C2, C3, and C4). You want to maximize the happiness you get from eating at these cafeterias. The happiness that you derive from each cafeteria follows a normal distribution, given by its average happiness value (H1, H2, H3, H4) and its standard deviation (D1, D2, D3, D4).
You don’t know yet how much you might like each cafeteria. You want to strike a good balance between two concepts:
1. Exploration: going to the cafeterias enough to understand which one you will like the best.
2. Exploitation: once you know which cafeteria makes you the happiest, go there as much as possible.
Values to Use:
• You will visit one of the cafeterias each day for 200 days.
• There are four cafeterias: C1, C2, C3, and C4.
• The cafeterias have happiness values that are normally distributed with the following mean and standard deviation values:
    o C1: average happiness = 7, standard deviation = 3
    o C2: average happiness = 4, standard deviation = 10
    o C3: average happiness = 10, standard deviation = 6
    o C4: average happiness = 5, standard deviation = 2
Create Four Functions:
* Note: You should use these function names with the given punctuation.
1. exploitOnly(): For this function, you will first visit each cafeteria once and generate happiness values for those cafeterias based on a normal distribution with the mean and standard deviation for the cafeteria. After generating those values for the first four days, you will pick the one that returned the best happiness value and visit only that cafeteria for the next 196 days. This function will return the sum of all the happiness values that were generated over 200 days.
2. exploreOnly(): For this function, you will visit each cafeteria the same number of times. You will go to cafeteria 1 for 50 days, cafeteria 2 for 50 days, cafeteria 3 for 50 days, and cafeteria 4 for 50 days. The happiness value for each visit is generated that day as a normally distributed random number based on the mean and standard deviation of happiness given for the visited cafeteria. This function will return the sum of all the happiness values that were generated over 200 days.
3. eGreedy(e=10): This function will need an optional variable, e, that defaults to 10. * Note: the variable e is expected to be a number between 0 and 100. For this function, e% of the time you will pick a random cafeteria to go to and generate a happiness value based on the normal distribution of that cafeteria. The other (100 - e)% of the time you will go to the current best cafeteria, meaning the one with the highest average happiness so far. You MAY NOT give code that goes to random cafeterias for the first e% of the days and then goes to the best one the rest of the time. Just like with exploitOnly, you need to visit each cafeteria once during the first four days. Then, for the next 196 days, you will decide each day whether to go to a random cafeteria or the current best cafeteria. Be careful to keep track of the average happiness values for each cafeteria as you visit them. You may want to keep a list of happiness values for each cafeteria. This function will return the sum of all the happiness values that were generated over 200 days.
4. simulation(t, e=10): Run a simulation of the three strategies (exploitOnly, exploreOnly, and eGreedy). The function should take as input the number of trials (t) to run and the e value to use for eGreedy. The e value should default to 10 if nothing is entered. Run each strategy t times and calculate the average happiness for each strategy. Print the optimum happiness followed by a new line. Then for each strategy print the strategy name followed by:
    • Expected happiness
    • Expected regret
    • Simulated happiness
    • Simulated regret
The following is an example of what this printout should look like.
Example Simulation Printout:
* Note: This is an example of what your printout should look like and doesn’t necessarily represent the values used in project 2.
Generating Happiness for Each Meal:
Every time you visit a cafeteria, you need to generate a happiness value for that visit. The happiness for each cafeteria is normally distributed and you are given the average and standard deviation. When you visit a cafeteria, you cannot assume that you will get the average happiness value. You must calculate the happiness for that visit based on the normal distribution. There is a standard library module in Python called random that allows you to generate different types of random numbers. You should import it at the top of your code. Sample code is given for calculating the happiness obtained from visiting cafeteria 1:
import random

# Mean and standard deviation for cafeteria 1 (from "Values to Use").
H1, D1 = 7, 3
# Get a random number from a normal distribution with
# mean H1 and standard deviation D1.
happiness = random.normalvariate(H1, D1)
* Note that generating a number based on a normal distribution could possibly return a negative number for happiness. That is OK. Keep that value as a negative number and assume you had a truly horrible experience. Maybe you got food poisoning or had an allergic reaction.
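As a rough illustration of how these per-visit draws add up, here is a minimal sketch of exploreOnly() and exploitOnly(). The CAFETERIAS dictionary, the DAYS constant, and the visit() helper are assumptions made for this sketch and are not required by the assignment; only the two strategy function names come from the project description.

```python
import random

# Mean and standard deviation per cafeteria, from "Values to Use".
# The dictionary layout is an assumption made for this sketch.
CAFETERIAS = {1: (7, 3), 2: (4, 10), 3: (10, 6), 4: (5, 2)}
DAYS = 200


def visit(cafeteria):
    """Draw one happiness value for a single visit (hypothetical helper)."""
    mean, std_dev = CAFETERIAS[cafeteria]
    return random.normalvariate(mean, std_dev)


def exploreOnly():
    """Visit every cafeteria the same number of times (50 each)."""
    visits_each = DAYS // len(CAFETERIAS)
    return sum(visit(c) for c in CAFETERIAS for _ in range(visits_each))


def exploitOnly():
    """Try each cafeteria once, then stay with the best first impression."""
    first_visits = {c: visit(c) for c in CAFETERIAS}
    best = max(first_visits, key=first_visits.get)
    total = sum(first_visits.values())
    # Remaining 196 days: always go to the cafeteria that scored best.
    for _ in range(DAYS - len(CAFETERIAS)):
        total += visit(best)
    return total
```

Because exploitOnly() commits after a single sample per cafeteria, a lucky draw from a cafeteria with a large standard deviation can lock in a poor choice, so its totals will likely vary a lot from run to run.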
Determining Which Cafeteria to Use for eGreedy:
When it comes to deciding at which cafeteria to dine on a given day, you will use another random function from the same random library. You only need to import the library once at the top of your code. Use random.random() to generate r, a random floating-point number that is at least 0 and less than 1. If r is less than e/100, pick a random cafeteria; if r is greater than or equal to e/100, pick the best cafeteria that you’ve eaten at so far. Here is some sample code:
# Generate a random float with 0 <= r < 1
r = random.random()
if r < e / 100:  # e is a percentage: pick a random cafeteria
    # Generate a random integer of 1, 2, 3, or 4
    i = random.randint(1, 4)
else:  # pick best so far...
    ...
The above code has another call to a random number generator. The function random.randint() returns a random integer in the given range, with both endpoints included. Above I have (1, 4) so that the function will return a random number of 1, 2, 3, or 4 to determine which cafeteria to use.
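Putting the pieces together, one possible shape for eGreedy() is sketched below. This is only a sketch under stated assumptions: the CAFETERIAS dictionary, the history dictionary of per-cafeteria happiness lists, and the nested visit() helper are illustrative choices, and "best" is taken to mean the highest running average so far.

```python
import random

# Mean and standard deviation per cafeteria, from "Values to Use".
# The dictionary layout is an assumption made for this sketch.
CAFETERIAS = {1: (7, 3), 2: (4, 10), 3: (10, 6), 4: (5, 2)}
DAYS = 200


def eGreedy(e=10):
    """Explore e% of the time, otherwise go to the best average so far."""
    history = {c: [] for c in CAFETERIAS}  # happiness values per cafeteria
    total = 0.0

    def visit(cafeteria):
        # Hypothetical helper: draw a value and record it in the history.
        mean, std_dev = CAFETERIAS[cafeteria]
        value = random.normalvariate(mean, std_dev)
        history[cafeteria].append(value)
        return value

    # First four days: visit each cafeteria once.
    for c in CAFETERIAS:
        total += visit(c)

    # Remaining days: decide each day whether to explore or exploit.
    for _ in range(DAYS - len(CAFETERIAS)):
        r = random.random()
        if r < e / 100:
            choice = random.randint(1, 4)
        else:
            choice = max(history,
                         key=lambda c: sum(history[c]) / len(history[c]))
        total += visit(choice)
    return total
```

The explore-or-exploit decision is made fresh on every one of the 196 remaining days, which is what separates this from exploring for a fixed block of days up front.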
Calculating Optimum Happiness, Expected Values, and Regret:
• Optimum happiness is defined as the highest average happiness value multiplied by the total number of days. For the optimum happiness, you don’t need to generate random numbers. You just use the highest average value.
• Expected happiness is estimated, for each cafeteria, as its average happiness value multiplied by the total number of days that cafeteria was visited.
• Simulated happiness is calculated using the functions which generate happiness values for each visit to a cafeteria.
• Regret is calculated as the optimum happiness value minus the total happiness value (expected or simulated).
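The definitions above can be checked with a small worked example. This is a minimal sketch: the function names and the visit_counts argument are assumptions for illustration, and the per-cafeteria expected values are summed to get a total. It uses exploreOnly, where the visit counts are fixed at 50 days per cafeteria, so no random numbers are needed.

```python
# Average happiness per cafeteria, from "Values to Use".
MEANS = {1: 7, 2: 4, 3: 10, 4: 5}
DAYS = 200


def optimum_happiness():
    """Highest average happiness multiplied by the total number of days."""
    return max(MEANS.values()) * DAYS


def expected_happiness(visit_counts):
    """Sum of (average happiness * days visited) over the cafeterias.

    visit_counts maps a cafeteria number to the days it was visited
    (a hypothetical argument chosen for this sketch).
    """
    return sum(MEANS[c] * days for c, days in visit_counts.items())


def regret(total_happiness):
    """Optimum happiness minus the total happiness actually obtained."""
    return optimum_happiness() - total_happiness


explore_expected = expected_happiness({1: 50, 2: 50, 3: 50, 4: 50})
print(optimum_happiness())       # 2000
print(explore_expected)          # 1300
print(regret(explore_expected))  # 700
```

The same regret() function can be applied to a simulated total, for example the average of many exploreOnly() runs, to get the simulated regret.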
Directions:
• Work Together! Make sure everyone on the team gets some time working on the code. If one person is an experienced coder and another is an inexperienced coder, then work together to help the less experienced learn how to code. You learn quite a bit when you try to teach others! It will benefit you both. I would rather see some mistakes in your code because you all worked together than a perfect masterpiece that only one person worked on. With that said, don’t do extra work that is not related to the stated problem. If you have experience coding, don’t use that experience to create GUIs or other things. I won’t see them, so you are wasting your time! You should be working WITH your other teammates to help them understand the problem and how to create the code.
• Be Careful! Make sure that you turn in everything that you are supposed to turn in. Make sure that your code runs correctly. I don’t care how cool and “smart” your code is. I want it to do what was asked. And I want to be able to run it myself without considerable research or debugging on my part.
• Turn in Your Work! The tables that you generated through the simulation should be turned in on Blackboard. Also turn in a short write-up about what you found. This should be written carefully. Write in complete sentences and spell check your work!
Project Schedule:
• Thursday, October 19: (5% of project grade) Project 2 Introduction Attendance: Each student should be in class to meet their team members.
• Sunday, October 22, 11:59pm: (5% of project grade) Initial Group Information: As a group, fill in the Initial Team Assessment file and submit on Blackboard. *Only one submission per group is needed.
• Thursday, November 2, 11:59 pm: (15% of project grade) Each student submits pseudocode for the task they have been assigned. This can basically be Python code. But there should be comments about what should happen.
• Sunday, November 12, 11:59 pm: (40% of project grade) Group Project 2: Each group turns in the final project. This is your final Python code. I should be able to download everything you turned in and test your project.
• Monday, November 13, 11:59 pm: (15% of project grade) Individual Write-up: Each student in the group will turn in a write-up about how their project works and some conclusions based on the results.
• Tuesday, November 14, in class: (10% of project grade) Come to class with your computer that has your code and demonstrate how your code works.
• Wednesday, November 15, 11:59 pm: (10% of project grade) Each student turns in “Team Member Assessment.” This should be thoughtfully filled in detailing who worked on what tasks and what type of work each group member did.
2023-10-28