Hello, dear friend, you can consult us at any time if you have any questions, add WeChat: daixieit

CSE257 Winter 2023: Assignment 4

1                    Deadline: Mar-19 23:59 PT.

2                     • Submit answer pdf file on Gradescope. When a question asks you to“Show/Write/List/Plot/Explain” some-

3                            thing, you need to show them in the answer pdf. Otherwise points will be taken off.

4                     • Submit all code as a zip file to https://www .dropbox .com/request/IPa2QEVyclIld5ostZG3.

5        All other requirements are the same as previous assignments.

6        Question 1 (Implementation Required). Let X be a random variable taking values of either 0 or 1, with probability

7        P(X = 0) = 0.3 and P(X = 1) = 0.7.

8                  1.  (1 Point) Draw N samples of X as X1 , ...,XN  and plot the histogram of the average  =    Xi  with

9                               500 trials (each trial for N  needs N samples of X). Do this for three different values of N = 1, 10, 500.

10                           For the histogram you can divide [0, 1] into 10 bins (i.e., [0, 0.1), [0.1, 0.2), etc., if you want to use more bins to

11                           be more accurate, feel free to do so). The x-axis should be the bins for the values of N , and the y-axis should

12                           be the frequency of occurrence of N  in each bin. So you should show three plots, one for each different N,

13                            and in each one you need do 500 trials (each trial uses N samples), and show how the N  is distributed as a

14                           histogram.

15                  2. (2 Points) Let µ be the true expectation of the random variable X specified above. Let ε = 0.1. Plot P(|N  −

16                            µ|  ≥ ε) as a function as N grows and compare it with the Hoeffding bound, as follows. Let the x-axis be N

17                            going from 1 to 100. For each N, do 10 trials of using N samples to compute the average N , and show the

18                            frequency of |N  − µ|  ≥ ε happening on the y-axis (for instance, if you see 3 times of that happening out of 19                            10 trials when N = 5 then the y-axis should show 0.3 when N = 5).

20                            On the same plot, show how the Hoeffding bound changes over time as well. That is, plot the righthand side of

21                           the Hoeffding inequality (check slides), as a function of N.

22        Question 2 (Implementation Required). Consider two coins, and write the random variable for the payoff from each

23        coin as X (1)  and X (2) . The ground truth distribution for each coin is P(X(1)   = 0) = 0.2, P(X(1)   =  1) = 0.8,

24        P(X(2)  = 0) = 0.6 with P(X(2)  = 1) = 0.4. Plot the regret R(T) as the number of plays T goes from 10 to 500 for

25        each of the following strategies. In your plot, the x-axis is T and y-axis is R(T), and you should plot everything on

26        the same graph so that their regret curves can be compared.

27                  1. (1 Point) Explore-then-commit with N = ⌈0.3T⌉ (⌈·⌉ is the ceiling function that gives you an integer).

28                  2. (1 Point) Explore-then-commit with N = ⌈ T  (log T) ⌉, where log is the natural logarithm.

29                  3. (1 Point) The ε-Greedy strategy with ε = 0.3. (With 1 − ε probability play the currently-best coin, and with ε

30                           probability the other).

31                  4. (1 Point) The Upper Confidence Bound strategy, which plays the coin i ∈ {1, 2} that maximizes (i)+ ^  

32                            in each round T.

33        Question 3 (Implementation Required Only for the Last Sub-Question). Consider the tic-tac-toe game from the pre-

34        vious assignment, and you are still playing for Player O. Answer the following questions conceptually first:

35                  1. (1 Point) Consider an arbitrary initial state where Player X has put down a piece and now you need to do MCTS

36                            for Player O. After the first 8 iterations of MCTS, what does the search tree look like? Draw it by hand: just the

37                            nodes and edges that current exist in the tree. No need to annotate it with any values.

38                  2. (2 Points) Now consider the 9-th iteration of the MCTS algorithm that continues to build the search tree that you

39                            wrote down in the previous question. Explain the following:

40                                (a) Explain how you will choose the next node to expand in the tree.

41                                (b) Write down one game trajectory that you may see as a sample in the rollout/simulation phase of MCTS at

42                                             this 9th-iteration, and explain how you choose the actions for each player in this trajectory.

43                                (c) What nodes are you going to annotate from the outcome of this sample trajectory that you just wrote down?

44                            Again you do not need to implement anything here, just think about what the algorithm will do.

45                  3. (2 Points) Now, consider the game state in Figure 1 again. After 8 iterations of MCTS from this state (i.e., this

46                            state is now the root of the MCTS tree), at most how many non-root nodes are there in the MCTS tree? Explain

47                            your reasoning, which includes drawing the search tree that has this maximum number of nodes.

 

 

Figure 1: Example of a feasible game state. Your play circle, and it is your turn now.

48

49                           Now if you have time, implement the MCTS algorithm for tic-tac-toe and answer the following:

50                  4. (Extra 2 Points) Show how the search tree is expanded in the first 10 iterations of MCTS, at the game state in

51                           Figure 1. That is, show the search tree obtained in MCTS (you can either hand-draw or print it) at the end of

52                            each iteration, for the first 10 iterations.

Question 4 (Implementation Required). Consider the following constraint system. The variables are X = {x1 ,x2 } with initial domains x1  ∈ D1  = [0, 2] and x2  ∈ D2  = [0, 2]. The constraints are

C1  :  x1  = x2(3)

C2  :  x1  = log(x2 + 1)

C3  :  x1  exp(x2 ) 1

53        log means natural logrithm. Answer the following questions.

54                  1. (1 Point) Consider C2 and x2  ∈ D2 . What is the largest subset of D1 for x1 that is arc-consistent with x2 ?

55                  2. (2 Points) Implement the backtracking search algorithm based on arc-consistency, to find one satisfying assign-

56                            ment for the constraint system {C1 ,C2 ,C3 } within D1  × D2 , or report no solution if there isn’t any. In the

57                            answer pdf, show the solutions you found, if any, and explain where the branching operations happen during

58                            solving, if any.

59                  3. (1 Point) Continuing on the previous question, can you tweak the algorithm to find all satisfying assignments

60                           to the system? How do you know that you have exhausted all the possible satisfying assignments? Show all the

61                            satisfying assignments you can find for this system. 

62                  4. (1 Point) Change the third constraint into

C : x1  ≥ exp(x2 ) − 0.9

63                            Use your constraint solving algorithm to find all solutions of the new system under {C1 ,C2 ,C} with the same

64                            initial domains D1  × D2 . Report no solution if there isn’t any.

65        Question 5 (2 Points, No Implementation). Put the following formula into Conjunctive Normal Form:

(                                                )

66        then write down in the Dimacs format encoding for SAT solving for this formula, as specified in the slides.