Monte Carlo Methods
Pros:
1. Very general
2. Can solve hard problems, e.g., estimating π or computing integrals
Cons:
1. Slow convergence
2. Inaccuracy with pseudo-random number generators
Monte Carlo Method
- Numerical results
- Based on random sampling
- More accurate as the number of samples grows
Example 1: computing π
- Randomly sample points in [-1, 1] x [-1, 1]
- Count how many points fall within the unit circle
- π ≈ (# of points in the circle / # of points) * base area (4)
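A minimal Python sketch of this estimator (the function name and sample count are my own choices):

```python
import random

def estimate_pi(num_samples: int) -> float:
    """Estimate pi by uniformly sampling points in [-1, 1] x [-1, 1]."""
    inside = 0
    for _ in range(num_samples):
        x = random.uniform(-1.0, 1.0)
        y = random.uniform(-1.0, 1.0)
        if x * x + y * y <= 1.0:  # the point falls inside the unit circle
            inside += 1
    # (fraction of points in the circle) * base area of the square (4)
    return inside / num_samples * 4.0

print(estimate_pi(1_000_000))  # typically within ~0.01 of pi
```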
Example 2: computing an integral
- Randomly sample points in [0, 1] x [0, 1]
- Compute the probability that x^2 <= y; its complement estimates the integral of x^2 over [0, 1]
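The same idea as a sketch; counting points with x^2 <= y measures the area above the curve, so its complement estimates the integral (names here are illustrative):

```python
import random

def estimate_probability(num_samples: int) -> float:
    """Estimate P(x^2 <= y) for (x, y) uniform on [0, 1] x [0, 1]."""
    hits = 0
    for _ in range(num_samples):
        x, y = random.random(), random.random()
        if x * x <= y:  # the point lies on or above the curve y = x^2
            hits += 1
    return hits / num_samples

p = estimate_probability(1_000_000)
print(p)        # close to 2/3, the area above the curve
print(1.0 - p)  # close to 1/3, the integral of x^2 over [0, 1]
```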
Monte Carlo Methods
- Solving the reinforcement learning problem
- Based on averaging sample returns
- The term is used broadly for any estimation method whose operation involves a significant random component
- 2 methods:
- First-visit MC method estimates vπ(s) as the average of the returns following first visits to s
- Every-visit MC method averages the returns following all visits to s
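A minimal sketch of first-visit MC prediction, assuming episodes arrive as lists of (state, reward) pairs with each reward received after leaving its state, and gamma = 1 by default (the data layout is my own, not the book's pseudocode):

```python
from collections import defaultdict

def first_visit_mc(episodes, gamma=1.0):
    """Estimate v_pi(s) as the average return following first visits to s."""
    returns_sum = defaultdict(float)
    returns_count = defaultdict(int)
    for episode in episodes:
        # Time step of the first visit to each state in this episode
        first_visit = {}
        for t, (state, _) in enumerate(episode):
            first_visit.setdefault(state, t)
        # Compute returns G_t backwards; credit only first visits
        G = 0.0
        for t in reversed(range(len(episode))):
            state, reward = episode[t]
            G = gamma * G + reward
            if first_visit[state] == t:
                returns_sum[state] += G
                returns_count[state] += 1
    return {s: returns_sum[s] / returns_count[s] for s in returns_sum}
```

Every-visit MC is the same loop with the `first_visit[state] == t` check removed, so every occurrence of s contributes its return.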
Convergence
- First-visit MC
- Easy to analyze: each return is an independent, identically distributed estimate of vπ(s)
- The standard deviation of its error falls as 1/sqrt(n), where n is the number of returns averaged
- Every-visit MC
- Less straightforward to analyze
- Its estimates also converge asymptotically to vπ(s) (Singh and Sutton, 1996)
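A quick numerical check of the 1/sqrt(n) claim for averages of i.i.d. samples (uniform [0, 1) samples stand in for returns here; the setup is mine, not from the slides):

```python
import random
import statistics

def mean_abs_error(n_returns: int, true_value: float = 0.5, trials: int = 2000) -> float:
    """Average |estimate - truth| when averaging n_returns i.i.d. samples."""
    errors = []
    for _ in range(trials):
        estimate = statistics.fmean(random.random() for _ in range(n_returns))
        errors.append(abs(estimate - true_value))
    return statistics.fmean(errors)

for n in (10, 100, 1000):
    # Each 10x increase in n should shrink the error by roughly sqrt(10) = 3.16x
    print(n, mean_abs_error(n))
```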
Example 5.1: Blackjack
- Obtain cards the sum of whose numerical values is as great as possible without exceeding 21
- i.e., maximize the card sum subject to the constraint that it not exceed 21
- If the dealer goes bust, then the player wins; otherwise, the outcome—win, lose, or draw—is determined by whose final sum is closer to 21.
- In any event, after 500,000 games the value function is very well approximated.
Blackjack rules
- Rewards of +1, −1, and 0 are given for winning, losing, and drawing
- All rewards within a game are zero, and we do not discount (γ = 1)
- The player’s actions are to hit or to stick
- The player makes decisions on the basis of three variables:
- his current sum (12–21)
- the dealer’s one showing card (ace–10)
- whether or not he holds a usable ace
- This makes for a total of 200 states.
- Note that in this task the same state never recurs within one episode, so there is no difference between first-visit and every-visit MC methods.
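A self-contained sketch of MC policy evaluation under these rules, for the fixed policy that sticks only on 20 or 21. The simulator below is my simplification: it draws from an infinite deck with replacement and does not treat naturals specially, and all names are my own:

```python
import random
from collections import defaultdict

CARDS = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 10, 10, 10]  # ace = 1; 10/J/Q/K = 10

def draw():
    return random.choice(CARDS)  # infinite deck: draw with replacement

def hand_value(cards):
    """Return (sum, usable_ace), counting one ace as 11 when that does not bust."""
    total, usable = sum(cards), False
    if 1 in cards and total + 10 <= 21:
        total, usable = total + 10, True
    return total, usable

def play_episode():
    """Play one game; return (visited states, final reward)."""
    player, dealer = [draw(), draw()], [draw(), draw()]
    showing = dealer[0]
    states = []
    while True:  # player's turn: stick on 20 or 21, otherwise hit
        total, usable = hand_value(player)
        if total < 12:            # sums below 12: always hit, no decision to record
            player.append(draw())
            continue
        states.append((total, showing, usable))
        if total >= 20:
            break
        player.append(draw())
        if hand_value(player)[0] > 21:
            return states, -1     # player goes bust
    while hand_value(dealer)[0] < 17:  # dealer's turn: hit below 17
        dealer.append(draw())
    p, d = hand_value(player)[0], hand_value(dealer)[0]
    if d > 21 or p > d:
        return states, +1
    return (states, 0) if p == d else (states, -1)

# First-visit and every-visit coincide: a state never recurs within one episode
returns = defaultdict(list)
for _ in range(500_000):
    states, reward = play_episode()
    for s in states:  # the final reward is the undiscounted return from each state
        returns[s].append(reward)
value = {s: sum(rs) / len(rs) for s, rs in returns.items()}
print(value[(20, 10, False)])  # e.g., player sum 20 vs. dealer showing a 10
```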
Why not DP
- DP methods require the distribution of next events; in particular, they require the quantities p(s', r|s, a), and it is not easy to determine these for blackjack
- Expected rewards and transition probabilities (often complex and error-prone to derive) must be computed before DP can be applied
- In contrast, generating the sample games required by Monte Carlo methods is easy
- The ability of Monte Carlo methods to work with sample episodes alone can be a significant advantage even when one has complete knowledge of the environment’s dynamics.
Fundamental differences
- Sampling
- DP diagram shows all possible transitions
- Monte Carlo diagram shows only those sampled on the one episode
- Tracing
- DP diagram includes only one-step transitions
- Monte Carlo diagram goes all the way to the end of the episode.
Computational Expense
- Computational expense of estimating the value of a single state is independent of the number of states
- Monte Carlo methods particularly attractive when one requires the value of only one or a subset of states
- One can generate many sample episodes starting from the states of interest, averaging returns only from these states and ignoring all others (see the sketch below)
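A toy illustration of this point (the random-walk environment is my own stand-in): all episodes start from the single state of interest, only its returns are averaged, and the cost does not grow with the number of other states:

```python
import random

def episode_return_from(state: int, n_states: int = 19) -> float:
    """Undiscounted random walk over states 1..n_states: step left or right
    uniformly; terminate off either end, with reward +1 only on the right."""
    while True:
        state += random.choice((-1, 1))
        if state == 0:
            return 0.0
        if state == n_states + 1:
            return 1.0

# Estimate the value of one state by generating episodes only from it
state_of_interest = 10
rets = [episode_return_from(state_of_interest) for _ in range(10_000)]
print(sum(rets) / len(rets))  # true value of the middle state is 10/20 = 0.5
```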
Monte Carlo Methods
By txshon Tseng