next up previous contents
Next: ``Linear'' decision making Up: Stochastic Dynamic Programming Previous: Uncertain Payoffs

Uncertain States

A more interesting use of uncertainty occurs when the state that results from a decision is uncertain. For example, consider the following coin tossing game: a coin will be tossed 4 times. Before each toss, you can wager $0, $1, or $2 (provided you have sufficient funds). You begin with $1, and your objective is to maximize the probability you have $5 at the end. of the coin tosses.

We can formulate this as a dynamic program as follows: create a stage for the decision point before each flip of the coin, and a ``final'' stage, representing the result of the final coin flip. There is a state in each stage for each possible amount you can have. For stage 1, the only state is ``1'', for each of the others, you can set it to ``0,1,2,3,4,5'' (of course, some of these states are not possible, but there is no sense in worrying too much about that). Now, if we are in stage i and bet k and we have x dollars, then with probability .5, we will have x-k dollars, and with probability .5 we will have x+k dollars next period. Let tex2html_wrap_inline1210 be the probability of ending up with at least $5 given we have $x before the ith coin flip.

This gives us the following recursion:



Note that the next state is not known for certain, but is a probabilistic mixing of states. We can still easily determine tex2html_wrap_inline1220 from tex2html_wrap_inline1222 , and tex2html_wrap_inline1224 from tex2html_wrap_inline1220 and so on back to tex2html_wrap_inline1228 .

Another example comes from the pricing of stock options. Suppose we have the option to buy Netscape stock at $150. We can exercise this option anytime in the next 10 days (american option, rather than a european option that could only be exercised 10 days from now). The current price of Netscape is $140. We have a model of Netscape stock movement that predicts the following: on each day, the stock will go up by $2 with probability .4, stay the same with probability .1 and go down by $2 with probability .4. Note that the overall trend is downward (probably conterfactual, of course). The value of the option if we exercise it at price x is x-150 (we will only exercise at prices above 150).

We can formulate this as a stochastic dynamic program as follows: we will have stage i for each day i, just before the exercise or keep decision. The state for each stage will be the stock price of Netscape on that day. Let tex2html_wrap_inline1210 be the expected value of the option on day i given that the stock price is x. Then, the optimal decision is given by:




Given the size of this problem, it is clear that we should use a spreadsheet to do the calculations.

There is one major difference between stochastic dynamic programs and deterministic dynamic programs: in the latter, the complete decision path is known. In a stochastic dynamic program, the actual decision path will depend on the way the random aspects play out. Because of this, ``solving'' a stochastic dynamic program involves giving a decision rule for every possible state, not just along an optimal path.

next up previous contents
Next: ``Linear'' decision making Up: Stochastic Dynamic Programming Previous: Uncertain Payoffs

Michael A. Trick
Sun Jun 14 13:05:46 EDT 1998