# Dynamic Programming (DP)

## Course Description

**Dynamic programming** is both a mathematical optimization method and a computer programming method. The method was developed by Richard Bellman in the 1950s and has found applications in numerous fields, from aerospace engineering to economics.

In both contexts it refers to simplifying a complicated problem by breaking it down into simpler sub-problems in a recursive manner. While some decision problems cannot be taken apart this way, decisions that span several points in time do often break apart recursively. Likewise, in computer science, if a problem can be solved optimally by breaking it into sub-problems and then recursively finding the optimal solutions to the sub-problems, then it is said to have optimal substructure.
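Here is a minimal Python sketch of optimal substructure. It uses the fewest-coins change problem, an example chosen for illustration rather than taken from the text. Any optimal way to pay an amount ends with some coin `c`, and the remaining `amount - c` must then be paid optimally as well. The helper name `min_coins` is hypothetical and the coins are assumed to be positive integers. `functools.lru_cache` stores each sub-problem's answer so that it is computed only once.

```python
import math
from functools import lru_cache


def min_coins(amount: int, coins: tuple) -> float:
    """Fewest coins summing to `amount`, or math.inf if it cannot be paid.

    Assumes every coin is a positive integer (a zero coin would recurse forever).
    """

    @lru_cache(maxsize=None)  # remember each sub-problem's answer so it is solved once
    def best(rest: int) -> float:
        if rest == 0:
            return 0  # nothing left to pay: no coins needed
        # Optimal substructure: if the last coin is c, the coins before it
        # must themselves be an optimal answer for the smaller amount rest - c.
        return min((best(rest - c) + 1 for c in coins if c <= rest), default=math.inf)

    return best(amount)


print(min_coins(6, (1, 3, 4)))   # 2  (3 + 3)
print(min_coins(11, (1, 2, 5)))  # 3  (5 + 5 + 1)
```

With coins (1, 3, 4), always taking the largest coin that fits would pay 6 as 4 + 1 + 1. Combining optimal answers to the sub-problems instead finds 3 + 3. For large amounts, a bottom-up table avoids Python's default recursion limit.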

If sub-problems can be nested recursively inside larger problems, so that dynamic programming methods are applicable, then there is a relation between the value of the larger problem and the values of the sub-problems.[1] In the optimization literature this relationship is called the Bellman equation.

### Mathematical optimization

In terms of mathematical optimization, dynamic programming usually refers to simplifying a decision by breaking it down into a sequence of decision steps over time. This is done by defining a sequence of **value functions** $V_1, V_2, \ldots, V_n$ taking $y$ as an argument representing the **state** of the system at times $i$ from 1 to $n$. The definition of $V_n(y)$ is the value obtained in state $y$ at the last time $n$. The values $V_i$ at earlier times $i = n-1, n-2, \ldots, 2, 1$ can be found by working backwards, using a recursive relationship called the Bellman equation. For $i = 2, \ldots, n$, $V_{i-1}$ at any state $y$ is calculated from $V_i$ by maximizing a simple function (usually the sum) of the gain from a decision at time $i-1$ and the function $V_i$ at the new state of the system if this decision is made. Since $V_i$ has already been calculated for the needed states, the above operation yields $V_{i-1}$ for those states. Finally, $V_1$ at the initial state of the system is the value of the optimal solution. The optimal values of the decision variables can be recovered, one by one, by tracking back the calculations already performed.
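The backward recursion described here can be written out explicitly. Let $F_{i-1}(y, x)$ be the gain of choosing decision $x$ in state $y$ at time $i-1$, and let $T_{i-1}(y, x)$ be the resulting new state. Both symbols are our own notation, not the text's. With the usual additive gain, the Bellman equation reads

$$V_{i-1}(y) = \max_{x}\Big[\,F_{i-1}(y, x) + V_i\big(T_{i-1}(y, x)\big)\Big], \qquad i = 2, \ldots, n.$$

The Python sketch below applies this recursion to a finite set of states. It rests on several assumptions: the states and the available decisions can be enumerated, every transition lands back in the listed states, and gains add up. The names `backward_induction` and `recover_decisions` are hypothetical, and so is the toy budget problem, which is included only for illustration.

```python
import math
from collections.abc import Callable, Hashable

State = Hashable     # any hashable state label y
Decision = Hashable  # any hashable decision label x


def backward_induction(
    n: int,
    states: list[State],
    decisions: Callable[[int, State], list[Decision]],
    gain: Callable[[int, State, Decision], float],
    transition: Callable[[int, State, Decision], State],
    terminal_value: Callable[[State], float],
) -> tuple[dict[int, dict[State, float]], dict[int, dict[State, Decision]]]:
    """Fill V[i][y] = V_i(y) for i = n, n-1, ..., 1 and remember the maximizing decisions."""
    V = {n: {y: terminal_value(y) for y in states}}  # V_n(y): value at the last time n
    policy: dict[int, dict[State, Decision]] = {}
    for i in range(n, 1, -1):  # i = n, ..., 2: derive V_{i-1} from the finished V_i
        V[i - 1], policy[i - 1] = {}, {}
        for y in states:
            best_x, best_v = None, -math.inf
            for x in decisions(i - 1, y):
                # Gain of deciding x at time i-1, plus V_i at the new state
                # (assumes transition() always lands in `states`).
                v = gain(i - 1, y, x) + V[i][transition(i - 1, y, x)]
                if v > best_v:  # strict '>' keeps the first best decision on ties
                    best_x, best_v = x, v
            V[i - 1][y], policy[i - 1][y] = best_v, best_x
    return V, policy


def recover_decisions(n, policy, transition, y0):
    """Track back through the stored choices, starting from state y0 at time 1."""
    y, chosen = y0, []
    for i in range(1, n):  # decisions are made at times 1, ..., n-1
        x = policy[i][y]
        chosen.append(x)
        y = transition(i, y, x)
    return chosen


# Hypothetical toy problem: spend a budget of 3 units over decisions at times 1..3.
# Spending x units at any step gains REWARD[x] (diminishing returns); leftovers are worth 0.
REWARD = [0, 3, 5, 6]


def spend(i, y, x):
    return y - x  # new state: budget remaining after spending x


V, policy = backward_induction(
    n=4,
    states=[0, 1, 2, 3],
    decisions=lambda i, y: list(range(y + 1)),
    gain=lambda i, y, x: REWARD[x],
    transition=spend,
    terminal_value=lambda y: 0,
)
print(V[1][3])                                 # 9
print(recover_decisions(4, policy, spend, 3))  # [1, 1, 1]
```

Because `REWARD` has diminishing returns, spreading the budget one unit per step (total 9) beats spending it all at once (total 6). The recovery step only reads the `policy` table filled in during the backward pass, which matches the "tracking back" of earlier calculations described in the text.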