# Optimal Control via Dynamic Programming

## The Optimal Control Problem

To recap from my last post (was that in September? yikes!),
the optimal control problem boils down to finding an admissible control
\( \def\uvec{\boldsymbol u} \uvec^* \in U \)
which will cause the system, itself described by
\[
\def\avec{\boldsymbol a} \def\xvec{\boldsymbol x}
\dot\xvec(t) = \avec(\xvec(t), \uvec(t), t),
\]
to follow an admissible trajectory \( \xvec^* \in X \)
which will minimize the performance measure
\[
J = h(\xvec(t_f), t_f)
+ \int_{t_0}^{t_f} g(\xvec(t),\uvec(t),t)\,dt.
\]
We seek to find this in the form of an optimal *control law*
or *policy* \( \boldsymbol f \) such that
\[
\uvec^*(t) = \boldsymbol f(\xvec(t), t).
\]
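
To make these objects concrete, here is a minimal numerical sketch. The double-integrator dynamics, quadratic costs, and linear feedback law below are purely illustrative choices of mine, not part of the general problem:

```python
import numpy as np

# Toy system (illustrative choice): a double integrator,
# x = [position, velocity], driven by a scalar control u (acceleration).
def a(x, u, t):
    return np.array([x[1], u])

# A candidate (not necessarily optimal!) control law u = f(x, t).
def f(x, t):
    return -1.0 * x[0] - 2.0 * x[1]

# Running cost g and terminal cost h, chosen quadratic for illustration.
def g(x, u, t):
    return x @ x + u**2

def h(x, tf):
    return 10.0 * (x @ x)

# Evaluate J for this policy by simulating the dynamics forward
# (forward Euler) while accumulating the integral term, then adding
# the terminal cost at t_f.
def evaluate_policy(x0, t0=0.0, tf=5.0, dt=1e-3):
    x, t, J = np.array(x0, dtype=float), t0, 0.0
    while t < tf - dt / 2:
        u = f(x, t)
        J += g(x, u, t) * dt     # rectangle-rule integral of g
        x = x + dt * a(x, u, t)  # Euler step through the dynamics
        t += dt
    return J + h(x, tf)

print(evaluate_policy([1.0, 0.0]))
```

The optimal control problem is to find the policy \( \boldsymbol f \) that makes this number as small as possible over all admissible choices.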
This post will discuss the use of *dynamic programming* to attack this problem.

## Dynamic Programming

Consider a discrete-time version of the system above:
\[
\xvec\bigl((k+1)\,\Delta t\bigr) \approx
\xvec(k\,\Delta t)
+ \Delta t \cdot \avec\left(\xvec(k\,\Delta t), \uvec(k\,\Delta t), k\,\Delta t\right).
\]
More precisely, and using the notation \( (\,\cdot\,)[k] \equiv (\,\cdot\,)(k\,\Delta t) \),
the discrete-time system equation is
\[
\def\dscrt{_{\mathrm D}}
\xvec[k+1] = \avec\dscrt\left(\xvec[k], \uvec[k], k\right).
\]
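
As a sketch, here is what the Euler version of that map looks like in code (the `a_D` below is only the first-order approximation; the \( \avec\dscrt \) in the equation above is defined to be exact, as noted after the cost equations):

```python
import numpy as np

dt = 0.01  # sample period, an arbitrary illustrative choice

def a(x, u, t):
    """Continuous-time dynamics xdot = a(x, u, t); toy double integrator."""
    return np.array([x[1], u])

def a_D(x_k, u_k, k):
    """One step of the discrete-time system x[k+1] = a_D(x[k], u[k], k),
    here via forward Euler. The exact a_D would instead integrate a
    over the interval [k*dt, (k+1)*dt]."""
    return x_k + dt * a(x_k, u_k, k * dt)
```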
The performance measure can be discretized similarly
(taking \( t_0 = 0 \) and \( t_f = N\,\Delta t \)):
\[
J = h(\xvec(N\,\Delta t)) + \int_0^{\Delta t} g\,dt + \int_{\Delta t}^{2\Delta t} g\,dt
+ \cdots + \int_{(N-1)\,\Delta t}^{N\,\Delta t} g\,dt,
\]
which can be approximated by
\[
J \approx h(\xvec[N]) + \Delta t \sum_{k=0}^{N-1} g(\xvec[k], \uvec[k]),
\]
or it can be exactly expressed by
\[
J = h(\xvec[N]) + \sum_{k=0}^{N-1} g\dscrt(\xvec[k], \uvec[k]).
\]
(Note that \( \avec\dscrt \) and \( g\dscrt \) are *not*
merely the first-order Euler approximations to \( \avec \) and \( g \),
but have been defined to be exact over the sampling interval in question.
Note also that for simplicity I have assumed a time-invariant system.)
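
As another sketch, the exact discrete performance measure is just a sum over stage costs; `g_D` and `h` below are placeholders for whatever discrete stage cost and terminal cost are in play (the function names are mine):

```python
def J_discrete(xs, us, g_D, h):
    """Exact discrete cost J = h(x[N]) + sum_{k=0}^{N-1} g_D(x[k], u[k]),
    given states xs = [x[0], ..., x[N]] and controls us = [u[0], ..., u[N-1]].
    The first-order approximation takes g_D(x, u) = dt * g(x, u)."""
    N = len(us)
    return h(xs[N]) + sum(g_D(xs[k], us[k]) for k in range(N))
```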

Now to discuss dynamic programming we need some tedious notation:

| Symbol | Meaning |
|---|---|
| \( \alpha \) | the current state |
| \( u_i \) | an allowable decision (control) |
| \( x_i \) | a state adjacent to \( \alpha \), reachable by applying control \( u_i \) |
| \( h \) | the final state |
| \( J_{\alpha x_i} \) | the cost of moving from current state \( \alpha \) to adjacent state \( x_i \) |
| \( J_{x_i h}^* \) | the minimum cost to reach final state \( h \) from state \( x_i \) |
| \( C_{\alpha x_i h}^* \) | the minimum cost to reach final state \( h \) from current state \( \alpha \) via adjacent state \( x_i \) |
| \( J_{\alpha h}^* \) | the minimum cost to reach final state \( h \) from current state \( \alpha \) via any allowable path |
| \( u^*(\alpha) \) | the optimal decision (control) at current state \( \alpha \) |

See Wikipedia on optimal substructure and the Bellman equation for details,
but for our purposes the Principle of Optimality implies
\[
C_{\alpha x_i h}^* = J_{\alpha x_i} + J_{x_i h}^*,
\]
and the optimal decision \( u^*(\alpha) \in \{ u_i \} \) is the one that leads to
\[
J_{\alpha h}^* = \min_i \left\{ C_{\alpha x_i h}^* \right\}.
\]
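
To see the principle in action at a single state, here is a tiny sketch; the three decisions and all the numbers are made up for illustration:

```python
# At current state alpha, suppose three allowable decisions u1, u2, u3
# lead to adjacent states x1, x2, x3. We know the one-step costs
# J_{alpha x_i} and (somehow) the optimal costs-to-go J*_{x_i h}.
step_cost  = {"u1": 4.0, "u2": 2.0, "u3": 5.0}  # J_{alpha x_i}
cost_to_go = {"u1": 3.0, "u2": 6.0, "u3": 1.0}  # J*_{x_i h} via each x_i

# Principle of Optimality: C*_{alpha x_i h} = J_{alpha x_i} + J*_{x_i h}
C = {u: step_cost[u] + cost_to_go[u] for u in step_cost}

u_star = min(C, key=C.get)  # optimal decision u*(alpha)
J_star = C[u_star]          # J*_{alpha h}
print(u_star, J_star)       # -> u3 6.0
```

The "somehow" is of course the crux: dynamic programming fills in those costs-to-go by working backward from the final state, which is exactly where the recurrence relation comes in.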

Next time: determining a recurrence relation for the optimal cost function.