# A Recurrence Relation of Dynamic Programming

(For simplicity, this discussion will assume time-invariance. If ever I need to do the time-variant version of this problem, I will post about whatever differences I find.)

As discussed last time, we have the discrete-time system (actually the discrete-time equivalent of the original system) $\def\avec{\boldsymbol a} \def\uvec{\boldsymbol u} \def\xvec{\boldsymbol x} \def\dscrt{_{\mathrm D}} \xvec[k+1] = \avec\dscrt(\xvec[k], \uvec[k]),$ for which we wish to minimize the performance measure $J = h(\xvec[N]) + \sum_{k=0}^{N-1} g\dscrt(\xvec[k], \uvec[k]).$
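As a concrete sketch, the setup might look like the following in Python. Everything here is an illustrative assumption, not the original system: `a_D` is a toy scalar integrator, and `g_D` and `h` are quadratic costs chosen only to make the arithmetic easy to follow.

```python
def a_D(x, u):
    # Assumed toy state transition x[k+1] = a_D(x[k], u[k]): a scalar integrator.
    return x + u

def g_D(x, u):
    # Assumed per-stage cost, quadratic in state and control.
    return x**2 + u**2

def h(x):
    # Assumed terminal cost on the final state x[N].
    return x**2

def J(x0, controls):
    # Performance measure J = h(x[N]) + sum_{k=0}^{N-1} g_D(x[k], u[k]),
    # evaluated by simulating the system forward under the given controls.
    x, cost = x0, 0.0
    for u in controls:
        cost += g_D(x, u)
        x = a_D(x, u)
    return cost + h(x)
```

For example, `J(1.0, [-1.0])` pays the stage cost $1^2 + (-1)^2 = 2$ and then zero terminal cost, since the control drives the state to the origin.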

Now, adapting the “tedious notation” from the other post to our purposes, we have $$J_{NN}(\xvec[N]) := h(\xvec[N])$$ defined as the cost associated with whatever final state we end up with; and \begin{aligned} J_{N-1,N}(\xvec[N-1], \uvec[N-1]) := & g\dscrt(\xvec[N-1], \uvec[N-1]) + h(\xvec[N]) \\ = & g\dscrt(\xvec[N-1], \uvec[N-1]) + J_{NN}(\xvec[N]) \end{aligned} the cost of getting to the final state $$\xvec[N]$$ from the penultimate state $$\xvec[N-1]$$ via control $$\uvec[N-1]$$. And, since $$\xvec[N]$$ is completely determined by $$\xvec[N-1]$$ and $$\uvec[N-1]$$, we can rewrite this as $\begin{multline} J_{N-1,N}(\xvec[N-1], \uvec[N-1]) := \\ g\dscrt(\xvec[N-1], \uvec[N-1]) + J_{NN}(\avec\dscrt(\xvec[N-1], \uvec[N-1])) \end{multline}$ This yields the optimal (i.e., minimal) cost $\begin{multline} J_{N-1,N}^* (\xvec[N-1]) := \mathrm{min}_{ \{\uvec[N-1]\} } \big\{ g\dscrt(\xvec[N-1], \uvec[N-1]) \\ {} + J_{NN}(\avec\dscrt(\xvec[N-1], \uvec[N-1])) \big\} \end{multline}$ Denote the minimizing control by $$\uvec^*(\xvec[N-1], N-1)$$.
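The last-stage minimization can be sketched directly: for a given $\xvec[N-1]$, search a candidate set of controls for the one minimizing the stage cost plus the terminal cost of the resulting state. The toy system (`a_D`, `g_D`, `h`) and the finite control set are assumptions for illustration; in general the minimization is over a continuum.

```python
# Assumed toy system and costs (same illustrative choices as before).
a_D = lambda x, u: x + u          # state transition
g_D = lambda x, u: x**2 + u**2    # per-stage cost
h   = lambda x: x**2              # terminal cost; J_NN(x[N]) = h(x[N])

def J_star_N_minus_1(x, U):
    # min over u of g_D(x, u) + J_NN(a_D(x, u)), over a finite candidate
    # set U; returns (minimal cost, minimizing control u*(x, N-1)).
    return min((g_D(x, u) + h(a_D(x, u)), u) for u in U)
```

For instance, from $x = 1$ with candidates $\{-1, -0.5, 0\}$, the control $u = -0.5$ wins: it pays $1 + 0.25$ in stage cost plus $0.25$ terminal cost, beating both full correction and no correction.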

Now consider the last two intervals together: $\begin{multline} J_{N-2,N}(\xvec[N-2], \uvec[N-2], \uvec[N-1]) \\ \shoveleft \quad = g\dscrt(\xvec[N-2], \uvec[N-2]) + g\dscrt(\xvec[N-1], \uvec[N-1]) + h(\xvec[N]) \\ \shoveleft \quad = g\dscrt(\xvec[N-2], \uvec[N-2]) + J_{N-1,N}(\xvec[N-1], \uvec[N-1]) \end{multline}$ The optimal policy for this yields the cost $\begin{multline} J_{N-2,N}^* (\xvec[N-2]) := \mathrm{min}_{ \{\uvec[N-2], \uvec[N-1]\} } \big\{ g\dscrt(\xvec[N-2], \uvec[N-2]) \\ {} + J_{N-1,N}(\xvec[N-1], \uvec[N-1]) \big\} \end{multline}$

By the principle of optimality, the remaining decision $$\uvec[N-1]$$ must be optimized with respect to $$\xvec[N-1]$$ regardless of the values of $$\xvec[N-2]$$ and $$\uvec[N-2]$$. We can therefore express $$J_{N-2,N}^*(\xvec[N-2])$$ in terms of $$J_{N-1,N}^*(\xvec[N-1])$$: $\begin{multline} J_{N-2,N}^*(\xvec[N-2]) = \mathrm{min}_{ \{\uvec[N-2]\} } \big\{ g\dscrt(\xvec[N-2], \uvec[N-2]) \\ {} + J_{N-1,N}^*(\xvec[N-1]) \big\} \end{multline}$ And, as above, we can use $$\avec\dscrt(\cdot)$$ to express everything in terms of a single time point: $\begin{multline} J_{N-2,N}^*\left(\xvec[N-2]\right) = \mathrm{min}_{ \{\uvec[N-2]\} } \big\{ g\dscrt(\xvec[N-2], \uvec[N-2]) \\ {} + J_{N-1,N}^*(\avec\dscrt(\xvec[N-2], \uvec[N-2])) \big\} \end{multline}$ This, finally, points us to the promised recurrence relation.
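The principle of optimality can be checked numerically on the toy system: minimizing jointly over $(\uvec[N-2], \uvec[N-1])$ gives the same cost as minimizing over $\uvec[N-2]$ alone with $J_{N-1,N}^*$ already computed for each intermediate state. All system, cost, and control-set choices here are illustrative assumptions.

```python
from itertools import product

# Assumed toy system and costs, as before.
a_D = lambda x, u: x + u
g_D = lambda x, u: x**2 + u**2
h   = lambda x: x**2

U = [-1.0, -0.5, 0.0, 0.5, 1.0]  # assumed finite candidate control set

def joint_min(x):
    # Brute force: minimize over both controls simultaneously.
    return min(g_D(x, u0) + g_D(a_D(x, u0), u1) + h(a_D(a_D(x, u0), u1))
               for u0, u1 in product(U, U))

def nested_min(x):
    # Principle of optimality: first optimize u[N-1] as a function of the
    # intermediate state x[N-1], then minimize over u[N-2] alone.
    J1_star = lambda x1: min(g_D(x1, u) + h(a_D(x1, u)) for u in U)
    return min(g_D(x, u) + J1_star(a_D(x, u)) for u in U)
```

Both routes agree (e.g. both give $1.75$ from $x = 1$ with this control set), but the nested form searches $2|U|$ candidates per state rather than $|U|^2$, which is the whole computational point of the recurrence.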

For the system as described, at the $$K$$th-to-last stage the minimal cost is: $\begin{multline} J_{N-K,N}^*\left(\xvec[N-K]\right) = \\ \mathrm{min}_{\{\uvec[N-K], \ldots, \uvec[N-1]\}} \left\{ h(\xvec[N]) + \sum_{k=N-K}^{N-1} {g\dscrt(\xvec[k],\uvec[k])} \right\} \\ = \mathrm{min}_{\{\uvec[N-K]\}} \big\{ g\dscrt(\xvec[N-K], \uvec[N-K]) \\ {} + J_{N-K+1,N}^*(\avec\dscrt( \xvec[N-K], \uvec[N-K])) \big\} \end{multline}$
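The recurrence translates almost line for line into a backward cost-to-go computation. This is a minimal sketch on the assumed toy system, indexing by $K$ (stages remaining) so that $K = 0$ is the base case $J_{NN} = h$; the finite control set is again an illustrative assumption.

```python
# Assumed toy system and costs, as in the earlier sketches.
a_D = lambda x, u: x + u
g_D = lambda x, u: x**2 + u**2
h   = lambda x: x**2

U = [-1.0, -0.5, 0.0, 0.5, 1.0]  # assumed finite candidate control set

def J_star(x, K):
    # Minimal cost-to-go J*_{N-K,N}(x) with K stages remaining.
    if K == 0:
        return h(x)  # base case: J_NN(x[N]) = h(x[N])
    # Recurrence: min over u of g_D(x, u) + J*_{N-K+1,N}(a_D(x, u)).
    return min(g_D(x, u) + J_star(a_D(x, u), K - 1) for u in U)
```

In practice one would tabulate $J^*$ over a state grid stage by stage (or memoize) instead of recursing, but the recursive form mirrors the equation most directly.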

Notice also the embedding principle: $$J_{N-K,N}^*(\xvec[N-K])$$ is both the minimum cost for the final $$K$$ stages of the $$N$$-stage process with state $$\xvec[N-K]$$ given, and equally the minimum cost for a $$K$$-stage process with initial state $$\xvec[N-K]$$; i.e., the $$K$$-stage process is embedded in the $$N$$-stage process.

Next time: The Hamilton-Jacobi-Bellman equation is analogous to this recurrence relation, but in the continuous-time case.

Posted in control theory on
Tags: Control Systems and Dynamic Programming.