Friday, November 16, 2018
Mechanical Engineering Building, MC102
5 King's College Road
A Unified Framework for Sequential Decision Analytics
The problem of making sequential decisions under uncertainty spans a wide range of applications that arise in science, engineering, business, transportation, energy, health and finance. In contrast with deterministic optimization, which enjoys a widely used canonical framework, the academic study of decisions under uncertainty is a fragmented field. Communities working on these problems (broadly known as stochastic optimization) include operations research (stochastic programming, Markov decision processes, simulation optimization, decision analysis, bandit problems), computer science (reinforcement learning, bandit problems), optimal control (stochastic control, model predictive control, online computation), and applied mathematics (stochastic search). We refer to these communities collectively as the “jungle of stochastic optimization.”
In this seminar, I will outline a mathematical framework for modeling all of these problem classes. This framework consists of five fundamental elements (states, decisions/actions/controls, exogenous information, transition function and objective function), and requires optimizing over policies, which is the major point of departure from deterministic optimization. We divide solution strategies for sequential problems (“dynamic programs”) between policies found by searching over a class of functions (“policy search”) and policies based on lookahead approximations (which include Bellman’s equation, model predictive control and stochastic programming). We further divide each of these two fundamental solution approaches into two subclasses, producing four (meta)classes of policies for approaching sequential stochastic optimization problems. We claim that these classes are universal, in that a solution to any sequential decision problem will belong to one of these classes of policies, or be a hybrid drawn from two or more classes.
We illustrate these classes using a variety of applications, and close by demonstrating that each of these four classes (or a hybrid) may work best depending on the data.
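To make the five modeling elements concrete, here is a minimal sketch in Python. It is not taken from the seminar: the toy inventory problem, the prices and costs, the order-up-to rule, and every function name are assumptions chosen only for illustration. The state is the inventory level, the decision is the order quantity, the exogenous information is the random demand, and the policy is a function of the state with one tunable parameter. A simple search over that parameter stands in for the "policy search" strategy named in the abstract.

```python
import random

def transition(inventory, order, demand):
    """Transition function: next state from state, decision and exogenous information."""
    return max(0, inventory + order - demand)

def contribution(inventory, order, demand, price=5.0, cost=3.0, holding=0.5):
    """One-period contribution: sales revenue minus ordering and holding costs.

    The price, cost and holding values are hypothetical.
    """
    available = inventory + order
    sales = min(available, demand)
    return price * sales - cost * order - holding * (available - sales)

def order_up_to(theta):
    """A parametric policy: maps the state (inventory) to a decision (order)."""
    return lambda inventory: max(0, theta - inventory)

def simulate(policy, demands, initial_inventory=0):
    """Objective on one sample path: total contribution earned by the policy."""
    inventory, total = initial_inventory, 0.0
    for demand in demands:
        order = policy(inventory)  # the decision uses only the current state
        total += contribution(inventory, order, demand)
        inventory = transition(inventory, order, demand)
    return total

def policy_search(thetas, sample_paths):
    """Pick the parameter whose policy has the best average simulated value."""
    def average_value(theta):
        policy = order_up_to(theta)
        return sum(simulate(policy, path) for path in sample_paths) / len(sample_paths)
    return max(thetas, key=average_value)

# Exogenous information: simulated demand paths (assumed uniform on 0..10).
rng = random.Random(0)
sample_paths = [[rng.randint(0, 10) for _ in range(20)] for _ in range(200)]
best_theta = policy_search(range(0, 16), sample_paths)
print("best order-up-to level:", best_theta)
```

The point of the sketch is that the optimization variable is the policy itself (here indexed by theta), not a single vector of decisions. A lookahead policy would replace order_up_to with a function that solves an approximate model of the future at each step.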