Position: Ph.D. Candidate
Current Institution: University of Southern California
Percentile Policies for Tracking of Markovian Random Processes with Asymmetric Cost and Observation
Motivated by wide-ranging applications such as video delivery over networks using Multiple Description Codes (MDC), congestion control, rate adaptation, spectrum sharing, provisioning of renewable energy, and inventory management and retail, we study the state-tracking of a Markovian random process with a known transition matrix and a finite ordered state set. The decision-maker must select a state as an action at each time step in order to minimize the total expected (discounted) cost. The decision-maker faces asymmetries in both cost and observation:
if the selected state is less than the actual state of the Markovian process, an under-utilization cost is incurred and only partial information about the actual state (i.e., an implicit lower bound on the actual state) is revealed; otherwise, the decision incurs an over-utilization cost and reveals full information about the actual state. We formulate this problem as a Partially Observable Markov Decision Process (POMDP), which can be expressed as a dynamic program (DP) based on the last fully observed state and the time of that observation. This formulation determines the sequence of actions to be taken between any two consecutive full observations of the actual state so as to minimize the total expected (discounted) cost. However, the DP grows exponentially, leaving little hope for a computationally feasible solution. We present an interesting class of computationally tractable policies with a percentile threshold structure. Among all percentile policies, we search for the one with the minimum expected cost. The result of this search is a heuristic policy, which we evaluate through numerical simulations. We show that it outperforms myopic policies and, under some conditions, performs close to the optimal policy. Furthermore, we derive a lower bound on the cost of the optimal policy that can be computed with low complexity and gives a measure of how close our heuristic policy is to the optimal policy.
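The core of a percentile threshold policy can be sketched as follows: propagate a belief over the ordered state set forward from the last fully observed state using the known transition matrix, then act at a fixed percentile of that belief. The sketch below is illustrative only, assuming a percentile parameter `p` and hypothetical names (`percentile_action`, `P`, `last_state`); it is not the paper's exact algorithm.

```python
import numpy as np

def percentile_action(P, last_state, steps_since_obs, p):
    """Select the smallest state whose belief CDF reaches percentile p.

    P              : (S, S) row-stochastic transition matrix (assumed known).
    last_state     : index of the last fully observed state.
    steps_since_obs: number of steps elapsed since that full observation.
    p              : percentile threshold in (0, 1].
    All names and the interface here are illustrative assumptions.
    """
    S = P.shape[0]
    belief = np.zeros(S)
    belief[last_state] = 1.0
    for _ in range(steps_since_obs):
        belief = belief @ P               # propagate belief one step forward
    cdf = np.cumsum(belief)
    # Smallest state index whose cumulative probability is at least p;
    # clamp to S-1 to guard against floating-point error at p = 1.0.
    return int(min(S - 1, np.searchsorted(cdf, p)))
```

Varying `p` trades off the two asymmetric costs: a larger `p` selects a higher state, making under-utilization (and hence partial observation) less likely at the price of more over-utilization.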
Parisa Mansourifard is a Ph.D. candidate in the Electrical Engineering department at the University of Southern California. She received her Bachelor of Science and Master of Science in Electrical Engineering from Sharif University of Technology, Iran, in 2008 and 2010, respectively. She joined the University of Southern California in 2011 with a Viterbi fellowship, where she is currently pursuing a Ph.D. degree. She also received a second Master of Science, in Computer Science, from the University of Southern California in 2015. She held the American Association of University Women (AAUW) dissertation fellowship for 2015-2016. Her research interests include decision making, stochastic control and optimization, and the intersection of optimization and learning theory. In her research, she aims to solve critical optimization problems in various networks, such as inventory management or communication networks, where a mismatch between demands and resources causes undesired costs.