Abstract
Consumption Markov decision processes (CMDPs) are a kind of resource-constrained probabilistic decision-making systems allowing the consumption and replenishment to capacity of resource. Making each decision in CMDPs consumes some amount of resource, which can be reloaded to the full capacity in a specified set of states. We consider here the quantitative reachability probability controller synthesis problems for CMDPs that extract a policy preventing resource exhaustion and achieve (repeated) reachability goals with the maximal probability, in contrast to the original work focusing on the qualitative controller synthesis for CMDPs, e.g., fulfilling the (repeated) reachability goals with positive probability or probability one. We first prove that the exact quantitative reachability probability problems are NP-hard by a reduction from the 0–1 Knapsack problem. We solve them by converting a given CMDP to a flattened Markov decision process (MDP) and then constructing an optimal policy for that MDP. Unfortunately, the naive flattened MDPs are prohibitively large and thus make the subsequent synthesis not scale well. To overcome this drawback, we propose a pruned construction and a quotient construction of the flattened MDP to reduce the sizes of MDPs. The empirical evaluation shows that our optimizations can significantly improve the scalability of the purely flattening-based synthesis method for CMDPs on benchmarks from real-world scenarios — autonomous electric vehicle routing and helicopter planning in Mars Perseverance Rover project.
| Original language | English |
|---|---|
| Article number | 106342 |
| Journal | Information Processing Letters |
| Volume | 180 |
| DOIs | |
| State | Published - Feb 2023 |
Keywords
- Markov decision process
- NP-hardness
- Quantitative analysis
- Reachability
- Scheduling