Sample Complexity Bounds for Stochastic Shortest Path with a Generative Model
Abstract
We study the sample complexity of learning an ε-optimal policy in the Stochastic Shortest Path (SSP) problem. We first derive sample complexity bounds when the learner has access to a generative model. We show that there exists a worst-case SSP instance with S states, A actions, minimum cost c_min, and maximum expected cost of the optimal policy over all states B_⋆, on which any algorithm requires at least Ω(S A B_⋆^3 / (c_min ε^2)) samples to return an ε-optimal policy with high probability. Surprisingly, this implies that whenever c_min = 0 an SSP problem may not be learnable, thus revealing that learning in SSPs is strictly harder than in the finite-horizon and discounted settings. We complement this lower bound with an algorithm that matches it, up to logarithmic factors, in the general case, and with a second algorithm that matches it up to logarithmic factors even when c_min = 0, under the additional condition that the optimal policy has a bounded hitting time to the goal state.