arxiv:2604.16111

Sample Complexity Bounds for Stochastic Shortest Path with a Generative Model

Published on Apr 17

Abstract

The paper analyzes the sample complexity of learning an ε-optimal policy in Stochastic Shortest Path problems and shows that learning is strictly harder than in the finite-horizon and discounted settings when the minimum cost is zero.

AI-generated summary

We study the sample complexity of learning an $\varepsilon$-optimal policy in the Stochastic Shortest Path (SSP) problem. We first derive sample complexity bounds when the learner has access to a generative model. We show that there exists a worst-case SSP instance with $S$ states, $A$ actions, minimum cost $c_{\min}$, and maximum expected cost of the optimal policy over all states $B_\star$, where any algorithm requires at least $\Omega(S A B_\star^3 / (c_{\min} \varepsilon^2))$ samples to return an $\varepsilon$-optimal policy with high probability. Surprisingly, this implies that whenever $c_{\min} = 0$ an SSP problem may not be learnable, thus revealing that learning in SSPs is strictly harder than in the finite-horizon and discounted settings. We complement this lower bound with an algorithm that matches it, up to logarithmic factors, in the general case, and an algorithm that matches it up to logarithmic factors even when $c_{\min} = 0$, but only under the condition that the optimal policy has a bounded hitting time to the goal state.
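To get a feel for how the lower bound scales, the following minimal sketch evaluates only the leading term S·A·B_star^3 / (c_min·ε^2) from the abstract. The function name `ssp_lower_bound_order` and its arguments are hypothetical and not taken from the paper; the constant hidden by the Ω notation is unknown here and is simply set to 1, so the result indicates an order of magnitude rather than an actual sample count.

```python
import math


def ssp_lower_bound_order(S, A, B_star, c_min, eps):
    """Leading term S * A * B_star^3 / (c_min * eps^2) of the lower bound.

    Assumption: the constant hidden by the Omega notation is taken to be 1,
    so the value only shows how the bound scales with each parameter.
    """
    if eps <= 0:
        raise ValueError("eps must be positive")
    if c_min < 0:
        raise ValueError("c_min must be non-negative")
    if c_min == 0:
        # The bound diverges as c_min -> 0, which is the sense in which
        # an SSP instance with zero minimum cost may not be learnable.
        return math.inf
    return S * A * B_star**3 / (c_min * eps**2)


if __name__ == "__main__":
    # Halving eps multiplies the leading term by four.
    print(ssp_lower_bound_order(S=10, A=2, B_star=2.0, c_min=0.5, eps=0.5))
    print(ssp_lower_bound_order(S=10, A=2, B_star=2.0, c_min=0.5, eps=0.25))
    # With zero minimum cost the leading term is unbounded.
    print(ssp_lower_bound_order(S=10, A=2, B_star=2.0, c_min=0.0, eps=0.5))
```

The cubic dependence on B_star and the inverse dependence on c_min are the terms that separate this setting from the finite-horizon and discounted ones in the abstract's comparison.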
