No One Knows the State of the Art in Geospatial Foundation Models
Abstract
Geospatial foundation models lack standardized evaluation and reporting practices, creating inconsistency in performance comparisons and limiting reproducibility across studies.
Geospatial foundation models (GFMs) have been proposed as generalizable backbones for disaster response, land-cover mapping, food-security monitoring, and other high-stakes Earth-observation tasks. Yet the published work on these models does not give reviewers or users enough information to tell which model fits a given task. We argue that nobody knows what the current state of the art in geospatial foundation models is. The methods may be useful, but the GFM literature does not standardize evaluations, training and testing protocols, released weights, or pretraining controls well enough for anyone to compare or rank them. In a 152-paper audit, we find 46 cross-paper disagreements of at least 10 points for the same model, benchmark, and protocol; 94 of the 126 papers with extractable pretraining data use a configuration no other paper uses; and 39% of GFM papers release no model weights. This lack of community standards can be remedied. We propose six concrete expectations: named-license weight release, shared core evaluations, copied-versus-rerun baseline annotations, variance reporting, one shared evaluation harness, and data-vs-architecture-vs-algorithm controls. These gaps are a coordination failure, not a fault of any individual lab; the authors of this paper, like many others in the GFM community, have contributed to them. Rather than just critiquing the community, we aim to provide concrete steps toward a shared understanding of how to advance GFMs.
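To make the audit criterion concrete, here is a minimal Python sketch of how a cross-paper disagreement of at least 10 points could be flagged. It is not the authors' audit code: the function name `find_disagreements`, the record layout, and the choice to count one disagreement per (model, benchmark, protocol) group are all assumptions, and the toy records are invented for illustration only.

```python
from collections import defaultdict


def find_disagreements(records, threshold=10.0):
    """Group reported scores by (model, benchmark, protocol) and return the
    groups whose max-min spread across different papers meets the threshold.

    Hypothetical helper: the paper does not specify how it counts
    disagreements, so one flag per group is an assumption.
    """
    groups = defaultdict(list)
    for paper, model, benchmark, protocol, score in records:
        groups[(model, benchmark, protocol)].append((paper, score))

    flagged = {}
    for key, reports in groups.items():
        papers = {paper for paper, _ in reports}
        if len(papers) < 2:
            continue  # a single paper cannot disagree with itself
        scores = [score for _, score in reports]
        spread = max(scores) - min(scores)
        if spread >= threshold:
            flagged[key] = spread
    return flagged


# Invented toy records, NOT taken from the paper:
# (paper, model, benchmark, protocol, reported score in points)
toy_records = [
    ("paper_1", "model_a", "bench_x", "linear_probe", 71.0),
    ("paper_2", "model_a", "bench_x", "linear_probe", 58.5),
    ("paper_2", "model_a", "bench_x", "fine_tune", 80.0),
    ("paper_3", "model_a", "bench_x", "fine_tune", 78.0),
    ("paper_3", "model_b", "bench_y", "linear_probe", 64.0),
]

disagreements = find_disagreements(toy_records)
for (model, benchmark, protocol), spread in disagreements.items():
    print(f"{model} on {benchmark} ({protocol}): spread of {spread} points")
```

On the toy records, only the linear-probe group is flagged: two papers report the same model, benchmark, and protocol 12.5 points apart, while the fine-tuning group differs by only 2 points and the last record has no second paper to compare against.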
Community
GFM papers can't be meaningfully compared because evals, weights, and pretraining configs are all over the place. The 152-paper audit found 46 same-model/benchmark/protocol disagreements of 10+ points and 94/126 papers using unique pretraining setups. The paper proposes six concrete fixes (weight release, shared evals, baseline annotations, variance reporting, one harness, data-vs-arch-vs-algo controls), framed as a coordination problem the whole community owns, not a callout.