ENS Appraiser v0.2

A gradient-boosted regressor that predicts the USD sale price of an ENS (.eth) domain name from on-chain history, semantic embeddings of the label, and macro-market context.

This is the v0 baseline — handcrafted features + mpnet PCA + KNN comparable-sale aggregates. Built to establish an honest, leakage-free floor that future versions improve on.

Quick numbers

Trained on ~265k ENS secondary sales (Jan 2022 – Sep 2023), evaluated on 2,744 sales in Q1–Q2 2024 (held out by date, never seen during training):

| Split | n       | R² (log USD) | RMSE (log USD) | Median APE | Bias   |
|-------|---------|--------------|----------------|------------|--------|
| Train | 265,240 | 0.7700       | 0.7744         | 32.5%      | +0.000 |
| Val   | 3,545   | 0.6602       | 1.0678         | 57.0%      | +0.203 |
| Test  | 2,744   | 0.3081       | 1.5469         | 138.3%     | +0.732 |

Plain-English read: for a typical mid-tier name in test, the model is within ~2× of the actual sale price. The long tail — celebrity names, 3-letter premiums, regime shifts — is where it misses, often by 100×+ in either direction.

What's good

  • Mid-tier names, $50–$5,000 range: usually within 2× of actual.
  • Length and character composition: strong signals captured well. The model knows 3-letter ASCII names are premium and 12-letter random handles are cheap.
  • Wordlist hits: matches against Wikipedia, GeoNames, US first names, stock tickers, and SEC EDGAR are picked up correctly. paris.eth is flagged as a city, nike.eth as a brand.
  • Comparable-sale anchoring: the top two features are knn_mean_log and knn_p90_log — the model leans heavily on "what did similar names sell for recently?" which is the right intuition for valuation.
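The length and character-composition signals above are cheap to recompute from the label alone. A minimal sketch of what these features plausibly look like (the names mirror this card's feature list, but the exact implementation is an assumption):

```python
from itertools import groupby

def charset_features(label: str) -> dict:
    """Length/composition features for an ENS label (illustrative sketch)."""
    return {
        "len": len(label),
        "n_digits": sum(c.isdigit() for c in label),
        "n_letters": sum(c.isalpha() for c in label),
        "is_all_digits": int(label.isdigit()),
        "is_all_letters": int(label.isalpha()),
        "is_ascii": int(label.isascii()),
        "palindrome": int(label == label[::-1]),
        "starts_digit": int(label[:1].isdigit()),
        "ends_digit": int(label[-1:].isdigit()),
        # longest run of one repeated character, e.g. "aabbb" -> 3
        "max_char_run": max((sum(1 for _ in g) for _, g in groupby(label)), default=0),
        "n_unique_chars": len(set(label)),
    }

charset_features("abc123")["n_digits"]  # 3
```

For a tree model these integer/boolean encodings are enough; no scaling is needed.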

What's not

  • Celebrity / brand premium: a name's value to a known buyer (Coinbase wanting coinbase.eth, a luxury brand wanting their mark) is invisible to this model. It can detect that nike.eth is a brand word, but not that the sale price reflects Nike's interest specifically.
  • 3-letter premium tail: names like mph.eth, uma.eth sold for $20k–$40k in test; the model predicted $100–$200. The training set underweights short premiums because most sales there are 5+ letters.
  • Regime shift on test: test set median price is ~4× higher than training median due to the 2023 → 2024 ENS market shift. Recency-weighted training (1-year half-life) helps but doesn't fully close the gap.
  • Bidirectional errors: the worst predictions split roughly evenly between under-prediction (hot names the model didn't recognize) and over-prediction (cold names that just didn't move). A 138.3% median APE is honest but uncomfortable.

How it's built

| Component     | Detail                                             |
|---------------|----------------------------------------------------|
| Algorithm     | XGBoost regressor (170 boosted trees, max_depth=7) |
| Target        | log(sale_price_usd)                                |
| Features      | 146 total                                          |
| Training data | 265,240 sales, Jan 2022 – Sep 2023                 |
| Training time | ~10 min on a single A100                           |
| Model size    | 3.3 MB                                             |

Feature breakdown

  • Handcrafted (15): length, n_digits, n_letters, n_special, palindrome, is_all_digits, is_all_letters, is_ascii, has_unicode, starts/ends_digit, max_char_run, n_unique_chars
  • Wordlist hits (8): Wikipedia titles, GeoNames cities, US first names, ISO 3166 countries, stock tickers, SEC EDGAR companies, Wiktionary EN, plus a wordlist_hits total
  • Grails clubs (~45): binary membership in each curated .eth club (999club, pre-punks, palindromes, pokemon_gen1, etc.)
  • Trademark conflict (1): active USPTO mark in Nice classes 9, 35, 36, 38, 41, 42, 45 with matching mark_text_norm
  • Holder behavior (2): name_age_days, prior_transfer_count (leakage-safe — only counts transfers strictly before the sale block)
  • Macro context (5): Fear & Greed Index, ETH chain TVL, ETH stablecoin market cap, ETH DEX volume, total NFT marketplace fees on the sale day
  • mpnet PCA (64): 768-dim all-mpnet-base-v2 embeddings of the label, PCA-reduced to 64 dims (95% explained variance)
  • KNN comparable sales (8): for each label, FAISS-retrieve top-50 semantic neighbors (HNSW index), filter near-duplicates (sim > 0.999), take the most-recent prior sale of each, aggregate as knn_count, knn_mean_log, knn_median_log, knn_p90_log, knn_max_sim, knn_min_sim, knn_log_max, knn_log_min. Strict leakage prevention: only neighbors with sales before the current sale's date count.
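Once neighbors come back from the FAISS index, the aggregation step above is straightforward. A sketch with illustrative field and function names; the strict before-the-sale filter and near-duplicate cutoff follow the description:

```python
from statistics import mean, median

def knn_comp_features(neighbors, sale_ts, sim_cutoff=0.999):
    """Aggregate comparable-sale stats from semantic neighbors (illustrative).

    neighbors: list of dicts {"sim": float, "log_usd": float, "ts": int},
               each holding that neighbor's most recent prior sale.
    sale_ts:   timestamp of the sale being predicted; only strictly earlier
               neighbor sales count (leakage guard).
    """
    comps = [
        n for n in neighbors
        if n["ts"] < sale_ts and n["sim"] <= sim_cutoff  # drop future sales + near-dupes
    ]
    if not comps:
        return {"knn_count": 0}
    logs = sorted(n["log_usd"] for n in comps)
    p90 = logs[min(len(logs) - 1, int(0.9 * len(logs)))]  # crude 90th percentile
    return {
        "knn_count": len(comps),
        "knn_mean_log": mean(logs),
        "knn_median_log": median(logs),
        "knn_p90_log": p90,
        "knn_log_max": logs[-1],
        "knn_log_min": logs[0],
        "knn_max_sim": max(n["sim"] for n in comps),
        "knn_min_sim": min(n["sim"] for n in comps),
    }
```

The time filter is the important part: with it, each training row only ever sees comps that were observable at its own sale date.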

Top 10 features by gain

| Rank | Feature        | Gain  |
|------|----------------|-------|
| 1    | knn_mean_log   | 1,714 |
| 2    | knn_p90_log    | 1,613 |
| 3    | len            | 1,364 |
| 4    | in_wikipedia   | 1,052 |
| 5    | is_all_digits  | 944   |
| 6    | knn_median_log | 604   |
| 7    | n_digits       | 338   |
| 8    | pca_000        | 289   |
| 9    | n_clubs        | 282   |
| 10   | ends_digit     | 277   |

Four of the top ten are KNN-comp or PCA features, which means the embedding pipeline is doing real work: it's not just paying for itself, it's the dominant signal alongside length.

Training data + leakage controls

Built from the quantumly/ens-appraiser-data dataset:

  • Sales labels: Alchemy getNFTSales for ENS BaseRegistrar + NameWrapper contracts. Wei amounts converted to USD via CoinGecko hourly OHLC at the sale's block timestamp. Coverage gap: Alchemy getNFTSales v2 truncates at block 19,768,978 (May 2024) and does not index Blur marketplace sales. v0 ships with this gap; closing it is a v1 priority.
  • Registrations + transfers: The Graph's ENS subgraph.
  • Wordlists: Wiktionary dumps, Wikipedia EN article titles, GeoNames cities500, US Census baby names, NASDAQ Trader ticker dumps, SEC EDGAR company tickers, ISO 3166 country list.
  • Macro: alternative.me Fear & Greed Index, DefiLlama (TVL, stablecoin mcap, DEX volume, NFT marketplace fees).
  • Trademarks: USPTO Trademark Case Files Dataset (annual research dump).
  • Embeddings: sentence-transformers/all-mpnet-base-v2, encoded once for all 3.5M ENS labels in the dataset.

Leakage controls

The first version of this model accidentally leaked future information through lifetime_transfer_count (it counted all transfers ever for a labelhash, including transfers that happened after the sale being predicted). The leaky model showed train R² 0.81 / test R² −0.29 — the classic catastrophic-overfit signature where the model collapses to predicting the population mean on held-out data.

The current model uses prior_transfer_count, which only counts transfers with transfer_block < sale_block for each row. With the fix, the feature dropped to rank #11 in importance (the leaky version had been #1, by a 3.3× gain margin). KNN comparable-sale features carry the same safeguard: a neighbor's sale only counts if it happened strictly before the sale being predicted.
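As a sketch, the leakage-safe counter reduces to a strict block-number comparison per row (names are illustrative):

```python
def prior_transfer_count(transfer_blocks, sale_block):
    """Count only transfers strictly before the sale's block (leakage-safe)."""
    return sum(1 for b in transfer_blocks if b < sale_block)

# The leaky v0.1 feature was effectively len(transfer_blocks), which
# includes transfers that happen *after* the sale being predicted.
```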

Train/Val/Test split

Fixed-window temporal split:

  • Train: sales with sale_date < 2023-10-01
  • Val: sales 2023-10-01 → 2023-12-31
  • Test: sales 2024-01-01 onwards

This prevents the v0.1 mistake of training on 2022 prices and asking the model to extrapolate to a 2024 market regime that's ~4× more expensive on average. Val and test are in the same regime so val RMSE is a meaningful proxy for test.

Training rows are weighted with an exponential recency decay (1-year half-life, normalized to mean=1.0) so the model leans on 2023 dynamics without throwing away the older data entirely.
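The recency weighting reduces to a half-life decay renormalized to mean 1.0; a minimal sketch (the exact half-life constant used in training is as stated above, the implementation details are an assumption):

```python
def recency_weights(age_days, half_life_days=365.25):
    """Exponential recency decay (1-year half-life), normalized to mean 1.0.

    A sale exactly one half-life older than another gets half its weight;
    renormalizing keeps the effective dataset size unchanged.
    """
    raw = [0.5 ** (a / half_life_days) for a in age_days]
    mean_w = sum(raw) / len(raw)
    return [w / mean_w for w in raw]
```

These weights go into XGBoost as per-row sample weights, so older 2022 sales still contribute, just with less pull on the loss.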

Intended use

This model is intended for research and analytics, not as a price oracle and not for live trading.

Reasonable uses:

  • Bulk valuation of mid-tier ENS portfolios for tax/accounting purposes
  • Identifying obviously over- or under-listed names on secondary markets
  • Sanity-checking a listing price before posting
  • Producing comparable-sale ranges for negotiation context

Out of scope:

  • Pricing 3-letter, 1-2 letter, or otherwise-premium names with confidence
  • Pricing celebrity / known-brand names where the buyer pool is concentrated
  • Predicting prices for names in the post-May-2024 marketplace mix (Blur dominance, marketplace fee changes)
  • Any high-stakes financial decision based on a single point estimate

Limitations

  • Sales coverage: Jan 2022 – May 2024 only, no Blur. ~2 years of recent sales (mid-2024 onwards) are missing entirely from training. Closing this gap requires either a new sales source (Reservoir/SimpleHash both defunct as of 2024–2025) or direct eth_getLogs decoding of Seaport, Blur, X2Y2, LooksRare events, planned for v1.
  • Celebrity premium: there's no feature here for "is this a famous person/place/thing?" beyond Wikipedia-title matching. v1 adds LLM-derived structured features (fame_score, name_kind, crypto_relevance, brand_collision_risk) which should close most of this gap.
  • Out-of-distribution labels: pure-digit labels (0001), punycode/emoji, and l33tspeak get less benefit from mpnet embeddings since they're out of distribution for the pretrained model. Length and charset features partially compensate.
  • Time drift: the ENS market shifts noticeably every 6–12 months as marketplace dominance, fee structures, and DAO actions move. Predictions on names sold "right now" will lag any regime shift since the training cutoff.
  • Test-set thinness: only 2,744 sales meet the $10 floor and post-Jan-2024 cutoff. The reported test R² has roughly ±0.08 95% CI — useful as a ballpark, not a precise number.
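One way to sanity-check that ±0.08 band is a percentile bootstrap over the test rows. A pure-Python sketch (illustrative, not necessarily the procedure used for the quoted CI):

```python
import random

def r2(y_true, y_pred):
    """Coefficient of determination over paired lists."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

def bootstrap_r2_ci(y_true, y_pred, n_boot=1000, seed=42):
    """Percentile-bootstrap 95% CI for R²: resample rows with replacement."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(r2([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    stats.sort()
    return stats[int(0.025 * n_boot)], stats[int(0.975 * n_boot)]
```

With only 2,744 rows and a heavy-tailed error distribution, the resulting interval is wide, which is the point being made above.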

How to use

```python
from huggingface_hub import hf_hub_download
import xgboost as xgb
import pickle

model_path = hf_hub_download(
    repo_id="quantumly/ens-appraiser",
    filename="v0_appraiser_xgb.json",
)
pca_path = hf_hub_download(
    repo_id="quantumly/ens-appraiser",
    filename="v0_pca_mpnet.pkl",
)

booster = xgb.Booster()
booster.load_model(model_path)
with open(pca_path, "rb") as f:
    pca = pickle.load(f)

# Inference also requires:
#  1. mpnet embedding for the label (sentence-transformers/all-mpnet-base-v2)
#  2. Handcrafted/wordlist/club/trademark/holder/macro features
#  3. KNN comp lookup against the dataset repo's FAISS index
#
# A self-contained inference notebook is planned in the dataset repo.
```

The 146 features expected by the booster are listed in v0_metadata.json under feature_cols, in the exact order required by xgb.DMatrix.
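A sketch of assembling a row in that exact order (v0_metadata.json and feature_cols come from the card; the helper itself is illustrative):

```python
def feature_row(features: dict, feature_cols: list, fill=float("nan")):
    """Order computed features into the column order the booster expects.

    Features absent from the dict become NaN, which XGBoost treats as
    'missing' rather than zero.
    """
    return [features.get(col, fill) for col in feature_cols]

# In practice feature_cols is read from the repo's metadata file, e.g.:
#   feature_cols = json.load(open("v0_metadata.json"))["feature_cols"]
cols = ["len", "n_digits", "knn_mean_log"]          # illustrative subset
row = feature_row({"len": 5, "n_digits": 2}, cols)  # knn_mean_log missing -> NaN
```

The ordered rows can then be stacked and passed to xgb.DMatrix for prediction.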

Reproducibility

The training notebook (v0_appraiser_v2.ipynb) runs end-to-end on a Colab A100 high-RAM instance in ~25 minutes:

  1. Downloads all source parquets from the dataset repo
  2. Reconstructs USD prices via CoinGecko hourly OHLC join
  3. Resolves labels for both BaseRegistrar and NameWrapper sales
  4. Computes all features
  5. Builds HNSW index for KNN
  6. Trains XGBoost with early stopping
  7. Saves model + metadata + diagnostics
  8. Uploads to this model repo

All randomness is seeded (seed=42 for XGBoost, PCA, sample weights).

Roadmap

v1 priorities (in expected R² delta order):

  1. LLM-derived features — Llama 3.1 8B local inference over all 3.5M labels, extracting fame_score, name_kind, cultural_origin, crypto_relevance, brand_collision_risk, plus a description-embedding. Expected delta: +0.05–0.10 test R².
  2. Recent sales backfill via direct eth_getLogs decoding of Seaport / Blur / Wyvern / X2Y2 / LooksRare events. Closes the May 2024 → present coverage gap and adds Blur. Expected delta: +0.03–0.06 test R² and a much bigger test set.
  3. Multi-embedding ensemble — concatenate mpnet with bge-base-en-v1.5 and e5-base-v2, PCA the joint space. Expected delta: +0.02–0.04.
  4. Cross-encoder reranker for KNN comps. Expected delta: +0.02–0.03.
  5. Contrastive fine-tuning of mpnet on price-similarity triplets. Expected delta: +0.03–0.05.

Citation

```bibtex
@misc{ens_appraiser_2026,
  author    = {Drobnič, Nejc},
  title     = {ENS Appraiser v0.2},
  year      = {2026},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/quantumly/ens-appraiser}
}
```

License + contact

MIT. Questions, corrections, pull requests: nejc@nejc.dev

