Daily Papers

by AK and the research community

Jul 3

PCS-UQ: Uncertainty Quantification via the Predictability-Computability-Stability Framework

As machine learning (ML) enters high-stakes domains, trustworthy uncertainty quantification (UQ) is essential for safety. In this paper we introduce PCS-UQ, a framework based on the Predictability, Computability, and Stability (PCS) principles for veridical data science. Starting with a candidate set of models or algorithms, PCS-UQ applies a rigorous prediction check to screen out unsuitable models in the set, then uses bootstrap samples to capture both inter-sample variability and algorithmic instability for the prediction-checked algorithms. We then introduce a novel multiplicative calibration scheme to enhance local adaptivity, which corresponds to a new score in conformal prediction. Moreover, we produce a compilation of 17 real-world regression datasets with manually constructed subgroups. On this benchmark, PCS-UQ maintains the target coverage while outperforming or matching conformal methods equipped with oracle-selected algorithms in interval width. PCS-UQ also achieves consistent subgroup coverage, outperforming these oracle-selected conformal methods. Notably, PCS-UQ stands out in achieving both competitive interval widths and consistent subgroup coverage. Across 6 classification datasets, PCS-UQ reduces prediction set sizes by 20%. To scale the framework for deep learning, we propose computationally efficient variants that bypass expensive retraining. On three computer vision benchmarks, these variants reduce prediction set sizes by 20% over conformal baselines. Finally, we provide a theoretical proof that a modified PCS-UQ algorithm preserves valid coverage under exchangeability as a form of split conformal inference.
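
The bootstrap-plus-multiplicative-calibration idea described in this abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the prediction-check screening step is omitted, a single polynomial model class stands in for the candidate set, and the exact score (the smallest factor by which a raw bootstrap interval must be stretched around its median to cover the label) is an assumption based only on the abstract's wording. All function names are hypothetical.

```python
import numpy as np

def fit_bootstrap_models(x, y, n_boot=50, degree=1, seed=0):
    """Fit one polynomial per bootstrap resample (stand-in for any learner)."""
    rng = np.random.default_rng(seed)
    models = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(x), size=len(x))  # resample with replacement
        models.append(np.polyfit(x[idx], y[idx], degree))
    return models

def raw_interval(models, x, alpha=0.1):
    """Uncalibrated interval from the spread of bootstrap predictions."""
    preds = np.stack([np.polyval(c, x) for c in models])  # shape (n_boot, n)
    lo = np.quantile(preds, alpha / 2, axis=0)
    med = np.median(preds, axis=0)
    hi = np.quantile(preds, 1 - alpha / 2, axis=0)
    return lo, med, hi

def multiplicative_gamma(lo, med, hi, y, alpha=0.1, eps=1e-12):
    """Assumed score: factor needed to stretch each raw interval to cover y."""
    below = (med - y) / np.maximum(med - lo, eps)
    above = (y - med) / np.maximum(hi - med, eps)
    scores = np.sort(np.maximum(below, above))
    # Conformal-style rank: ceil((n + 1)(1 - alpha))-th smallest score.
    k = int(np.ceil((len(scores) + 1) * (1 - alpha)))
    return float(scores[k - 1]) if k <= len(scores) else float("inf")

def calibrated_interval(lo, med, hi, gamma):
    """Scale both half-widths by gamma, so wide raw intervals stay wider."""
    return med - gamma * (med - lo), med + gamma * (hi - med)
```

In use, the models would be fit on a training split, `multiplicative_gamma` computed on a held-out calibration split, and `calibrated_interval` applied to test points. Because the half-widths are scaled rather than shifted by a constant, the calibrated interval keeps the point-to-point variation of the bootstrap spread.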

  • 6 authors
·
Jun 9

Accelerometry-Derived Digital Biomarkers for Cardiometabolic Risk: A Population-Representative Tabular Benchmark with Uncertainty Quantification

Structured tabular data dominates clinical medicine, yet existing benchmarks fail to reflect real-world properties like complex survey sampling, demographic oversampling, and subgroup fairness. We introduce the NHANES Accelerometry Cardiometabolic Benchmark, derived from NHANES 2003-2006, comprising 1,381 adults with hip-worn accelerometry, fasting laboratory biomarkers, dietary intake, and anthropometrics. We evaluate three tabular learning methods -- ridge regression, XGBoost, and the foundation model TabPFN v2 -- to predict glycated haemoglobin (HbA1c), fasting triglycerides, and C-reactive protein (CRP) from activity phenotypes and lifestyle covariates. TabPFN v2 achieves the best overall performance (HbA1c R^2=0.156, CRP R^2=0.383), while triglycerides remain largely unpredictable (R^2 < 0.05), consistent with known genetic dominance. We apply split conformal prediction to generate distribution-free 90% prediction intervals and evaluate demographic coverage equity across sex and race/ethnicity subgroups. Marginal coverage aligns with the 90% target for CRP and HbA1c but falls below for triglycerides. At the subgroup level, we observe localized undercoverage (e.g., HbA1c for Mexican American participants), illustrating the gap between marginal guarantees and the conditional coverage required for clinical fairness. Code and data are at https://github.com/felizzi/nhanes-accel-cardiometabolic-benchmark.
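
As a rough illustration of the split conformal step and the subgroup coverage check mentioned in this abstract, here is a minimal sketch. It assumes absolute-residual scores and a generic point predictor; the function names are hypothetical and are not taken from the linked repository.

```python
import numpy as np

def split_conformal_interval(cal_pred, cal_y, test_pred, alpha=0.1):
    """Distribution-free intervals from held-out calibration residuals."""
    scores = np.sort(np.abs(np.asarray(cal_y) - np.asarray(cal_pred)))
    n = len(scores)
    # Finite-sample rank: ceil((n + 1)(1 - alpha))-th smallest residual.
    k = int(np.ceil((n + 1) * (1 - alpha)))
    q = scores[k - 1] if k <= n else np.inf
    test_pred = np.asarray(test_pred, dtype=float)
    return test_pred - q, test_pred + q

def coverage_by_group(lower, upper, y, groups):
    """Empirical coverage within each subgroup label."""
    covered = (y >= lower) & (y <= upper)
    return {str(g): float(covered[groups == g].mean()) for g in np.unique(groups)}
```

With `alpha=0.1` the marginal guarantee is about 90% coverage over exchangeable data, but `coverage_by_group` can still report a lower rate for an individual subgroup, which is the gap between marginal and conditional coverage that the abstract describes.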

  • 1 author
·
Jun 28

Refining Focus in AI for Lung Cancer: Comparing Lesion-Centric and Chest-Region Models with Performance Insights from Internal and External Validation

Background: AI-based classification models are essential for improving lung cancer diagnosis. However, the relative performance of lesion-level versus chest-region models in internal and external datasets remains unclear. Purpose: This study evaluates the performance of lesion-level and chest-region models for lung cancer classification, comparing their effectiveness across the internal Duke Lung Nodule Dataset 2024 (DLND24) and external datasets (LUNA16, NLST), with a focus on subgroup analyses by demographics, histology, and imaging characteristics. Materials and Methods: Two AI models were trained: one using lesion-centric patches (64,64,64) and the other using chest-region patches (512,512,8). Internal validation was conducted on DLND24, while external validation used the LUNA16 and NLST datasets. The models' performance was assessed using AUC-ROC, with subgroup analyses for demographic, clinical, and imaging factors. Statistical comparisons were performed using DeLong's test. Gradient-based visualizations and probability distributions were also used for analysis. Results: The lesion-level model consistently outperformed the chest-region model across datasets. In internal validation, the lesion-level model achieved an AUC of 0.71 (CI: 0.61-0.81), compared to 0.68 (0.57-0.77) for the chest-region model. External validation showed similar trends, with AUCs of 0.90 (0.87-0.92) and 0.81 (0.79-0.82) on LUNA16 and NLST, respectively. Subgroup analyses revealed significant advantages for lesion-level models in certain histological subtypes (adenocarcinoma) and imaging conditions (CT manufacturers). Conclusion: Lesion-level models demonstrate superior classification performance, especially for external datasets and challenging subgroups, suggesting their clinical utility for precision lung cancer diagnostics.
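
For readers unfamiliar with the metric, the sketch below shows one way an AUC-ROC value and an accompanying interval could be computed. It uses the pairwise (Mann-Whitney) form of the AUC and a percentile bootstrap for the interval. The bootstrap is a swapped-in technique chosen for brevity, not DeLong's test as used in the study, and the function names are hypothetical.

```python
import numpy as np

def auc_score(y_true, scores):
    """AUC as the probability a positive case outscores a negative one."""
    y_true = np.asarray(y_true).astype(bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[y_true], scores[~y_true]
    diff = pos[:, None] - neg[None, :]  # all positive/negative pairs
    # Ties between a positive and a negative count as half a win.
    return float(((diff > 0) + 0.5 * (diff == 0)).mean())

def bootstrap_auc_ci(y_true, scores, n_boot=1000, level=0.95, seed=0):
    """Percentile bootstrap interval for the AUC (not DeLong's method)."""
    rng = np.random.default_rng(seed)
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    n = len(y_true)
    aucs = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)
        if y_true[idx].min() == y_true[idx].max():
            continue  # skip resamples that contain only one class
        aucs.append(auc_score(y_true[idx], scores[idx]))
    tail = (1 - level) / 2
    return float(np.quantile(aucs, tail)), float(np.quantile(aucs, 1 - tail))
```

The pairwise form is quadratic in sample size, which is acceptable for datasets of a few thousand cases; a rank-based formula would be preferable for larger ones.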

  • 1 author
·
Nov 25, 2024