How to use khyeom/SVSTR-Score with Scikit-learn:

```python
from huggingface_hub import hf_hub_download
import joblib

# Only load pickle files from sources you trust.
# Read more: https://skops.readthedocs.io/en/stable/persistence.html
model = joblib.load(
    hf_hub_download("khyeom/SVSTR-Score", "sklearn_model.joblib")
)
```
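The loading snippet warns that pickle-based files should only be loaded from trusted sources. One common precaution, sketched here with the standard library only, is to compare the downloaded file's SHA-256 digest against a value obtained through a trusted channel before handing the path to `joblib.load`. The helper names `sha256_of` and `verify_digest` are hypothetical, and the expected digest would have to come from the model's publisher; the source does not give one.

```python
import hashlib
import hmac


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, 'rb') as handle:
        # iter() with a sentinel stops when read() returns empty bytes (EOF).
        for chunk in iter(lambda: handle.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()


def verify_digest(path: str, expected_hex: str) -> bool:
    """Return True if the file's digest matches the expected value.

    Hypothetical helper: the expected digest is an assumption and must be
    supplied by whoever published the model file.
    """
    return hmac.compare_digest(sha256_of(path), expected_hex.lower())
```

In practice one would call `verify_digest` on the path returned by `hf_hub_download` and load the model only if it returns `True`. Pinning the `revision` argument of `hf_hub_download` to a specific commit is a complementary precaution.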
```python
"""SV-SPR quickstart: three ways to score SV calls."""
from svspr import classify, score, SVSPR

REF = '/path/to/GRCh38.fa'  # <- edit me

# -----------------------------------------------------------------------------
# 1) Single SV: fastest demo, returns dict
# -----------------------------------------------------------------------------
result = classify(
    chrom='chr1', pos=1000000, end=1005000,
    svtype='DEL', svlen=5000, total_alt_support=15,
    ref_path=REF,
)
print('Single SV ->', result)
# {'CS': 0.69..., 'tier': 'moderate'}

# -----------------------------------------------------------------------------
# 2) Whole VCF: returns DataFrame
# -----------------------------------------------------------------------------
df = score(vcf_path='examples/sample.vcf', ref_path=REF)
print(df[['chrom', 'pos', 'svtype', 'svlen', 'CS', 'tier']].head())

# Filter: keep only high-confidence calls
high = df[df.tier == 'high']
print(f'{len(high):,} of {len(df):,} calls passed high-confidence filter')

# -----------------------------------------------------------------------------
# 3) Reuse model across many VCFs: cheaper than calling `score` repeatedly
# -----------------------------------------------------------------------------
model = SVSPR()  # load once
for vcf in ['cohort_01.vcf', 'cohort_02.vcf', 'cohort_03.vcf']:
    out = model.predict_vcf(vcf, REF)
    out.to_csv(vcf.replace('.vcf', '.scored.tsv'), sep='\t', index=False)
```
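The loop in step 3 derives each output name with `vcf.replace('.vcf', '.scored.tsv')`. That works for the plain `.vcf` names shown, but it would turn a compressed input such as `cohort_01.vcf.gz` into `cohort_01.scored.tsv.gz`, and it would also rewrite a `.vcf` that happens to appear in a directory name. A minimal sketch of a more defensive naming helper follows. `scored_path` is a hypothetical name, not part of `svspr`, and whether `predict_vcf` accepts compressed input is not stated in the source.

```python
from pathlib import Path

# Suffixes to strip, longest first so '.vcf.gz' wins over '.vcf'.
VCF_SUFFIXES = ('.vcf.gz', '.vcf.bgz', '.vcf')


def scored_path(vcf_path: str) -> str:
    """Return the output TSV path for a VCF path (hypothetical helper)."""
    path = Path(vcf_path)
    name = path.name
    # Only the end of the file name is inspected, never the directories.
    for suffix in VCF_SUFFIXES:
        if name.endswith(suffix):
            name = name[:-len(suffix)]
            break
    return str(path.with_name(name + '.scored.tsv'))


print(scored_path('cohort_01.vcf'))
print(scored_path('runs/cohort_02.vcf.gz'))
```

Inside the loop, `out.to_csv(scored_path(vcf), sep='\t', index=False)` would then replace the string substitution.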