Scam detection: NLP (ML project)

Python project layout for training and serving a text classifier that flags scam, phishing, and coercion messages. Multilingual support can be added later by changing the model and dataset.

Layout

scam-nlp-ml/
├── data/
│   ├── raw/          # Original CSVs, dumps, exports (contents gitignored; keep samples elsewhere)
│   └── processed/    # Train/val splits, tokenized cache
├── models/           # Checkpoints, exported ONNX/Torch artifacts
├── src/              # Training, evaluation, data pipeline code
├── api/              # Optional FastAPI inference service
├── notebooks/        # EDA and experiments
├── requirements.txt
├── .env.example
└── README.md

Quick start

  1. Create a virtual environment (Python 3.10+ recommended):

    python -m venv .venv
    
  2. Activate:

    • Windows: .venv\Scripts\activate
    • macOS/Linux: source .venv/bin/activate
  3. Install dependencies:

    pip install -U pip
    pip install -r requirements.txt
    
  4. Environment:

    • Windows: copy .env.example .env
    • macOS/Linux: cp .env.example .env

    Then edit .env with your paths and hyperparameters.
    
  5. Place raw datasets under data/raw/, then implement preprocessing in src/ (add modules as you build).
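
The preprocessing in step 5 could start from something like the sketch below. It is only an illustration, not part of the scaffold: the column names text and label, the 80/20 split, and the function names are all assumptions, so adjust them to your own CSVs.

```python
import csv
import random
import re
from collections import defaultdict
from pathlib import Path


def clean_text(text: str) -> str:
    """Collapse runs of whitespace and trim the ends."""
    return re.sub(r"\s+", " ", text).strip()


def load_rows(csv_path: Path) -> list[tuple[str, str]]:
    """Read (text, label) pairs; assumes 'text' and 'label' columns (hypothetical)."""
    with csv_path.open(newline="", encoding="utf-8") as f:
        rows = [(clean_text(r["text"]), r["label"]) for r in csv.DictReader(f)]
    # Drop rows whose text is empty after cleaning.
    return [(t, y) for t, y in rows if t]


def stratified_split(rows, val_fraction=0.2, seed=42):
    """Split per label so train and val keep roughly the same class balance."""
    by_label = defaultdict(list)
    for row in rows:
        by_label[row[1]].append(row)
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    train, val = [], []
    for label in sorted(by_label):
        group = by_label[label]
        rng.shuffle(group)
        # Hold out at least one example per label when the label has more than one.
        n_val = max(1, round(len(group) * val_fraction)) if len(group) > 1 else 0
        val.extend(group[:n_val])
        train.extend(group[n_val:])
    return train, val
```

Splitting per label matters here because scam messages are usually a small minority; a plain random split can leave the validation set with too few positive examples to measure anything.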

Notes

  • Do not commit secrets or large raw datasets. Keep secrets in .env, keep .env out of version control, and add .gitignore rules for data/raw/* and models/* if needed.
  • For India-focused scams (e.g. digital-arrest SMS), ensure your labels and evaluation reflect those patterns; consider a multilingual encoder (e.g. xlm-roberta-base) when you expand languages.
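
If you want to read .env without an extra dependency, a minimal parser might look like the sketch below. The key names in the usage example (DATA_DIR, LEARNING_RATE, MODEL_NAME) are made up for illustration, and the parser only handles plain KEY=VALUE lines; a library such as python-dotenv covers more of the syntax.

```python
import os
from pathlib import Path


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blanks and '#' comment lines."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Strip surrounding whitespace and optional quotes from the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_settings(env_path: Path = Path(".env")) -> dict[str, str]:
    """Read .env if present; real environment variables override file values."""
    file_values = parse_env(env_path.read_text(encoding="utf-8")) if env_path.exists() else {}
    return {key: os.environ.get(key, value) for key, value in file_values.items()}
```

All values come back as strings, so cast hyperparameters where you use them, for example float(settings["LEARNING_RATE"]).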

Next steps (implementation)

  • src/data.py: load, clean, split
  • src/train.py: fine-tune transformers
  • src/eval.py: metrics (precision/recall on scam class)
  • api/main.py: POST /predict with text body
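
As a starting point for src/eval.py, precision and recall on the scam class can be computed in a few lines. This is a dependency-free sketch; the function name and the "scam" label string are assumptions, and a metrics library would serve equally well once one is in requirements.txt.

```python
def scam_precision_recall(y_true, y_pred, positive="scam"):
    """Return precision, recall, and F1 for the positive (scam) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}
```

Reporting the scam class on its own, rather than overall accuracy, keeps a large majority of benign messages from hiding missed scams (low recall) or false alarms (low precision).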

This repository scaffold only creates the folders and baseline config; add those modules as you iterate.
