# ChuT Compressed Model – Post-Training Static Quantization (Imaan)
Compressed version of the ChuT Audio CNN using Post-Training Static Quantization (INT8).
- **Author:** Nakintu Imaan
- **Technique:** Post-Training Static Quantization (PTQ)
- **Original model:** TB42project/Late_fusion_model
## Results
| Model | AUROC | Size | AUROC Drop |
|---|---|---|---|
| Original FP32 | 0.7923 | 3.8 MB | – |
| Quantized INT8 | 0.6756 | 1.47 MB | 0.1167 |
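As a quick sanity check, the table's figures imply roughly a 61% size reduction at the cost of an absolute AUROC drop of 0.1167; the arithmetic can be verified directly:

```python
# Sanity-check the compression figures reported in the table above.
fp32_mb, int8_mb = 3.8, 1.47
auroc_fp32, auroc_int8 = 0.7923, 0.6756

size_reduction = 1 - int8_mb / fp32_mb   # fraction of the FP32 size removed
auroc_drop = auroc_fp32 - auroc_int8     # absolute AUROC degradation

print(f"Size reduction: {size_reduction:.1%}")   # ~61.3%
print(f"AUROC drop: {auroc_drop:.4f}")           # 0.1167
```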
## How to Use
```python
import torch
import torch.nn as nn
import torch.quantization
import joblib
import librosa
import numpy as np

# Define ImprovedTBCNN with QuantStub/DeQuantStub
# (see quantization.ipynb for the full class definition)

# Load the quantized bundle
bundle = joblib.load('quantized_fusion_model.pkl')

# Rebuild the quantized model structure: the FP32 skeleton must be fused,
# prepared, and converted before the INT8 state dict can be loaded into it
model = ImprovedTBCNN(dropout=0.4)
model.eval()
torch.quantization.fuse_modules(model, [...], inplace=True)  # layer list elided; see quantization.ipynb
model.qconfig = torch.quantization.get_default_qconfig('fbgemm')
torch.quantization.prepare(model, inplace=True)
torch.quantization.convert(model, inplace=True)
model.load_state_dict(bundle['audio_model_state'])
model.eval()

# Preprocess audio: 5 s at 22,050 Hz -> 128-bin log-mel spectrogram, min-max normalized
audio, _ = librosa.load('cough.wav', sr=22050, duration=5)
audio = np.pad(audio, (0, max(0, 22050 * 5 - len(audio))))[:22050 * 5]
mel = librosa.feature.melspectrogram(y=audio, sr=22050, n_mels=128, n_fft=2048, hop_length=512)
log_mel = librosa.power_to_db(mel, ref=np.max)
log_mel = (log_mel - log_mel.min()) / (log_mel.max() - log_mel.min() + 1e-8)
mel_tensor = torch.FloatTensor(log_mel).unsqueeze(0).unsqueeze(0)

# Predict
with torch.no_grad():
    prob = torch.sigmoid(model(mel_tensor)).item()
print('TB probability:', prob)
```
## Quantization Method
- Type: Post-Training Static Quantization (no retraining)
- Backend: fbgemm (x86) / qnnpack (ARM)
- Calibration: 200 audio samples
- Fused: Conv + BN + ReLU layers
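The fuse → prepare → calibrate → convert pipeline above can be sketched end to end on a toy CNN. This is not the actual `ImprovedTBCNN`; layer names, shapes, and the random calibration inputs are illustrative only (the real pipeline calibrates on 200 audio samples):

```python
import torch
import torch.nn as nn
import torch.quantization

class ToyCNN(nn.Module):
    """Minimal stand-in for a quantizable audio CNN."""
    def __init__(self):
        super().__init__()
        self.quant = torch.quantization.QuantStub()      # FP32 -> INT8 at the input
        self.conv = nn.Conv2d(1, 8, 3)
        self.bn = nn.BatchNorm2d(8)
        self.relu = nn.ReLU()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(8, 1)
        self.dequant = torch.quantization.DeQuantStub()  # INT8 -> FP32 at the output

    def forward(self, x):
        x = self.quant(x)
        x = self.relu(self.bn(self.conv(x)))
        x = self.pool(x).flatten(1)
        x = self.fc(x)
        return self.dequant(x)

model = ToyCNN().eval()
# 1. Fuse Conv + BN + ReLU so they quantize as a single op
torch.quantization.fuse_modules(model, [['conv', 'bn', 'relu']], inplace=True)
# 2. Attach observers for the chosen backend
model.qconfig = torch.quantization.get_default_qconfig('fbgemm')
torch.quantization.prepare(model, inplace=True)
# 3. Calibrate: run representative inputs through the model to collect activation ranges
for _ in range(8):
    model(torch.randn(1, 1, 32, 32))
# 4. Convert observed FP32 modules to INT8
torch.quantization.convert(model, inplace=True)
print(type(model.conv))  # fused, quantized Conv+ReLU module
```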
⚠️ Research screening tool only – not a clinical diagnostic device.