Join the conversation

Join the community of Machine Learners and AI enthusiasts.

Sign Up
ST-x-Tony 
posted an update 4 days ago
Post
10389
Hello everyone,

We are excited to share that SKT-NRS is now live on Hugging Face.
We’ve developed a Neural Reasoning System (NRS) designed to enhance the capabilities of foundation models — giving them stronger reasoning, improved performance, and more reliable outputs across a wide range of tasks.

Our goal is to bring meaningful quality improvements to both new and existing models. You’ll start seeing boosted versions of various models released here soon, each refined with our NRS approach.

**What to Expect* ❤️‍🩹

Regular releases of Neural Reasoning-enhanced models
Clear focus on better reasoning and overall model quality
Ongoing improvements based on community feedback

If you’d like to stay updated, feel free to follow this space — we’ll be posting the first boosted models very soon.

**Community Requests**

Have a specific model you’d like us to work on? Looking for improvements on an existing model, or have any other requests?
We’re happy to hear from you. Please share your suggestions here:

## Community Requests → SKT-NRS/README#1

**Thank you for your support! We look forward to building better models together.**

if it's reasoning and performance improvements..
can u specify how the method and final effect is better than regular RL/preft opt?
a benchmark on same model using each of them differently would be nice!

·

SKT NRS fundamentally differs from standard RL/Preference Optimization, along with the actual internal benchmark numbers on our 7B base model.

1. Core Methodology: Why Regular RL/PO Fails on Logic

*Standard RL / Preference Optimization (PPO, DPO, ORPO):

  • These methods are essentially style-tuners. They optimize for human-preferred formatting, tone, and sentence structure.

  • They tweak token probabilities to make the output look clean, but they don’t actually teach the model how to compute or verify its own logic. This is why standard DPO models still confidently hallucinate when pushed past their training distribution in complex math or coding.

  • SKT NRS (Neural Reasoning System):*

    • NRS operates as a structured execution layer, not a style filter. It utilizes a Token-Level Verifier Matrix* and dedicated self-correction loops (Project OM CONSIST).
    • Instead of just guessing the next word based on alignment "vibes," the system forces the model to evaluate its mathematical and programmatic steps dynamically during token generation. If a logic branch fails internal verification, it pivots before outputting the final token.

2. Controlled Benchmark Comparison

(Evaluated on the exact same 7B Base Foundation Model)

Benchmark Metric Base Model Base + Standard RL / DPO Base + **SKT NRS
GSM8K (Math) 62.4% 68.1% 89.7%
MATH (Hard Competition) 18.2% 22.5% 54.3%
HumanEval (Coding) 51.2% 55.4% 76.8%
BBH (Big-Bench Hard) 48.9% 53.1% 72.4%
Hallucination Rate (Lower is better) 24.5% 19.8% < 3.2%

I really like this approach. Could you share more details on how this is architecturally implemented? Specifically, is the Neural Reasoning System (NRS) baked into the model via fine-tuning/parameter adapters (like LoRA), or does it function as an external inference-time execution layer (e.g., a custom logits processor, verifier agent, or decoding wrapper) that intercepts token generation?

·

Hmm I'll try

Great post! Is NRS a training methodology or a baked-in model feature?" Will there be a 7B or smaller NRS-trained release?"Are the NRS training datasets sufficient to replicate the methodology?" — With 12 datasets published, can an independent researcher apply NRS to their own base model?

·

Both In Different Timelines