AI & ML interests

None defined yet.

Shrijanagain 
posted an update about 14 hours ago
view post
Post
39
Excited to launch SKT-ST-X-0-3B by SKT AI Labs! 🚀🇮🇳

​A powerful 3B Parameter Mixture of Experts (MoE) model optimized for high-performance reasoning with a small footprint.


​--> Quick Specs:
> Total Params: ~3B | Active Params: ~1.1B (2 experts/token)
> Pre-trained on 40B tokens (SKT-OMNI-CORPUS-2T)

1.Context: 8K tokens
2.Bilingual: English & Hindi 🇬🇧🇮🇳
3. Base: Built on ST-X-0 with Mixtral stability


​Get 3B intelligence at 1B inference speeds. Fully open-source under Apache-2.0! 👇

​🔗 Try it on Hugging Face: sKT-Ai-Labs/SKT-ST-X-0-3B

​#AI #OpenSource #LLM #MixtureOfExperts #SKTAILabs #MachineLearning
Shrijanagain 
posted an update 15 days ago
view post
Post
2568
We are pleased to announce that the W-IMG Vision Dataset infrastructure is officially live.

The complete asset infrastructure is now accessible on Hugging Face for internal validation and architecture scaling targets.

Dataset Endpoint - sKT-Ai-Labs/W-IMG

#SovereignAI #ComputerVision #MachineLearning #OpenSource
Parveshiiii 
posted an update about 2 months ago
view post
Post
584
🚀 Sonic: A lightweight Python audio processing library with tempo matching, BPM detection, time-stretching, resampling & track blending — now with GPU (CUDA) acceleration for 10x speed!

Perfect for quick remixes, batch edits or syncing tracks.

👉 https://github.com/Parveshiiii/Sonic

#Python #AudioProcessing #OpenSource #PyTorch
Parveshiiii 
posted an update about 2 months ago
view post
Post
1632
Excited to announce my latest open-source release on Hugging Face: Parveshiiii/breast-cancer-detector.

This model has been trained and validated on external datasets to support medical research workflows. It is designed to provide reproducible benchmarks and serve as a foundation for further exploration in healthcare AI.

Key highlights:
- Built for medical research and diagnostic study contexts
- Validated against external datasets for reliability
- Openly available to empower the community in building stronger, more effective solutions

This release is part of my ongoing effort to make impactful AI research accessible through **Modotte**. A detailed blog post explaining the methodology, dataset handling, and validation process will be published soon.

You can explore the model here: Parveshiiii/breast-cancer-detector

#AI #MedicalResearch #DeepLearning #Healthcare #OpenSource #HuggingFace

Shrijanagain 
posted an update 2 months ago
view post
Post
4296
sKT-Ai-Labs


Join fast we will soon published tokens and all join and get started because we will soon off join request button if you want you can join fast guys
  • 1 reply
·
Shrijanagain 
posted an update 2 months ago
view post
Post
2667
​🚀 Bharat AI Revolution ka Hissa Banein! 🇮🇳

​Kya aap Bharat ko AI ki duniya mein ek nayi pehchan dilana chahte hain ?

SKT AI Labs sirf ek naam nahi, ek mission hai—desh ko digital shakti dene ka aur "Viksit Bharat" ke sapne ko sach karne ka.

​Humse Kyun Judein?

​1. Desh ka Apna AI: Hum aise models bana rahe hain jo khas taur par Bharat ki zarooraton aur bhashaon ke liye hain.

​2. Open Collaboration: Hamare Hugging Face repository par hamare kaam ko dekhein, test karein aur apna yogdan dein.

3. Technological Growth: Agar aap student hain, developer hain ya tech enthusiast hain, toh hamare saath naya seekhne aur grow karne ka yeh behtareen mauka hai.

​Join here

sKT-Ai-Labs

🔗
sKT-Ai-Labs


​Aaiye, saath milkar Bharat AI Revolution ko aage badhate hain! 💻🔥

​#SKTAILabs #DigitalIndia #AIRevolution #ViksitBharat #TechInnovation #JoinTheMission
Shrijanagain 
posted an update 2 months ago
Parveshiiii 
posted an update 2 months ago
view post
Post
2954
Just did something I’ve been meaning to try for ages.

In only 3 hours, on 10 billion+ tokens, I trained a custom BPE + tiktoken-style tokenizer using my new library microtok — and it hits the same token efficiency as Qwen3.

Tokenizers have always felt like black magic to me. We drop them into every LLM project, but actually training one from scratch? That always seemed way too complicated.

Turns out it doesn’t have to be.

microtok makes the whole process stupidly simple — literally just 3 lines of code. No heavy setup, no GPU required. I built it on top of the Hugging Face tokenizers library so it stays clean, fast, and actually understandable.

If you’ve ever wanted to look under the hood and build your own optimized vocabulary instead of just copying someone else’s, this is the entry point you’ve been waiting for.

I wrote up the full story, threw in a ready-to-run Colab template, and dropped the trained tokenizer on Hugging Face.

Blog → https://parveshiiii.github.io/blogs/microtok/
Trained tokenizer → https://huggingface.co/Parveshiiii/microtok
GitHub repo → https://github.com/Parveshiiii/microtok
Shrijanagain 
posted an update 2 months ago
view post
Post
5651

​We are thrilled to announce the launch of SKT-OMNI-CORPUS-2T, a massive-scale, high-quality dataset designed to power the next generation of Foundation Models (LLMs) from scratch.
​Developed at SKT AI LABS, this corpus is not just a collection of data; it’s a mission to decentralize high-grade AI training for regional languages and global knowledge.

​💎 Key Highlights:

​•• Massive Scale: Targeting a multi-terabyte architecture for 2T-level tokenization.

•• ​Pure Quality: Curated from 500+ Elite Sources

•• ​Structured for MoE: Perfectly sharded into 3.5GB standardized units (SKT-𝕻 series) for seamless distributed training.

​🤝 Open for Collaboration!

​We are looking for AI researchers, CUDA engineers, and data scientists to join us in this journey of building Project Surya and the ST-X Series models. Whether it's optimization, custom tokenization, or architecture design—let’s build the future together.

​Explore the Dataset on Hugging Face:

🔗 https://huggingface.co/datasets/Shrijanagain/SKT-OMNI-CORPUS-146T-V1

DSR -- 🔗 https://huggingface.co/datasets/Shrijanagain/SKT-DSRx10000

​#AI #MachineLearning #OpenSource #IndicAI #SKTAILABS #LLM #BigData #HuggingFace #InnovationIndia
Parveshiiii 
posted an update 4 months ago
view post
Post
350
Introducing Seekify — a truly non‑rate‑limiting search library for Python

Tired of hitting rate limits when building search features? I’ve built Seekify, a lightweight Python library that lets you perform searches without the usual throttling headaches.

🔹 Key highlights

- Simple API — plug it in and start searching instantly

- No rate‑limiting restrictions

- Designed for developers who need reliable search in projects, scripts, or apps

📦 Available now on PyPI:

pip install seekify

👉 Check out the repo: https:/github.com/Parveshiiii/Seekify
I’d love feedback, contributions, and ideas for real‑world use cases. Let’s make search smoother together!
Parveshiiii 
posted an update 4 months ago
view post
Post
1650
🚀 Wanna train your own AI Model or Tokenizer from scratch?

Building models isn’t just for big labs anymore — with the right data, compute, and workflow, you can create **custom AI models** and **tokenizers** tailored to any domain. Whether it’s NLP, domain‑specific datasets, or experimental architectures, training from scratch gives you full control over vocabulary, embeddings, and performance.

✨ Why train your own?
- Full control over vocabulary & tokenization
- Domain‑specific optimization (medical, legal, technical, etc.)
- Better performance on niche datasets
- Freedom to experiment with architectures

⚡ The best part?
- Tokenizer training (TikToken / BPE) can be done in **just 3 lines of code**.
- Model training runs smoothly on **Google Colab notebooks** — no expensive hardware required.

📂 Try out my work:
- 🔗 https://github.com/OE-Void/Tokenizer-from_scratch
- 🔗 https://github.com/OE-Void/GPT
Parveshiiii 
posted an update 4 months ago
view post
Post
272
📢 The Announcement
Subject: XenArcAI is now Modotte – A New Chapter Begins! 🚀

Hello everyone,

We are thrilled to announce that XenArcAI is officially rebranding to Modotte!

Since our journey began, we’ve been committed to pushing the boundaries of AI through open-source innovation, research, and high-quality datasets. As we continue to evolve, we wanted a name that better represents our vision for a modern, interconnected future in the tech space.

What is changing?

The Name: Moving forward, all our projects, models, and community interactions will happen under the Modotte banner.

The Look: You’ll see our new logo and a fresh color palette appearing across our platforms.

What is staying the same?

The Core Team: It’s still the same people behind the scenes, including our founder, Parvesh Rawal.

Our Mission: We remain dedicated to releasing state-of-the-art open-source models and datasets.

Our Continuity: All existing models, datasets, and projects will remain exactly as they are—just with a new home.

This isn’t just a change in appearance; it’s a commitment to our next chapter of growth and discovery. We are so grateful for your ongoing support as we step into this new era.

Welcome to the future. Welcome to Modotte.

Best regards, The Modotte Team
Parveshiiii 
posted an update 5 months ago
view post
Post
3605
Hey everyone!
We’re excited to introduce our new Telegram group: https://t.me/XenArcAI

This space is built for **model builders, tech enthusiasts, and developers** who want to learn, share, and grow together. Whether you’re just starting out or already deep into AI/ML, you’ll find a supportive community ready to help with knowledge, ideas, and collaboration.

💡 Join us to:
- Connect with fellow developers and AI enthusiasts
- Share your projects, insights, and questions
- Learn from others and contribute to a growing knowledge base

👉 If you’re interested, hop in and be part of the conversation: https://t.me/XenArcAI
  • 12 replies
·
Parveshiiii 
posted an update 7 months ago
view post
Post
1679
Another banger from XenArcAI! 🔥

We’re thrilled to unveil three powerful new releases that push the boundaries of AI research and development:

🔗 https://huggingface.co/XenArcAI/SparkEmbedding-300m

- A lightning-fast embedding model built for scale.
- Optimized for semantic search, clustering, and representation learning.

🔗 https://huggingface.co/datasets/XenArcAI/CodeX-7M-Non-Thinking

- A massive dataset of 7 million code samples.
- Designed for training models on raw coding patterns without reasoning layers.

🔗 https://huggingface.co/datasets/XenArcAI/CodeX-2M-Thinking

- A curated dataset of 2 million code samples.
- Focused on reasoning-driven coding tasks, enabling smarter AI coding assistants.

Together, these projects represent a leap forward in building smarter, faster, and more capable AI systems.

💡 Innovation meets dedication.
🌍 Knowledge meets responsibility.


Parveshiiii 
posted an update 7 months ago
view post
Post
3072
SparkEmbedding - SoTA cross lingual retrieval

Iam very happy to announce our latest embedding model sparkembedding-300m base on embeddinggemma-300m we fine tuned it on 1m extra examples spanning over 119 languages and result is this model achieves exceptional cross lingual retrieval

Model: https://huggingface.co/XenArcAI/SparkEmbedding-300m
Parveshiiii 
posted an update 8 months ago
view post
Post
234
AIRealNet - SoTA - Image detection model

We’re proud to release AIRealNet — a binary image classifier built to detect whether an image is AI-generated or a real human photograph. Based on SwinV2 and fine-tuned on the AI-vs-Real dataset, this model is optimized for high-accuracy classification across diverse visual domains.

If you care about synthetic media detection or want to explore the frontier of AI vs human realism, we’d love your support. Please like the model and try it out. Every download helps us improve and expand future versions.

Model page: https://huggingface.co/XenArcAI/AIRealNet
Parveshiiii 
posted an update 8 months ago
view post
Post
4514
Ever wanted an open‑source deep research agent? Meet Deepresearch‑Agent 🔍🤖

1. Multi‑step reasoning: Reflects between steps, fills gaps, iterates until evidence is solid.

2. Research‑augmented: Generates queries, searches, synthesizes, and cites sources.

3. Fullstack + LLM‑friendly: React/Tailwind frontend, LangGraph/FastAPI backend; works with OpenAI/Gemini.


🔗 GitHub: https://github.com/Parveshiiii/Deepresearch-Agent
Parveshiiii 
posted an update 8 months ago
view post
Post
3128
🚀 Big news from XenArcAI!

We’ve just released our new dataset: **Bhagwat‑Gita‑Infinity** 🌸📖

✨ What’s inside:
- Verse‑aligned Sanskrit, Hindi, and English
- Clean, structured, and ready for ML/AI projects
- Perfect for research, education, and open‑source exploration

🔗 Hugging Face: https://huggingface.co/datasets/XenArcAI/Bhagwat-Gita-Infinity

Let’s bring timeless wisdom into modern AI together 🙌
Parveshiiii 
posted an update 8 months ago
view post
Post
2474
🚀 New Release from XenArcAI
We’re excited to introduce AIRealNet — our SwinV2‑based image classifier built to distinguish between artificial and real images.

✨ Highlights:
- Backbone: SwinV2
- Input size: 256×256
- Labels: artificial vs. real
- Performance: Accuracy 0.999 | F1 0.999 | Val Loss 0.0063

This model is now live on Hugging Face:
👉 https://huggingface.co/XenArcAI/AIRealNet

We built AIRealNet to push forward open‑source tools for authenticity detection, and we can’t wait to see how the community uses it.
Parveshiiii 
posted an update 10 months ago
view post
Post
1125
🚀 Just Dropped: MathX-5M — Your Gateway to Math-Savvy GPTs

👨‍🔬 Wanna fine-tune your own GPT for math?
🧠 Building a reasoning agent that actually *thinks*?
📊 Benchmarking multi-step logic across domains?

Say hello to [**MathX-5M**](https://huggingface.co/datasets/XenArcAI/MathX-5M) — a **5 million+ sample** dataset crafted for training and evaluating math reasoning models at scale.

Built by **XenArcAI**, it’s optimized for:
- 🔍 Step-by-step reasoning with , , and formats
- 🧮 Coverage from arithmetic to advanced algebra and geometry
- 🧰 Plug-and-play with Gemma, Qwen, Mistral, and other open LLMs
- 🧵 Compatible with Harmony, Alpaca, and OpenChat-style instruction formats

Whether you're prototyping a math tutor, testing agentic workflows, or just want your GPT to solve equations like a pro—**MathX-5M is your launchpad**.

🔗 Dive in: (https://huggingface.co/datasets/XenArcAI/MathX-5M)

Let’s make open-source models *actually* smart at math.
#FineTuneYourGPT #MathX5M #OpenSourceAI #LLM #XenArcAI #Reasoning #Gemma #Qwen #Mistral