How to use applexml/kimi-k2-poc2 with MLX:
# Make sure mlx-lm is installed
# pip install --upgrade mlx-lm
# Generate text with mlx-lm
from mlx_lm import load, generate
model, tokenizer = load("applexml/kimi-k2-poc2")
prompt = "Write a story about Einstein"
messages = [{"role": "user", "content": prompt}]
prompt = tokenizer.apply_chat_template(
messages, add_generation_prompt=True
)
text = generate(model, tokenizer, prompt=prompt, verbose=True)
How to use applexml/kimi-k2-poc2 with MLX LM:
# Install MLX LM
uv tool install mlx-lm
# Interactive chat REPL
mlx_lm.chat --model "applexml/kimi-k2-poc2"
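# Start the server (mlx_lm.server listens on port 8080 by default, so set 8000 to match the request below)
mlx_lm.server --model "applexml/kimi-k2-poc2" --port 8000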
# Calling the OpenAI-compatible server with curl
curl -X POST "http://localhost:8000/v1/chat/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "applexml/kimi-k2-poc2",
"messages": [
{"role": "user", "content": "Hello"}
]
}'
NanoAgent is a compact 135M-parameter language model with an 8k-token context length, trained to perform tool calls and to generate responses based on tool outputs.
Despite its small size (~135 MB in 8-bit precision), it’s optimized for agentic use cases and runs easily on personal devices.
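The ~135 MB figure follows from storing one byte per weight at 8-bit precision. The sketch below reproduces that estimate and compares it with 16-bit and 4-bit storage. It assumes 1 MB = 10^6 bytes and counts weights only, ignoring quantization scales, activations, and the KV cache.

```python
def weight_size_mb(num_params, bits_per_param):
    # Approximate weight storage: params * bits / 8 bytes, expressed in MB (10**6 bytes)
    return num_params * bits_per_param / 8 / 1_000_000

for bits in (16, 8, 4):
    # 135M parameters, as stated on the card
    print(f"{bits}-bit: ~{weight_size_mb(135_000_000, bits):.1f} MB")
```

Real checkpoints are somewhat larger than this lower bound because of tokenizer files, metadata, and any per-group quantization scales.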
GitHub: NanoAgent
Inference resource: link
Base model: SmolLM2-135M-Instruct
Fine-tuning method: Dynamic Fine-Tuning (DFT)
Hardware: Apple M1 Mac (16 GB unified memory), using MLX
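The card names DFT but does not spell out its objective. As DFT is usually described, each target token's negative log-likelihood is rescaled by that token's own predicted probability, with the weight excluded from the gradient. The pure-Python sketch below contrasts that weighting with plain SFT on a list of token probabilities. The helper names, the made-up probabilities, and the mean over tokens are illustrative assumptions; this is not NanoAgent's training code.

```python
import math

def sft_loss(token_probs):
    # Standard SFT: mean negative log-likelihood of the target tokens
    return -sum(math.log(p) for p in token_probs) / len(token_probs)

def dft_loss(token_probs):
    # DFT-style weighting: scale each token's NLL by the token's own probability.
    # In an autodiff framework p would be detached (stop-gradient) so it only
    # rescales the gradient; here we just compute the loss value.
    return -sum(p * math.log(p) for p in token_probs) / len(token_probs)

# Probabilities the model assigns to three target tokens (made-up values)
probs = [0.9, 0.5, 0.01]
print(f"SFT loss: {sft_loss(probs):.3f}")
print(f"DFT loss: {dft_loss(probs):.3f}")
```

With these values the low-probability token contributes about 4.6 to the SFT sum but under 0.05 to the DFT sum, which is the rescaling described above.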
Datasets:
microsoft/orca-agentinstruct-1M-v1: agentic tasks, RAG answers, classification
microsoft/orca-math-word-problems-200k: lightweight reasoning
allenai/tulu-3-sft-personas-instruction-following: instruction following
xingyaoww/code-act: ReAct-style reasoning and action
m-a-p/Code-Feedback: alignment via feedback
HuggingFaceTB/smoltalk + /apigen: tool-calling stabilization
weijie210/gsm8k_decomposed: question decomposition
Locutusque/function-calling-chatml: tool-call response structure
This is a beta model.
from transformers import AutoModelForCausalLM, AutoTokenizer
model_name = "quwsarohi/NanoAgent-135M"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
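def inference(messages, max_new_tokens=256, temperature=0.3, min_p=0.15, **kwargs):
    # Render the chat with the model's chat template and append the assistant turn prefix
    input_text = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    inputs = tokenizer.encode(input_text, return_tensors="pt")
    # Sample with temperature and min-p filtering
    outputs = model.generate(
        inputs,
        max_new_tokens=max_new_tokens,
        do_sample=True,
        min_p=min_p,  # use the argument instead of a hard-coded 0.15
        temperature=temperature,
        **kwargs
    )
    # Decode only the newly generated tokens, skipping the prompt
    return tokenizer.decode(outputs[0][inputs.shape[1] :], skip_special_tokens=True)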
messages = [{"role": "user", "content": "Hi! Do you have a name?"}]
print(inference(messages))
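The `inference` helper samples with `temperature=0.3` and `min_p=0.15`. As a rough illustration of what min-p does, the sketch below applies temperature scaling and then discards every token whose probability is below `min_p` times the most likely token's probability. It works on a plain list of made-up logits and assumes the temperature-then-min-p order; it is not the transformers implementation itself.

```python
import math
import random

def min_p_distribution(logits, min_p=0.15, temperature=0.3):
    # Temperature scaling: a lower temperature sharpens the distribution
    scaled = [x / temperature for x in logits]
    # Numerically stable softmax
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # min-p: keep tokens whose probability is at least min_p * (top probability)
    threshold = min_p * max(probs)
    kept = [p if p >= threshold else 0.0 for p in probs]
    # Renormalize the surviving probabilities so they sum to 1
    kept_total = sum(kept)
    return [p / kept_total for p in kept]

def sample_token(logits, rng, min_p=0.15, temperature=0.3):
    # Draw one token index from the filtered distribution
    probs = min_p_distribution(logits, min_p=min_p, temperature=temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

logits = [2.0, 1.0, 0.0]  # made-up scores for a three-token vocabulary
print(min_p_distribution(logits, temperature=1.0))  # third token is filtered out
print(min_p_distribution(logits))  # [1.0, 0.0, 0.0]: only the top token survives
print(sample_token(logits, random.Random(0)))  # 0
```

Because temperature is applied first, a lower temperature makes the same `min_p` prune more aggressively.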
Use the following template for tool calling:
TOOL_TEMPLATE = """You are a helpful AI assistant. You have a set of possible functions/tools inside <tools></tools> tags.
Based on question, you may need to make one or more function/tool calls to answer user.
You have access to the following tools/functions:
<tools>{tools}</tools>
For each function call, return a JSON list object with function name and arguments within <tool_call></tool_call> tags."""
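The template asks the model to return its calls as a JSON list inside `<tool_call></tool_call>` tags. A minimal parser for that kind of output might look like the following. The `name`/`arguments` keys in the sample string are an assumption about the output shape, and `extract_tool_calls` is a hypothetical helper; malformed JSON is skipped rather than raised, since a small model will not always emit valid JSON.

```python
import json
import re

# Matches the text between <tool_call> and </tool_call>, including newlines
TOOL_CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)

def extract_tool_calls(text):
    calls = []
    for payload in TOOL_CALL_RE.findall(text):
        try:
            parsed = json.loads(payload)
        except json.JSONDecodeError:
            continue  # skip malformed output instead of crashing
        # The template asks for a JSON list, but accept a single object too
        if isinstance(parsed, dict):
            parsed = [parsed]
        if isinstance(parsed, list):
            calls.extend(c for c in parsed if isinstance(c, dict))
    return calls

output = '<tool_call>[{"name": "web_search", "arguments": {"query": "weather in Paris"}}]</tool_call>'
print(extract_tool_calls(output))
# [{'name': 'web_search', 'arguments': {'query': 'weather in Paris'}}]
```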
Sample tool definition:
{
    "name": "web_search",
    "description": "Performs a web search for a query and returns a string of the top search results formatted as markdown with titles, links, and descriptions.",
    "parameters": {
        "type": "object",
        "properties": {
            "query": {
                "type": "string",
                "description": "The search query to perform."
            }
        },
        "required": ["query"]
    }
}
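To use a definition like this, it has to be rendered into the `{tools}` slot of `TOOL_TEMPLATE`. The sketch below assumes the tools are serialized as a JSON list with `json.dumps` and sent as a system message; `build_messages` is a hypothetical helper, and the card does not confirm either choice.

```python
import json

# Template copied from above; {tools} is its only format placeholder
TOOL_TEMPLATE = """You are a helpful AI assistant. You have a set of possible functions/tools inside <tools></tools> tags.
Based on question, you may need to make one or more function/tool calls to answer user.
You have access to the following tools/functions:
<tools>{tools}</tools>
For each function call, return a JSON list object with function name and arguments within <tool_call></tool_call> tags."""

# The sample web_search definition from above, as a Python dict
web_search_tool = {
    "name": "web_search",
    "description": "Performs a web search for a query and returns a string of the top search results formatted as markdown with titles, links, and descriptions.",
    "parameters": {
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "The search query to perform."}
        },
        "required": ["query"],
    },
}

def build_messages(user_prompt, tools):
    # Hypothetical helper: serialize the tools as a JSON list and fill the template
    system_prompt = TOOL_TEMPLATE.format(tools=json.dumps(tools))
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ]

messages = build_messages("What's the weather in Paris?", [web_search_tool])
print(messages[0]["content"])
```

The returned list has the same shape as the `messages` argument of the `inference` helper defined earlier.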
Base model
HuggingFaceTB/SmolLM2-135M