# GPT-2.3
GPT-2.3 is an advanced fine-tuned version of the GPT-2 architecture, evolved directly from the GPT-2.2 iteration.
## Key Improvements over 2.2
- Dataset Size: Trained on a 15% subset of WikiText-2, a 50% increase in training data volume over the 10% subset used for GPT-2.2.
- Context Window: Upgraded to support up to 2048 tokens, allowing for much longer document generation and deeper context tracking (see the sketch after this list).
- Performance: Further optimized for causal language modeling, with improved coherence.
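A quick way to confirm the expanded context window is to inspect the model config. This is a minimal sketch; the expected value of 2048 comes from this card, and 1024 is the stock GPT-2 default:

```python
from transformers import GPT2Config

# Load only the config to check the maximum position embeddings.
config = GPT2Config.from_pretrained("BikoRiko/GPT-2.3")
print(config.n_positions)  # expected: 2048 (vs. 1024 for stock GPT-2)
```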
## Technical Specifications
- Base Model: GPT-2 (small)
- Training Epochs: 3
- Max Sequence Length: 2048 tokens
- Framework: Hugging Face Transformers & PyTorch
## Hardware Guidance
- Inference: While the model runs on a CPU, a GPU (e.g., a Tesla T4) is highly recommended for generating long-form text (up to 2048 tokens) efficiently; see the long-form generation sketch after the usage example below.
## Usage Example
Users can load and run the model using the following code (note the `ignore_mismatched_sizes` flag):
```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer

repo_id = "BikoRiko/GPT-2.3"

# Note: ignore_mismatched_sizes=True is required because the context window
# was expanded post-training.
model = GPT2LMHeadModel.from_pretrained(repo_id, ignore_mismatched_sizes=True)
tokenizer = GPT2Tokenizer.from_pretrained(repo_id)

# Generate text
inputs = tokenizer("The future of technology is", return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_length=150,
    pad_token_id=tokenizer.eos_token_id,  # GPT-2 has no pad token; silences the warning
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
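For longer outputs on a GPU, per the hardware guidance above, the following sketch moves the model to CUDA when available and samples a longer continuation. The sampling parameters (`max_new_tokens`, `top_p`, `temperature`) are illustrative choices, not values from this card:

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

repo_id = "BikoRiko/GPT-2.3"
device = "cuda" if torch.cuda.is_available() else "cpu"

model = GPT2LMHeadModel.from_pretrained(repo_id, ignore_mismatched_sizes=True).to(device)
tokenizer = GPT2Tokenizer.from_pretrained(repo_id)

inputs = tokenizer("The future of technology is", return_tensors="pt").to(device)

# Sample a longer continuation; the total length (prompt + new tokens)
# must stay within the 2048-token context window.
outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    do_sample=True,
    top_p=0.95,
    temperature=0.8,
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```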