
GPT-2.3

GPT-2.3 is a fine-tuned version of the GPT-2 architecture, evolved directly from the GPT-2.2 iteration.

Key Improvements over 2.2

  • Dataset Size: Trained on a 15% subset of Wikitext-2, a 50% increase in training data volume over GPT-2.2.
  • Context Window: Upgraded to support up to 2048 tokens, allowing for much longer document generation and deeper context tracking.
  • Performance: Further optimized for causal language modeling, with improved coherence in generated text.
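The repo does not publish the code it used to expand the context window, but the usual mechanism for GPT-2 is to grow the learned position-embedding table (`wpe`) from 1024 to 2048 rows, which is also why loading later needs `ignore_mismatched_sizes=True`. The sketch below illustrates that resize in isolation with plain PyTorch; the tiling of old rows into the new range is one common initialization choice, not necessarily the one used here.

```python
import torch

# GPT-2 small's original position table: 1024 positions x 768 dims.
old_wpe = torch.nn.Embedding(1024, 768)
# Expanded table for a 2048-token context window.
new_wpe = torch.nn.Embedding(2048, 768)

with torch.no_grad():
    new_wpe.weight[:1024] = old_wpe.weight  # keep the learned positions
    new_wpe.weight[1024:] = old_wpe.weight  # tile them into the new range (one option)

print(new_wpe.weight.shape)  # torch.Size([2048, 768])
```

Because the checkpoint's `wpe` shape no longer matches the base GPT-2 config, Transformers will refuse to load it unless mismatched tensors are explicitly allowed.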

Technical Specifications

  • Base Model: GPT-2 (small)
  • Training Epochs: 3
  • Max Sequence Length: 2048 Tokens
  • Framework: Hugging Face Transformers & PyTorch

Hardware Guidance

  • Inference: While it runs on a CPU, a GPU (e.g., Tesla T4) is highly recommended for generating long-form text (up to 2048 tokens) efficiently.
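A minimal device-selection sketch for the guidance above: pick the GPU when one is available and fall back to CPU otherwise, then move both the model and the tokenized inputs once before generating (the commented lines assume the `model` and `inputs` names from the usage example).

```python
import torch

# Prefer a CUDA GPU (e.g. a Tesla T4); fall back to CPU when none is present.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Generating on: {device}")

# After loading the model and tokenizing a prompt, move both to the device:
# model = model.to(device)
# inputs = {k: v.to(device) for k, v in inputs.items()}
```

Long-form generation near the 2048-token limit is quadratic in sequence length for attention, which is why a GPU matters most for the longest outputs.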

Usage Example

Users can load and run the model using the following code (note the ignore_mismatched_sizes flag):

```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer

repo_id = "BikoRiko/GPT-2.3"
# ignore_mismatched_sizes=True is required because the context window
# was expanded to 2048 positions after the base checkpoint was trained.
model = GPT2LMHeadModel.from_pretrained(repo_id, ignore_mismatched_sizes=True)
tokenizer = GPT2Tokenizer.from_pretrained(repo_id)

# Generate text
inputs = tokenizer("The future of technology is", return_tensors="pt")
outputs = model.generate(**inputs, max_length=150)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
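For longer outputs it is usually better to bound the number of newly generated tokens with `max_new_tokens` (rather than the total `max_length`) and to enable sampling. The sketch below demonstrates those `generate` parameters on a tiny, randomly initialized GPT-2 so it runs without downloading weights; substitute the `from_pretrained` call from the usage example to get real text. The small config values are illustrative only.

```python
import torch
from transformers import GPT2Config, GPT2LMHeadModel

# Tiny random GPT-2 purely to exercise the API offline; swap in
# GPT2LMHeadModel.from_pretrained("BikoRiko/GPT-2.3", ignore_mismatched_sizes=True)
# for meaningful output.
config = GPT2Config(n_positions=2048, n_layer=2, n_head=2, n_embd=64)
model = GPT2LMHeadModel(config)
model.eval()

input_ids = torch.randint(0, config.vocab_size, (1, 8))
out = model.generate(
    input_ids,
    max_new_tokens=32,                 # bounds new tokens, not total length
    do_sample=True,                    # sampling often helps long-form variety
    top_p=0.9,
    pad_token_id=config.eos_token_id,  # silences the missing-pad-token warning
)
print(out.shape)  # up to 8 prompt + 32 generated tokens
```

With the real checkpoint, `max_new_tokens` can be raised toward the 2048-token window as long as the prompt plus generated tokens stay within it.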
Model Format

  • Weights: Safetensors, 0.1B parameters, F32 precision