# GPT-2.3
GPT-2.3 is an advanced fine-tuned version of the GPT-2 architecture, evolved directly from the GPT-2.2 iteration.
## Key Improvements over 2.2
- Dataset Size: Trained on a 15% subset of WikiText-2, a 50% increase in training data volume over the 10% subset used for GPT-2.2.
- Context Window: Upgraded to support up to 2048 tokens, allowing for much longer document generation and deeper context tracking (see the sketch after this list).
- Performance: Further optimized for causal language modeling, with improved coherence.
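A quick way to confirm the expanded context window is to inspect the model config. This is a minimal sketch; the expected value of 2048 comes from this card, and 1024 is the stock GPT-2 default:

```python
from transformers import GPT2Config

# Load only the config to check the maximum position embeddings.
config = GPT2Config.from_pretrained("BikoRiko/GPT-2.3")
print(config.n_positions)  # expected: 2048 (vs. 1024 for stock GPT-2)
```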
## Technical Specifications
- Base Model: GPT-2 (small)
- Training Epochs: 3
- Max Sequence Length: 2048 tokens
- Framework: Hugging Face Transformers & PyTorch
## Hardware Guidance
- Inference: While the model runs on a CPU, a GPU (e.g., a Tesla T4) is highly recommended for generating long-form text (up to 2048 tokens) efficiently; see the long-form generation sketch after the usage example below.
## Usage Example
Users can load and run the model using the following code (note the `ignore_mismatched_sizes` flag):
```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer

repo_id = "BikoRiko/GPT-2.3"

# Note: ignore_mismatched_sizes=True is required because the context window
# was expanded post-training.
model = GPT2LMHeadModel.from_pretrained(repo_id, ignore_mismatched_sizes=True)
tokenizer = GPT2Tokenizer.from_pretrained(repo_id)

# Generate text
inputs = tokenizer("The future of technology is", return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_length=150,
    pad_token_id=tokenizer.eos_token_id,  # GPT-2 has no pad token; silences the warning
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
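For longer outputs on a GPU, per the hardware guidance above, the following sketch moves the model to CUDA when available and samples a longer continuation. The sampling parameters (`max_new_tokens`, `top_p`, `temperature`) are illustrative choices, not values from this card:

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

repo_id = "BikoRiko/GPT-2.3"
device = "cuda" if torch.cuda.is_available() else "cpu"

model = GPT2LMHeadModel.from_pretrained(repo_id, ignore_mismatched_sizes=True).to(device)
tokenizer = GPT2Tokenizer.from_pretrained(repo_id)

inputs = tokenizer("The future of technology is", return_tensors="pt").to(device)

# Sample a longer continuation; the total length (prompt + new tokens)
# must stay within the 2048-token context window.
outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    do_sample=True,
    top_p=0.95,
    temperature=0.8,
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```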