Tags: Summarization · Transformers · PyTorch · TensorFlow · JAX · Rust · Safetensors · English · bart · text2text-generation
Instructions to use facebook/bart-large-cnn with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use facebook/bart-large-cnn with Transformers:

# Use a pipeline as a high-level helper
# Warning: Pipeline type "summarization" is no longer supported in transformers v5.
# You must load the model directly (see below) or downgrade to v4.x with:
# pip install "transformers<5.0.0"
from transformers import pipeline
pipe = pipeline("summarization", model="facebook/bart-large-cnn")

# Load model directly
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
tokenizer = AutoTokenizer.from_pretrained("facebook/bart-large-cnn")
model = AutoModelForSeq2SeqLM.from_pretrained("facebook/bart-large-cnn")

- Inference
- Notebooks
- Google Colab
- Kaggle
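The v5 warning above means the pipeline snippet only runs on transformers 4.x. A minimal sketch of a version guard; the helper name `pipeline_supported` is our own, not part of the library:

```python
# Guard against the transformers v5 removal of the "summarization" pipeline,
# per the warning above. This helper only parses a version string; it does
# not import transformers itself.

def pipeline_supported(version: str) -> bool:
    """Return True if this transformers version still ships the
    "summarization" pipeline (i.e. any release before 5.0)."""
    major = int(version.split(".")[0])
    return major < 5

# Typical usage (assumes transformers is installed):
# import transformers
# if pipeline_supported(transformers.__version__):
#     from transformers import pipeline
#     pipe = pipeline("summarization", model="facebook/bart-large-cnn")
# else:
#     ...  # load the model directly with AutoModelForSeq2SeqLM
```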
Maximum number of input tokens
#83
by sushant-nair - opened
Hi,
Can someone please tell me the maximum number of tokens that can be input to this model?
Thanks.
I think it's 1024.
Is there a parameter through which we can find the maximum input length for the model we are using?
from transformers import AutoModelForSeq2SeqLM
model = AutoModelForSeq2SeqLM.from_pretrained("facebook/bart-large-cnn")
print(model.config.max_position_embeddings) # => 1024
So yes, 1024 is the maximum number of input tokens.
To check your input text token length:
from transformers import AutoTokenizer
text_to_summarize = "Put here your long text..."
tokenizer = AutoTokenizer.from_pretrained("facebook/bart-large-cnn")
tokens = tokenizer.encode(text_to_summarize, truncation=False)
print(f"My long input text has: {len(tokens)} tokens")
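If the count comes back over 1024, it is safer to truncate the token IDs while keeping room for the special tokens BART adds (`<s>` and `</s>`). A sketch, with a helper name (`truncate_ids`) of our own choosing:

```python
# Truncate a list of token IDs to fit a model's maximum input length.
# Pure-Python helper; pass it the IDs produced by tokenizer.encode(...).

def truncate_ids(ids, max_len, eos_id=None):
    """Keep at most max_len IDs; if eos_id is given, ensure the
    sequence still ends with it after truncation."""
    if len(ids) <= max_len:
        return list(ids)
    out = list(ids[:max_len])
    if eos_id is not None:
        out[-1] = eos_id  # restore the end-of-sequence marker
    return out

# Usage with the BART tokenizer (shown as a comment, since it downloads files):
# tokens = tokenizer.encode(text_to_summarize, truncation=False)
# tokens = truncate_ids(tokens, 1024, eos_id=tokenizer.eos_token_id)
```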
Slice the input text to a specific token length (not a character or word count):
from transformers import pipeline, AutoModelForSeq2SeqLM, AutoTokenizer
model = AutoModelForSeq2SeqLM.from_pretrained("facebook/bart-large-cnn")
tokenizer = AutoTokenizer.from_pretrained("facebook/bart-large-cnn")
max_input_tokens = model.config.max_position_embeddings
text_to_summary = "Your long text..."
tokens = tokenizer.encode(text_to_summary, truncation=False)
if len(tokens) > max_input_tokens:
    tokens = tokens[:max_input_tokens - 1]  # leave room for the end-of-sequence token
    text_to_summary = tokenizer.decode(tokens, skip_special_tokens=True)
summarizer = pipeline("summarization", model=model, tokenizer=tokenizer)  # reuse the objects loaded above
result = summarizer(text_to_summary, max_length=150, min_length=10, do_sample=False)
print(result[0]['summary_text'])
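Truncation throws information away. An alternative (not from this thread) is to split a long input into windows of at most 1024 tokens, summarize each window, and join the partial summaries. A sketch; the helper name `chunk_ids` and the overlap value are our assumptions:

```python
# Split a long token-ID sequence into overlapping windows that each fit
# within the model's 1024-token limit, so every window can be summarized
# on its own.

def chunk_ids(ids, max_len=1024, overlap=50):
    """Yield successive windows of at most max_len IDs, each starting
    overlap IDs before the previous window ended (for context)."""
    if max_len <= overlap:
        raise ValueError("max_len must exceed overlap")
    step = max_len - overlap
    for start in range(0, max(len(ids) - overlap, 1), step):
        yield ids[start:start + max_len]

# Usage (comment only, since it downloads the model):
# windows = [tokenizer.decode(c, skip_special_tokens=True)
#            for c in chunk_ids(tokens)]
# parts = summarizer(windows, max_length=150, min_length=10, do_sample=False)
# print(" ".join(p["summary_text"] for p in parts))
```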