Tags: Summarization · Transformers · PyTorch · TensorFlow · JAX · Rust · Safetensors · English · bart · text2text-generation
Instructions to use facebook/bart-large-cnn with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use facebook/bart-large-cnn with Transformers:

# Use a pipeline as a high-level helper
# Warning: Pipeline type "summarization" is no longer supported in transformers v5.
# You must load the model directly (see below) or downgrade to v4.x with:
# pip install "transformers<5.0.0"
from transformers import pipeline
pipe = pipeline("summarization", model="facebook/bart-large-cnn")

# Load model directly
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
tokenizer = AutoTokenizer.from_pretrained("facebook/bart-large-cnn")
model = AutoModelForSeq2SeqLM.from_pretrained("facebook/bart-large-cnn")

- Inference
- Notebooks
- Google Colab
- Kaggle
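The v5 warning above means the pipeline snippet only runs on transformers 4.x. A minimal sketch of a version guard; the helper name `pipeline_supported` is our own, not part of the library:

```python
# Guard against the transformers v5 removal of the "summarization" pipeline,
# per the warning above. This helper only parses a version string; it does
# not import transformers itself.

def pipeline_supported(version: str) -> bool:
    """Return True if this transformers version still ships the
    "summarization" pipeline (i.e. any release before 5.0)."""
    major = int(version.split(".")[0])
    return major < 5

# Typical usage (assumes transformers is installed):
# import transformers
# if pipeline_supported(transformers.__version__):
#     from transformers import pipeline
#     pipe = pipeline("summarization", model="facebook/bart-large-cnn")
# else:
#     ...  # load the model directly with AutoModelForSeq2SeqLM
```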
Maximum number of input tokens
#83
by sushant-nair - opened
Hi,
Can someone please tell me the maximum number of tokens that can be input to this model?
Thanks.
I think it's 1024.
Is there a parameter through which we can find the maximum input length for the model we are using?
from transformers import AutoModelForSeq2SeqLM
model = AutoModelForSeq2SeqLM.from_pretrained("facebook/bart-large-cnn")
print(model.config.max_position_embeddings) # => 1024
So yes, 1024 is the maximum number of input tokens.
To check your input text token length:
from transformers import AutoTokenizer
text_to_summarize = "Put here your long text..."
tokenizer = AutoTokenizer.from_pretrained("facebook/bart-large-cnn")
tokens = tokenizer.encode(text_to_summarize, truncation=False)
print(f"My long input text has: {len(tokens)} tokens")
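If the count comes back over 1024, it is safer to truncate the token IDs while keeping room for the special tokens BART adds (`<s>` and `</s>`). A sketch, with a helper name (`truncate_ids`) of our own choosing:

```python
# Truncate a list of token IDs to fit a model's maximum input length.
# Pure-Python helper; pass it the IDs produced by tokenizer.encode(...).

def truncate_ids(ids, max_len, eos_id=None):
    """Keep at most max_len IDs; if eos_id is given, ensure the
    sequence still ends with it after truncation."""
    if len(ids) <= max_len:
        return list(ids)
    out = list(ids[:max_len])
    if eos_id is not None:
        out[-1] = eos_id  # restore the end-of-sequence marker
    return out

# Usage with the BART tokenizer (shown as a comment, since it downloads files):
# tokens = tokenizer.encode(text_to_summarize, truncation=False)
# tokens = truncate_ids(tokens, 1024, eos_id=tokenizer.eos_token_id)
```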
Slice the input text to a specific token length (not a character or word count):
from transformers import pipeline, AutoModelForSeq2SeqLM, AutoTokenizer
model = AutoModelForSeq2SeqLM.from_pretrained("facebook/bart-large-cnn")
tokenizer = AutoTokenizer.from_pretrained("facebook/bart-large-cnn")
max_input_tokens = model.config.max_position_embeddings
text_to_summary = "Your long text..."
tokens = tokenizer.encode(text_to_summary, truncation=False)
if len(tokens) > max_input_tokens:
    tokens = tokens[:max_input_tokens - 1]  # leave room for the end-of-sequence token
    text_to_summary = tokenizer.decode(tokens, skip_special_tokens=True)
summarizer = pipeline("summarization", model=model, tokenizer=tokenizer)  # reuse the objects loaded above
result = summarizer(text_to_summary, max_length=150, min_length=10, do_sample=False)
print(result[0]['summary_text'])
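Truncation throws information away. An alternative (not from this thread) is to split a long input into windows of at most 1024 tokens, summarize each window, and join the partial summaries. A sketch; the helper name `chunk_ids` and the overlap value are our assumptions:

```python
# Split a long token-ID sequence into overlapping windows that each fit
# within the model's 1024-token limit, so every window can be summarized
# on its own.

def chunk_ids(ids, max_len=1024, overlap=50):
    """Yield successive windows of at most max_len IDs, each starting
    overlap IDs before the previous window ended (for context)."""
    if max_len <= overlap:
        raise ValueError("max_len must exceed overlap")
    step = max_len - overlap
    for start in range(0, max(len(ids) - overlap, 1), step):
        yield ids[start:start + max_len]

# Usage (comment only, since it downloads the model):
# windows = [tokenizer.decode(c, skip_special_tokens=True)
#            for c in chunk_ids(tokens)]
# parts = summarizer(windows, max_length=150, min_length=10, do_sample=False)
# print(" ".join(p["summary_text"] for p in parts))
```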