Instructions to use facebook/bart-large-cnn with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use facebook/bart-large-cnn with Transformers:
```python
# Use a pipeline as a high-level helper
# Warning: Pipeline type "summarization" is no longer supported in transformers v5.
# You must load the model directly (see below) or downgrade to v4.x with:
#   pip install "transformers<5.0.0"
from transformers import pipeline

pipe = pipeline("summarization", model="facebook/bart-large-cnn")

# Load model directly
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("facebook/bart-large-cnn")
model = AutoModelForSeq2SeqLM.from_pretrained("facebook/bart-large-cnn")
```
- Inference
- Notebooks
- Google Colab
- Kaggle
IndexError: index out of range in self
I get this error when using the example code. The last line in the stack trace is this:
```
Lib\site-packages\torch\nn\functional.py", line 2267, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
```
The only thing I changed is that I used a longer input text, so I suspect the input is too long. How can I fix this? Can I set a maximum length somehow?
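For reference, a minimal sketch of capping the input length via the tokenizer's truncation option (assuming the facebook/bart-large-cnn tokenizer from the snippet above; 1024 is BART's maximum number of position embeddings, and the example text is a placeholder):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("facebook/bart-large-cnn")

long_text = "The tower is 324 metres tall. " * 500  # far beyond the model's limit

# Without truncation the encoded sequence exceeds BART's 1024 position
# embeddings, which is what triggers the IndexError inside torch.embedding.
untruncated = tokenizer(long_text, return_tensors="pt")
print(untruncated["input_ids"].shape[1])

# Capping the input at the model's maximum length avoids the error.
inputs = tokenizer(long_text, truncation=True, max_length=1024, return_tensors="pt")
print(inputs["input_ids"].shape[1])
```

The truncated `inputs` can then be passed straight to `model.generate(**inputs)`, at the cost of silently dropping everything after the first 1024 tokens.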
I was facing the same issue. I solved it by splitting the input into two pieces, summarizing each piece, merging the two summaries, and then summarizing the result once more. The drawback is that I think a lot of information was lost.
That's right. I usually break my input text into chunks of 500 tokens to resolve this.
```python
# Assumes `tokenizer` is already defined, e.g. the BART tokenizer loaded above:
# tokenizer = AutoTokenizer.from_pretrained("facebook/bart-large-cnn")
def chunk_text_with_context(text, context, max_tokens=500):
    words = text.split()
    chunks = []
    current_chunk = [context]
    current_length = len(tokenizer.encode(context, add_special_tokens=False))
    for word in words:
        word_length = len(tokenizer.encode(word, add_special_tokens=False))
        if current_length + word_length <= max_tokens:
            current_chunk.append(word)
            current_length += word_length
        else:
            # Chunk is full: flush it and start a new one that repeats the context.
            chunks.append(" ".join(current_chunk))
            current_chunk = [context, word]
            current_length = len(tokenizer.encode(context, add_special_tokens=False)) + word_length
    # Add the last chunk if there's any
    if current_chunk:
        chunks.append(" ".join(current_chunk))
    return chunks
```
The code above is for when you want to prepend a shared context string to each chunk.
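The two replies above describe a chunk → summarize → merge → re-summarize workflow. Here is a minimal self-contained sketch of that loop; note it splits on whitespace words as a rough proxy for tokens, and the `summarize` callable and `chunk_size` default are placeholders you would swap for the real pipeline call:

```python
def summarize_long_text(text, summarize, chunk_size=500):
    """Chunk the text, summarize each chunk, merge the partial
    summaries, then summarize the merged text once more."""
    words = text.split()
    chunks = [" ".join(words[i:i + chunk_size])
              for i in range(0, len(words), chunk_size)]
    partial_summaries = [summarize(chunk) for chunk in chunks]
    merged = " ".join(partial_summaries)
    return summarize(merged)

# With the real pipeline, `summarize` might be something like:
#   summarize = lambda t: pipe(t, max_length=130, min_length=30)[0]["summary_text"]
```

As noted above, each summarization pass can discard detail, so the final summary may lose information that a single pass over the full text would have kept.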