Model Card for Model ID
🦅 Falcon LoRA - IMDb Sentiment Generation
This model is a LoRA fine-tuned version of tiiuae/falcon-rw-1b using the IMDb movie review dataset.
It is trained to generate sentiment-rich movie review completions from short prompts. LoRA (Low-Rank Adaptation) makes the fine-tuning efficient by training a small set of low-rank adapter weights while the base model stays frozen.
Model Details
- Base Model: Falcon RW 1B (tiiuae/falcon-rw-1b)
- Fine-Tuning Method: Parameter-Efficient Fine-Tuning (LoRA via PEFT)
- Dataset: IMDb (1000 samples for demonstration)
- Input Length: 128 tokens
- Training Framework: 🤗 Transformers + PEFT
- Trained on: Google Colab (T4 GPU)
Model Description
- Developed by: Vishal D.
- Shared on Hugging Face Hub: vishal1d/falcon-lora-imdb
- Model Type: Causal Language Model (AutoModelForCausalLM)
- Language(s): English
- License: Apache 2.0
- Finetuned From: tiiuae/falcon-rw-1b
Direct Use
You can use this model for:
- Generating sentiment-aware movie reviews
- NLP educational experiments
- Demonstrating LoRA fine-tuning in Transformers
Downstream Use [optional]
This model can serve as a base for:
- Continued fine-tuning on other text datasets
- Training custom sentiment generation apps
- Teaching parameter-efficient fine-tuning methods
Out-of-Scope Use
Avoid using this model for:
- Real-world sentiment classification (it generates, not classifies)
- Medical, legal, or safety-critical decision-making
- Non-English text (not trained or evaluated for multilingual use)
How to Get Started with the Model
Use the code below to get started with the model.
from transformers import pipeline, AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel, PeftConfig

# LoRA adapter model ID on Hugging Face Hub
adapter_id = "vishal1d/falcon-lora-imdb"

# Load the adapter configuration
peft_config = PeftConfig.from_pretrained(adapter_id)

# Load the base Falcon model
base_model = AutoModelForCausalLM.from_pretrained(
    peft_config.base_model_name_or_path,
    trust_remote_code=True,
    device_map="auto",
)

# Load the LoRA adapter on top of the base model
model = PeftModel.from_pretrained(base_model, adapter_id)
model.eval()

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(
    peft_config.base_model_name_or_path,
    trust_remote_code=True,
)
tokenizer.pad_token = tokenizer.eos_token

# Create a text generation pipeline
generator = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=100,
    do_sample=True,
    temperature=0.8,
    top_k=50,
    top_p=0.95,
)

# Example prompt
prompt = "The movie was absolutely wonderful because"
output = generator(prompt)

# Display the generated text
print(output[0]["generated_text"])
Training Details
- LoRA Config: r=8, lora_alpha=16, lora_dropout=0.1, target_modules=["query_key_value"]
- Batch Size: 2 (with gradient_accumulation_steps=4)
- Epochs: 1 (demo purpose)
- Precision: FP16
- Training Samples: 1000 IMDb reviews
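The batch settings above imply an effective batch size and a number of optimizer steps. The short calculation below works them out; it assumes a single GPU (one Colab T4), which this card suggests but does not state outright, and the variable names are illustrative.

```python
import math

# Values taken from the training details above.
per_device_train_batch_size = 2
gradient_accumulation_steps = 4
num_train_samples = 1000
num_train_epochs = 1

# Assumption: a single GPU, so there is no data-parallel multiplier.
num_devices = 1

# Samples contributing to each optimizer update.
effective_batch_size = (
    per_device_train_batch_size * gradient_accumulation_steps * num_devices
)

# Optimizer updates needed to see every sample once.
optimizer_steps_per_epoch = math.ceil(num_train_samples / effective_batch_size)
total_optimizer_steps = optimizer_steps_per_epoch * num_train_epochs

print(f"effective batch size: {effective_batch_size}")
print(f"total optimizer steps: {total_optimizer_steps}")
```

Under these assumptions the run performs relatively few weight updates, which is consistent with its demo scale.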
Training Data
The model was fine-tuned on the IMDb dataset, a large-scale dataset containing 50,000 movie reviews labeled as positive or negative.
For demonstration and quick experimentation, only 1000 samples from the IMDb train split were used.
Dataset Card: IMDb on Hugging Face
Format: Text classification (binary sentiment)
Preprocessing:
Tokenized using tiiuae/falcon-rw-1b tokenizer
Max input length: 128 tokens
Labels were set as input_ids for causal language modeling
Training Procedure
Preprocessing
Tokenized each review using Falcon's tokenizer
Truncated/padded to max length of 128
Used causal language modeling: labels = input_ids (predict next token)
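The preprocessing steps above can be sketched without any library, operating directly on lists of token IDs. This is a minimal illustration, not the author's training script: the function name `prepare_example`, right-side padding, and the sample token IDs are all assumptions.

```python
from typing import Dict, List

MAX_LENGTH = 128  # max input length stated in this card


def prepare_example(
    token_ids: List[int],
    pad_token_id: int,
    max_length: int = MAX_LENGTH,
) -> Dict[str, List[int]]:
    """Truncate or right-pad token IDs and build causal-LM labels."""
    ids = token_ids[:max_length]        # truncate long reviews
    num_real = len(ids)
    num_pad = max_length - num_real     # pad short reviews
    input_ids = ids + [pad_token_id] * num_pad
    attention_mask = [1] * num_real + [0] * num_pad
    # As described in the card: labels are a copy of input_ids.
    labels = list(input_ids)
    return {
        "input_ids": input_ids,
        "attention_mask": attention_mask,
        "labels": labels,
    }


# Hypothetical token IDs; a real run would obtain them from the Falcon tokenizer.
example = prepare_example([11, 22, 33], pad_token_id=0)
print(len(example["input_ids"]), sum(example["attention_mask"]))
```

A common variant replaces the labels at padded positions with -100 so that the loss ignores them; the card does not say whether that was done here.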
Training Hyperparameters
Model: tiiuae/falcon-rw-1b
Fine-tuning method: LoRA (Low-Rank Adaptation) via PEFT
LoRA Config:
r=8, lora_alpha=16, lora_dropout=0.1
Target module: "query_key_value"
Training Args:
per_device_train_batch_size=2
gradient_accumulation_steps=4
num_train_epochs=1
fp16=True
Frameworks: 🤗 Transformers, PEFT, Datasets, Trainer
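The hyperparameters listed above could be expressed with PEFT and Transformers roughly as follows. This is a configuration sketch rather than the original training script: `task_type` and `output_dir` are not given in this card and are assumptions, and all other arguments are left at library defaults.

```python
from peft import LoraConfig
from transformers import TrainingArguments

# LoRA settings as listed in the card.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.1,
    target_modules=["query_key_value"],
    task_type="CAUSAL_LM",  # assumption: not stated in the card
)

# Trainer settings as listed in the card.
training_args = TrainingArguments(
    output_dir="falcon-lora-imdb",  # hypothetical output path
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,
    num_train_epochs=1,
    fp16=True,  # needs a CUDA GPU such as the T4 used here
)
```

The `lora_config` object would typically be passed to `peft.get_peft_model` together with the base model, and `training_args` to a `Trainer`.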
Speeds, Sizes, Times
GPU used: Google Colab (Tesla T4, 16GB)
Training time: ~10–15 minutes for 1 epoch on 1000 samples
Checkpoint size (adapter only): ~6.3 MB (adapter_model.safetensors)
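The adapter size can be sanity-checked with a rough parameter count. The sketch below assumes Falcon-RW-1B has 24 decoder layers with a hidden size of 2048 and a fused query/key/value projection of width 3 × 2048, and that the adapter weights are stored in float32; none of these figures appear in this card, so treat them as assumptions.

```python
# Assumed Falcon-RW-1B dimensions (not stated in this card).
hidden_size = 2048
num_layers = 24
qkv_out_features = 3 * hidden_size  # fused query/key/value projection

lora_rank = 8         # r, from the LoRA config
bytes_per_param = 4   # assumption: adapter weights saved in float32

# Each adapted linear layer gains two matrices: A (r x in) and B (out x r).
params_per_layer = lora_rank * hidden_size + qkv_out_features * lora_rank
total_params = params_per_layer * num_layers
size_mb = total_params * bytes_per_param / 1e6

print(f"trainable LoRA parameters: {total_params:,}")
print(f"approximate adapter size: {size_mb:.2f} MB")
```

Under these assumptions the estimate lands close to the reported ~6.3 MB; file metadata accounts for any small remainder.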
Testing Data, Factors & Metrics
Testing Data
Evaluation was done interactively using text prompts. No quantitative metrics were computed because the model was trained only at demo scale.
Factors
Prompt completion
Sentiment alignment
Fluency of generated text
Metrics
Evaluation was qualitative, based on prompt completions. Since this model was trained on only 1000 IMDb samples for demonstration, we evaluated it by:
Text Coherence: Does the output form grammatically valid sentences?
Sentiment Appropriateness: Does the generated output reflect the sentiment implied by the prompt?
Relevance: Is the continuation logically connected to the prompt?
No quantitative metrics (like accuracy, BLEU, ROUGE) were computed due to the generative nature of the task.
Results
In interactive testing, the model generated fluent, sentiment-aware text completions for short prompts such as:
Prompt: "The movie was absolutely wonderful because"
Output: "...it had brilliant performances, touching moments, and a truly powerful story that left the audience in awe."
These qualitative results suggest that the model can be useful for sentiment-rich text generation, even with limited training data.
Summary
Even with only 1000 IMDb samples, the model can produce sentiment-aligned completions.
LoRA fine-tuning was efficient and lightweight.
Best used for experimentation or small-scale inference.
Technical Specifications [optional]
Model architecture: Falcon-RW-1B (decoder-only transformer)
Fine-tuning: LoRA (Low-Rank Adaptation)
Precision: Mixed precision (fp16)
Tokenizer: tiiuae/falcon-rw-1b tokenizer
Frameworks Used: Hugging Face Transformers, Datasets, PEFT
Model Architecture and Objective
This model uses the tiiuae/falcon-rw-1b architecture, which is a decoder-only transformer similar to GPT. The objective is causal language modeling, where the model predicts the next token given all previous tokens.
During fine-tuning, Low-Rank Adaptation (LoRA) was used to efficiently adjust a small number of weights (via low-rank updates) while keeping the base model frozen.
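To make the low-rank update concrete, here is a toy, library-free sketch of a LoRA-adapted linear layer: the frozen weight `W` is left untouched and the output gains a scaled term `(alpha / r) * B(Ax)`. The matrices and helper names are illustrative only, and the toy dimensions are far smaller than Falcon's.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def lora_forward(
    W: Matrix, A: Matrix, B: Matrix, x: Vector, r: int, alpha: float
) -> Vector:
    """Compute W x + (alpha / r) * B (A x); W stays frozen."""
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, update)]


# Toy layer: 3 inputs, 2 outputs, rank 1.
W = [[1.0, 0.0, 2.0], [0.0, 1.0, 0.0]]  # frozen base weight (out x in)
A = [[1.0, 1.0, 1.0]]                   # trainable, shape r x in
x = [1.0, 2.0, 3.0]

# With B at zero the adapter contributes nothing to the output.
B_zero = [[0.0], [0.0]]
base_only = lora_forward(W, A, B_zero, x, r=1, alpha=2.0)

# After training, B is non-zero and shifts the output.
B_trained = [[0.5], [-1.0]]
adapted = lora_forward(W, A, B_trained, x, r=1, alpha=2.0)

print(base_only, adapted)
```

LoRA implementations such as PEFT initialize `B` to zero by default, so fine-tuning starts from exactly the base model's behavior. With this card's settings (r=8, lora_alpha=16) the scale factor would be 2.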
Compute Infrastructure
Hardware
GPU: NVIDIA Tesla T4 (16 GB VRAM)
Platform: Google Colab
Software
Python Version: 3.10
PyTorch: 2.7.1
Transformers: 4.52.4
PEFT: 0.15.2
BitsAndBytes: 0.46.0 (if used for quantization)
Model Card Authors [optional]
Vishal D. – Model fine-tuning and publication
Based on Falcon-RW-1B by TII UAE
Model Card Contact
📧 Email: tvishal810@gmail.com
🧠 Hugging Face: vishal1d