Alvaro Bartolome
AI & ML interests
machine learning + tech lead @huggingface (inference + cloud)
Recent Activity
- updated a dataset about 4 hours ago: huggingface/DEH-image-scan-data
- liked a Space about 18 hours ago: davanstrien/dataset-lineage-explorer
- posted an update 6 days ago:
Open agents on AWS SageMaker AI with open models from the Hugging Face Hub!
> Deploy an open model from the Hugging Face Hub on SageMaker AI
> Connect the deployed model to Strands Agents
> Add built-in and custom tools for tool calling
> Expose external capabilities through MCP integration
> Bonus: talk to your agent and visualize traces with Gradio
https://alvarobartt.com/agents-on-aws-sagemaker
Critique Models (CM) on the 🤗 Hub
This collection contains Critique Models (CM) for LLM evaluation that are available on the Hugging Face Hub
- openbmb/UltraCM-13b
  Text Generation • 244 • 20
- prometheus-eval/prometheus-7b-v1.0
  Text Generation • 759 • 31
- prometheus-eval/prometheus-13b-v1.0
  Text Generation • 1.06k • 145
- prometheus-eval/prometheus-7b-v2.0
  Text Generation • 7B • 24.8k • 107
AIF Datasets (with distilabel)
Small to medium-sized datasets that are synthetically generated, labelled with AI Feedback (AIF), or both
NER in Spanish
Fine-tuned models for NER in Spanish, built with the SpanMarker framework using different encoders and datasets
- alvarobartt/bert-base-multilingual-cased-ner-spanish
  Token Classification • 0.2B • 25 • 3
- alvarobartt/span-marker-xlm-roberta-large-conll-2002-es
  Token Classification • 9 • 2
- alvarobartt/span-marker-roberta-base-bne-conll-2002-es
  Token Classification • 8 • 1
From zero to GPT-hero
Reading list to fully understand GPT (and GPT-2) and be able to implement them from scratch
- Neural Machine Translation of Rare Words with Subword Units
  Paper • 1508.07909 • 4
- Attention Is All You Need
  Paper • 1706.03762 • 122
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
  Paper • 1810.04805 • 29
- Generating Wikipedia by Summarizing Long Sequences
  Paper • 1801.10198 • 3
Studio Ghibli Diffusion
Text-to-image fine-tunes in the Studio Ghibli style
- FLUX.1 Studio Ghibli LoRA
  Space • Running on Zero • 23
  Generate Studio Ghibli-style images from text prompts
- alvarobartt/ghibli-characters
  Viewer • 9 • 116 • 9
- black-forest-labs/FLUX.1-dev
  Text-to-Image • 716k • 12.9k
- alvarobartt/ghibli-characters-flux-lora
  Text-to-Image • 308 • 64
About ORPO
Information and experiments on fine-tuning LLMs with 🤗 `trl.ORPOTrainer`
- ORPO: Monolithic Preference Optimization without Reference Model
  Paper • 2403.07691 • 73
- HuggingFaceH4/zephyr-orpo-141b-A35b-v0.1
  Text Generation • 141B • 44 • 270
- alvarobartt/mistral-orpo-mix
  Text Generation • 7B • 4 • 1
- alvarobartt/Mistral-7B-v0.1-ORPO
  Text Generation • 7B • 12 • 14
Apple MLX-compatible 7B LLMs on the 🤗 Hub
This collection contains model weights of 7B LLMs compatible with Apple's MLX framework. Find more information at https://github.com/ml-explore/mlx
🇪🇸 Datasets in Spanish for LLM Evaluation
This collection contains datasets for LLM evaluation in Spanish, sourced from nlp.uoregon.edu and translated using ChatGPT (English counterparts included)
🔒 Models for PII Protection
A collection with text-classification and token-classification models for PII Protection