MT5_large_A_art

This model is a fine-tuned version of ai-forever/sage-mt5-large; the training dataset is not specified. On the evaluation set it reaches a final validation loss of 0.1799 (step 3300).

Model description

More information needed

The following hyperparameters were used during training:

learning_rate: 3.83229e-05
train_batch_size: 16
eval_batch_size: 16
seed: 42
gradient_accumulation_steps: 4
total_train_batch_size: 64
optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08, no additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
training_steps: 3300
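
The relationship between these values can be checked with a short sketch. It assumes a single device (the card does not state the device count; with one device, 16 × 4 gives the listed total of 64) and a schedule of linear warmup followed by a half-cosine decay to zero, which is the usual meaning of a cosine scheduler with a warmup ratio. The names `NUM_DEVICES` and `lr_at` are hypothetical helpers for illustration, not part of the training code.

```python
import math

# Values copied from the hyperparameter list above.
PEAK_LR = 3.83229e-05
PER_DEVICE_BATCH = 16
GRAD_ACCUM_STEPS = 4
TOTAL_STEPS = 3300
WARMUP_RATIO = 0.1

# Assumption: one device; the card does not state the device count.
NUM_DEVICES = 1

# Effective batch size per optimizer step.
effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM_STEPS * NUM_DEVICES

# Assumption: warmup length is the ratio applied to the total step count.
warmup_steps = round(TOTAL_STEPS * WARMUP_RATIO)


def lr_at(step: int) -> float:
    """Learning rate at an optimizer step: linear warmup, then half-cosine decay."""
    if step < warmup_steps:
        return PEAK_LR * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, TOTAL_STEPS - warmup_steps)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    print("effective batch size:", effective_batch)
    print("warmup steps:", warmup_steps)
    for step in (0, 165, 330, 1815, 3300):
        print(step, lr_at(step))
```

Under these assumptions the learning rate peaks at the end of warmup and decays to zero by the final step, which is consistent with the validation loss flattening over the last several hundred steps of the log.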

Training Loss	Epoch	Step	Validation Loss
0.9979	0.0303	100	0.2649
0.5176	0.0606	200	0.2170
0.3916	0.0909	300	0.1973
0.3356	0.1212	400	0.1928
0.2993	0.1515	500	0.1937
0.2783	0.1818	600	0.1919
0.268	0.2121	700	0.1907
0.2697	0.2424	800	0.1914
0.2491	0.2726	900	0.1901
0.2488	0.3029	1000	0.1888
0.238	0.3332	1100	0.1861
0.2414	0.3635	1200	0.1872
0.2378	0.3938	1300	0.1857
0.2286	0.4241	1400	0.1842
0.2201	0.4544	1500	0.1849
0.2217	0.4847	1600	0.1845
0.2195	0.5150	1700	0.1835
0.2137	0.5453	1800	0.1818
0.2147	0.5756	1900	0.1822
0.2246	0.6059	2000	0.1806
0.2151	0.6362	2100	0.1806
0.2179	0.6665	2200	0.1805
0.2219	0.6968	2300	0.1806
0.2126	0.7271	2400	0.1808
0.2149	0.7573	2500	0.1802
0.2137	0.7876	2600	0.1806
0.2146	0.8179	2700	0.1803
0.2078	0.8482	2800	0.1803
0.2084	0.8785	2900	0.1805
0.2153	0.9088	3000	0.1801
0.2134	0.9391	3100	0.1799
0.2169	0.9694	3200	0.1799
0.2181	0.9997	3300	0.1799
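
The Epoch column also gives a rough idea of the training set size, which the card does not state. The sketch below is only an estimate: it assumes the logged epoch fraction is steps divided by steps per epoch, and that every optimizer step consumes the full total_train_batch_size of 64. Rounding in the log and a partial final batch make the result approximate.

```python
# Last row of the training log above: epoch fraction and optimizer step.
LAST_EPOCH = 0.9997
LAST_STEP = 3300

# total_train_batch_size from the hyperparameter list.
TOTAL_TRAIN_BATCH_SIZE = 64

# Estimated optimizer steps in one full epoch (log values are rounded).
steps_per_epoch = round(LAST_STEP / LAST_EPOCH)

# Estimated number of training examples, assuming full batches throughout.
approx_examples = steps_per_epoch * TOTAL_TRAIN_BATCH_SIZE

if __name__ == "__main__":
    print("estimated steps per epoch:", steps_per_epoch)
    print("estimated training examples:", approx_examples)
```

The run therefore covers just under one pass over the data, on the order of two hundred thousand examples if the assumptions hold.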

Safetensors model size: 1B params
Tensor type: F32
