Model Card: oneai-1-35m

Model Description

oneai-1-35m is a small, conversational decoder-only (GPT-style) language model with roughly 35 million parameters, trained from scratch on the DailyDialog dataset. It is designed for simple, two-party, turn-based conversation.

Model Architecture

  • Type: Pure Transformer decoder (GPT-style).
  • Number of Parameters: ~35.43 million.
  • d_model (Embedding Size): 512
  • num_layers (Number of Transformer Blocks): 6
  • num_heads (Number of Attention Heads): 8
  • ffn_dim (Feed-Forward Network Hidden Dimension): 2048
  • max_seq_len (Maximum Sequence Length): 256
  • Tokenization: BPE (Byte-Pair Encoding) with a vocabulary size of 16000.
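
For reference, the hyperparameters above can be gathered into a single configuration object. The sketch below is illustrative (the GPTConfig class and approx_param_count helper are not part of the released code); the count it computes matches the ~35.43 million figure above under the assumption of learned positional embeddings, biased linear layers, and an LM head that is not weight-tied to the input embedding.

```python
from dataclasses import dataclass

@dataclass
class GPTConfig:
    # Values as listed on this card; the class itself is illustrative.
    vocab_size: int = 16_000
    d_model: int = 512
    num_layers: int = 6
    num_heads: int = 8
    ffn_dim: int = 2048
    max_seq_len: int = 256

def approx_param_count(c: GPTConfig) -> int:
    """Back-of-the-envelope parameter count under the assumptions above."""
    emb = c.vocab_size * c.d_model + c.max_seq_len * c.d_model  # token + position
    attn = 4 * (c.d_model * c.d_model + c.d_model)              # Q, K, V, output proj
    ffn = 2 * c.d_model * c.ffn_dim + c.ffn_dim + c.d_model     # two linear layers
    norms = 2 * 2 * c.d_model                                   # two LayerNorms per block
    head = c.vocab_size * c.d_model                             # untied output head
    return emb + c.num_layers * (attn + ffn + norms) + 2 * c.d_model + head

print(f"{approx_param_count(GPTConfig()):,}")  # 35,430,400 ≈ 35.43 million
```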

Training Data

The model was trained on the English DailyDialog dataset, which contains everyday multi-turn dialogues on common topics with a simple conversational structure. Training examples are prompt-response pairs extracted from these dialogues: the prompt is a "Human" utterance and the target response is the "Assistant" utterance that follows it.
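
The exact preprocessing pipeline is not published with this card, so the snippet below is only a sketch of how such pairs can be derived: consecutive utterances in each conversation are paired off and rendered with a "Human:"/"Assistant:" template (the template wording is an assumption based on the description above).

```python
from datasets import load_dataset  # pip install datasets

# DailyDialog on the Hugging Face Hub; each example's "dialog" field holds
# the list of utterances in one conversation. trust_remote_code is required
# for script-based datasets in recent versions of the library.
dialogs = load_dataset("daily_dialog", split="train", trust_remote_code=True)

pairs = []
for example in dialogs:
    utterances = example["dialog"]
    # Pair every utterance with the utterance that answers it.
    for prompt, response in zip(utterances, utterances[1:]):
        pairs.append(f"Human: {prompt.strip()}\nAssistant: {response.strip()}")

print(len(pairs), "training pairs")
print(pairs[0])
```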

Intended Use

  • Main Application: Generating short, simple responses in a dialogue setting, mimicking the style of the training data.
  • Example Use Cases: Chatbots for very simple, transactional interactions; prototyping conversational models; research on the effectiveness of small language models trained from scratch.

Limitations and Ethical Considerations

  • Conversation Quality: Due to its small size and training from scratch on a limited dataset, the model often generates nonsensical, out-of-context, or repetitive responses. It lacks a deep understanding of language or the world.
  • Lack of General Knowledge: The model does not possess general world knowledge and cannot infer beyond what it learned from the DailyDialog dataset.
  • Potential Hallucinations: It may generate content that appears plausible but is untrue or nonsensical.
  • Bias: The model may reflect biases present in the training data. DailyDialog's everyday, relatively neutral content keeps this risk comparatively low, but the model can still produce inappropriate or discriminatory content.
  • Language: The model was trained exclusively on English data, so its conversational abilities are limited to English; prompts in any other language will likely yield sharply degraded output.

How to Use

The model is available on Hugging Face and can be loaded with PyTorch and the Hugging Face libraries once the custom architecture classes (GPTModel, TransformerBlock, CausalSelfAttention) have been defined locally, as sketched below.
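
A minimal loading and generation sketch, assuming the repository ships a PyTorch state dict and a tokenizers-format tokenizer.json, and that the custom classes live in a local model.py. The file names, the constructor signature, and the assumption that the model returns raw logits of shape (batch, seq, vocab) are all guesses; check the repository's file listing before running.

```python
import torch
from huggingface_hub import hf_hub_download
from tokenizers import Tokenizer

from model import GPTModel  # your local definition of the custom architecture

REPO_ID = "onedevelopment/oneai-1-35m"

# File names below are assumptions -- verify against the repo's files.
weights_path = hf_hub_download(repo_id=REPO_ID, filename="pytorch_model.bin")
tok = Tokenizer.from_file(hf_hub_download(repo_id=REPO_ID, filename="tokenizer.json"))

model = GPTModel(vocab_size=16_000, d_model=512, num_layers=6,
                 num_heads=8, ffn_dim=2048, max_seq_len=256)  # assumed signature
model.load_state_dict(torch.load(weights_path, map_location="cpu"))
model.eval()

# Greedy decoding, truncating context to the 256-token window.
ids = torch.tensor([tok.encode("Human: Hi, how are you?\nAssistant:").ids])
with torch.no_grad():
    for _ in range(50):
        logits = model(ids[:, -256:])  # assumed to return raw logits
        next_id = logits[0, -1].argmax().item()
        ids = torch.cat([ids, torch.tensor([[next_id]])], dim=1)
print(tok.decode(ids[0].tolist()))
```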
