#atom

The discrete units of text that language models process

Core Idea: Tokens are the fundamental units that language models process. A token may be a whole word, part of a word, or a single character, and each one is mapped to an integer ID before it reaches the model.

Key Elements

Tokenization Process

A tokenizer splits raw text into tokens using a learned subword algorithm such as byte-pair encoding (BPE) or WordPiece, then maps each token to an integer ID from a fixed vocabulary; decoding reverses the mapping. A minimal round trip is sketched below.

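A minimal sketch of the encode/decode round trip using tiktoken (referenced below); cl100k_base is the encoding used by GPT-4-era OpenAI models:

```python
import tiktoken  # pip install tiktoken

# Load the BPE encoding used by GPT-4-era OpenAI models.
enc = tiktoken.get_encoding("cl100k_base")

ids = enc.encode("Tokenization splits text into subword units.")
print(ids)              # a list of integer token IDs
print(enc.decode(ids))  # round-trips back to the original string
```
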
Token Characteristics

Tokens do not align one-to-one with words: common English words are usually a single token, while rare words, code, and non-English text split into several pieces. As a rule of thumb, one token is roughly four characters, or about three-quarters of an English word. The sketch below makes the splitting visible.

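A quick way to inspect how words split (the exact pieces depend on the encoding, so treat the output as illustrative):

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

# Decode each token ID individually to see the subword pieces.
for word in ["the", "tokenization", "antidisestablishmentarianism"]:
    ids = enc.encode(word)
    pieces = [enc.decode([i]) for i in ids]
    print(f"{word!r} -> {len(ids)} token(s): {pieces}")
```
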
Technical Implementation

Most modern tokenizers are byte-level BPE variants: training starts from single bytes (or characters) and repeatedly merges the most frequent adjacent pair into a new vocabulary entry until a target vocabulary size is reached, typically between 50k and 200k entries. tiktoken implements the resulting lookup in Rust with Python bindings. A toy version of the merge loop follows.

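A toy sketch of BPE training (the corpus and merge count are invented for illustration; production tokenizers train over byte sequences on large corpora):

```python
from collections import Counter

def pair_counts(corpus):
    """Count adjacent symbol pairs, weighted by word frequency."""
    pairs = Counter()
    for symbols, freq in corpus.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs

def merge_pair(corpus, pair):
    """Replace every occurrence of `pair` with a single merged symbol."""
    a, b = pair
    merged = Counter()
    for symbols, freq in corpus.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == (a, b):
                out.append(a + b)  # fuse the pair into one new symbol
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        merged[tuple(out)] += freq
    return dict(merged)

# Invented toy corpus: word (as a tuple of symbols) -> frequency.
corpus = {
    ("l", "o", "w"): 5,
    ("l", "o", "w", "e", "r"): 2,
    ("n", "e", "w", "e", "s", "t"): 6,
    ("w", "i", "d", "e", "s", "t"): 3,
}
for step in range(5):
    pairs = pair_counts(corpus)
    best = max(pairs, key=pairs.get)  # most frequent adjacent pair
    corpus = merge_pair(corpus, best)
    print(f"merge {step + 1}: {best}")
```
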
Practical Considerations

Context windows and API pricing are both denominated in tokens, so prompts should be counted before a request is sent: a prompt plus requested completion that exceeds the model's token limit is rejected or truncated. A pre-flight budget check is sketched below.

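A sketch of such a check (the 8,192-token limit and the fits_in_context helper are assumptions for the example; check the actual context window of the model you call):

```python
import tiktoken

CONTEXT_LIMIT = 8192  # assumed example limit; varies by model

def fits_in_context(prompt: str, max_completion_tokens: int,
                    model: str = "gpt-4") -> bool:
    """Return True if the prompt plus the reserved completion budget fits."""
    enc = tiktoken.encoding_for_model(model)
    return len(enc.encode(prompt)) + max_completion_tokens <= CONTEXT_LIMIT

print(fits_in_context("Summarize the following article ...", 512))
```
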
Connections

References

  1. tiktoken documentation (OpenAI's BPE tokenizer): https://github.com/openai/tiktoken
  2. Vaswani et al. (2017), "Attention Is All You Need" (the paper introducing the transformer models that process tokens)
  3. OpenAI API documentation on token counting and limits

#LLM #tokens #tokenization #NLP #vocabulary

