Transformer Architecture

#atom

The foundational neural network design powering modern language models

Core Idea: The Transformer is a neural network architecture built on self-attention, which enables parallel processing of sequences and allows models to capture long-range dependencies in language without recurrence or convolution.
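The scaled dot-product attention at the heart of this idea can be sketched in a few lines of NumPy (a minimal single-head illustration, not a full Transformer layer):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V  (Vaswani et al., 2017)."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # (n, n): each query scored against each key
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V                               # weighted sum of value vectors

# Toy example: 3 tokens, 4-dimensional representations
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((3, 4)) for _ in range(3))
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (3, 4): one output vector per token
```

Because every token attends to every other token in a single matrix product, the whole sequence is processed in parallel, with no recurrence.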

Key Elements

Core Components

  - Scaled dot-product self-attention: each token builds query, key, and value vectors and attends to every position in the sequence
  - Multi-head attention: several attention heads run in parallel over different learned subspaces, then their outputs are concatenated and projected
  - Position-wise feed-forward networks applied independently at each position
  - Positional encodings (sinusoidal or learned) that inject word-order information, since attention itself is order-agnostic
  - Residual connections and layer normalization around each sublayer

Architectural Variants

  - Encoder-decoder (the original Transformer, T5): sequence-to-sequence tasks such as translation
  - Encoder-only (BERT): bidirectional representations for understanding tasks
  - Decoder-only (the GPT family): autoregressive generation with causally masked attention
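Decoder-only variants such as GPT enforce autoregressive generation by masking attention so a token cannot see later positions; a minimal sketch of that causal mask:

```python
import numpy as np

n = 4  # toy sequence length
# Lower-triangular boolean mask: position i may attend only to positions 0..i.
# Masked-out scores are set to -inf before the softmax in practice.
causal_mask = np.tril(np.ones((n, n), dtype=bool))
print(causal_mask.astype(int))
```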

Computational Characteristics

  - Self-attention scales quadratically with sequence length n (it forms an n x n attention matrix), while the feed-forward sublayers scale linearly in n
  - All sequence positions are processed in parallel during training, unlike recurrent models
  - Memory for the attention weights also grows quadratically, which constrains practical context lengths
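The quadratic-versus-linear scaling can be illustrated with back-of-envelope multiply-add counts (an illustration under assumed sizes, not a benchmark):

```python
# Attention score computation Q K^T costs roughly n*n*d multiply-adds,
# while one n x d by d x d feed-forward matmul costs roughly n*d*d.
d = 64  # model width, assumed for illustration
for n in (128, 256, 512, 1024):
    attn = n * n * d
    ffn = n * d * d
    print(f"n={n:5d}  attention={attn:>12,}  feed-forward={ffn:>12,}  ratio={attn / ffn:.1f}")
```

The ratio grows as n/d: doubling the sequence length doubles attention's cost relative to the feed-forward layers, which is what motivates the efficiency work below.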

Evolution and Optimizations

  - Efficient attention variants (sparse, sliding-window, and linear-time approximations) that reduce the quadratic cost
  - Architectural refinements now common in large language models, such as pre-norm layer placement and rotary positional embeddings
  - Inference and training optimizations such as KV caching and IO-aware attention kernels (e.g. FlashAttention)

Additional Connections

  - Attention predates the Transformer, originating in encoder-decoder RNNs for machine translation
  - The architecture has been adapted beyond language, e.g. Vision Transformers for images

References

  1. Vaswani, A., et al. (2017). Attention Is All You Need. NeurIPS.
  2. Devlin, J., et al. (2018). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv preprint arXiv:1810.04805.

#transformer #deep-learning #neural-networks #attention-mechanism #language-models
