Transformer
How GPT learns layer by layer