Figure 8: Transformer model architecture. The diagram shows the encoder-decoder design: inputs and outputs (shifted right) pass through embeddings with positional encoding; each of the Nx stacked layers contains multi-head attention (masked multi-head self-attention in the decoder) and a feed-forward sublayer, each followed by Add & Norm; the decoder output produces the final output probabilities.

Source: "Attention Is All You Need"; EastMoney Securities Research Institute
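The core operation inside each multi-head attention block shown in the figure is scaled dot-product attention, softmax(QK^T / sqrt(d_k))·V. A minimal pure-Python sketch with toy dimensions (illustrative only, not the paper's full multi-head implementation):

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V.

    Q, K, V are lists of row vectors (lists of floats).
    """
    d_k = len(K[0])
    out = []
    for q in Q:
        # Dot each query with every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in K]
        weights = softmax(scores)
        # Weighted sum of value vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out
```

With identical keys the attention weights are uniform, so the output is simply the mean of the value vectors — a quick sanity check on the computation.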