Hands-on-code

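Hand-Writing LLM Components: Group Query Attention, from MHA and MQA to GQA

An overview of attention variants: MHA (Multi-Head Attention), MQA (Multi-Query Attention), and GQA (Group Query Attention). By implementing each one by hand, the article compares how the three mechanisms differ and examines GQA's advantages for inference performance.

December 8, 2024 hands-on-code transformer LLM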

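LoRA: Principles and PyTorch Implementation

Implements LoRA from scratch in PyTorch to explain how LoRA works; the main goal is to show the details of a LoRA implementation.

November 9, 2024 hands-on-code transformer LLM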

Hand-Writing a Transformer Decoder (CausalLM)

Hand-writes a causal language model, in other words a simplified version of the decoder in the transformer.

August 18, 2024 hands-on-code transformer LLM

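Four Levels of Hand-Writing Self-Attention, from Self-Attention to Multi-Head Self-Attention

In AI-related interviews, interviewers often ask candidates to write self-attention. The transformer paper contains many details, however, so interviewers may differ in how complete an implementation they expect. This article therefore writes four versions of self-attention, each at a different level of detail, to meet those different expectations.

August 18, 2024 hands-on-code transformer LLM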
