LLM Principles

Author: hewking
Labels: blog
Created: 2025-02-13T13:44:25Z
Link and comments: https://github.com/hewking/blog/issues/52


LLM Pretraining

Steps (a minimal sketch follows this list):

  1. Dataset
  2. Tokenization (serializing text into token IDs)
  3. Pretraining: neural network training
  4. Inference
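
A minimal sketch of these four steps, assuming a character-level tokenizer and a bigram-style model (in the spirit of nanoGPT's simplest baseline); production LLMs use BPE tokenizers and deep Transformer stacks, but the data flow — tokenize, predict the next token, sample — is the same.

```python
# Toy end-to-end pretraining loop: dataset -> tokenize -> train -> sample.
import torch
import torch.nn as nn
import torch.nn.functional as F

# 1. Dataset: any raw text corpus (a toy string stands in for web data).
text = "hello world, hello llm pretraining"

# 2. Tokenization: map characters to integer ids (real models use BPE).
vocab = sorted(set(text))
stoi = {ch: i for i, ch in enumerate(vocab)}
itos = {i: ch for ch, i in stoi.items()}
data = torch.tensor([stoi[ch] for ch in text], dtype=torch.long)

# 3. Pretraining: next-token prediction with cross-entropy loss.
class TinyLM(nn.Module):
    def __init__(self, vocab_size, dim=32):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, dim)
        self.head = nn.Linear(dim, vocab_size)

    def forward(self, idx):
        return self.head(self.emb(idx))  # (B, T, vocab_size) logits

model = TinyLM(len(vocab))
opt = torch.optim.AdamW(model.parameters(), lr=1e-2)
block_size = 8
for step in range(200):
    i = torch.randint(0, len(data) - block_size - 1, (1,)).item()
    x = data[i:i + block_size].unsqueeze(0)          # input tokens
    y = data[i + 1:i + block_size + 1].unsqueeze(0)  # targets shifted by one
    logits = model(x)
    loss = F.cross_entropy(logits.view(-1, len(vocab)), y.view(-1))
    opt.zero_grad(); loss.backward(); opt.step()

# 4. Inference: sample tokens autoregressively from the trained model.
idx = data[:1].unsqueeze(0)
for _ in range(20):
    logits = model(idx)[:, -1, :]
    probs = F.softmax(logits, dim=-1)
    idx = torch.cat([idx, torch.multinomial(probs, 1)], dim=1)
print("".join(itos[int(i)] for i in idx[0]))
```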

AI News

  1. AI News Newsletter
  2. Ahead of AI
  3. Lilian Weng's blog

References

  1. Andrej Karpathy: Deep Dive into LLMs like ChatGPT
    1. HuggingFaceFW/fineweb: dataset
    2. Transformer Neural Net 3D visualizer: LLM visualization
    3. Tokenization example
    4. karpathy's GPT-2 reproduction example
    5. Llama 3 paper from Meta
    6. Hyperbolic: cloud-hosted LLM demos and APIs
    7. HuggingFace inference playground
    8. DeepSeek-R1 paper
    9. TogetherAI Playground for open model inference
    10. LM Arena for model rankings
    11. AI News Newsletter
    12. LMStudio for local inference
  2. The best tribute is to learn from it: a close look at DeepSeek-R1
    1. R1 training pipeline diagram
    2. R1 training pipeline diagram (by @刘囧)
    3. Understanding Reasoning LLMs
  3. Understanding Large Language Models
    1. karpathy/nanoGPT
    2. The Transformer Family
    3. The Illustrated Transformer
    4. Attention Is All You Need
  4. Intro to Large Language Models
    1. What a model file really is: a parameters file (PyTorch .pt file) plus the code that runs it (run.c)
    2. Parameters file: a lossy compression of internet data into a neural network
    3. Neural network: patterns emerge from the data
    4. Pretraining: base model
    5. Fine-tuning: assistant model
    6. Model hallucination (at generation time the model is, in a sense, dreaming)
    7. RL: RLHF, RL
    8. Thinking: System 1, System 2
    9. Multimodality
    10. Scaling Law
    11. The future of large models
    12. Model safety
    13. Video resources
      1. karpathy/llama2.c
  5. Transformer principles (see the attention sketch at the end of this post)
    1. Attention Is All You Need
    2. [Attention in transformers, step-by-step DL6](https://www.youtube.com/watch?v=eMlx5fFNoYc)
    3. [Visualizing transformers and attention Talk for TNG Big Tech Day ‘24](https://www.youtube.com/watch?v=KJtZARuO3JY)
  6. Understanding DeepSeek-R1
    1. How does GRPO work? (see the GRPO sketch at the end of this post)
    2. Paragraph-by-paragraph close reading of the Transformer paper (论文精读 series)
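
For the Transformer references above, here is a minimal sketch of scaled dot-product self-attention as defined in "Attention Is All You Need", with a causal mask as used in decoder-only LLMs; the single-head setup and tensor shapes are simplifications for illustration.

```python
# Scaled dot-product attention: Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V,
# with a causal mask so each position only attends to earlier positions.
import math
import torch
import torch.nn.functional as F

def causal_self_attention(x, w_q, w_k, w_v):
    # x: (T, d_model); w_q / w_k / w_v: (d_model, d_k) projection matrices
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))  # (T, T) similarities
    mask = torch.triu(torch.ones_like(scores), diagonal=1).bool()
    scores = scores.masked_fill(mask, float("-inf"))          # hide future tokens
    return F.softmax(scores, dim=-1) @ v                      # weighted sum of values

# Toy usage: 5 tokens, model dim 16, head dim 8.
x = torch.randn(5, 16)
w_q, w_k, w_v = (torch.randn(16, 8) for _ in range(3))
print(causal_self_attention(x, w_q, w_k, w_v).shape)  # torch.Size([5, 8])
```

And for the GRPO reference, a sketch of its central idea as described in the DeepSeek papers: sample a group of completions per prompt, score each with a reward, and use the group-normalized reward as the advantage instead of training a separate value network. Only the advantage computation is shown here, not the full clipped policy-gradient update.

```python
# GRPO advantage: normalize each completion's reward within its group,
# so above-average answers get a positive learning signal and vice versa.
import torch

def grpo_advantages(rewards: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    # rewards: (G,) scalar rewards for G sampled completions of one prompt
    return (rewards - rewards.mean()) / (rewards.std() + eps)

rewards = torch.tensor([1.0, 0.0, 0.0, 1.0, 1.0])  # e.g. correctness rewards
print(grpo_advantages(rewards))
```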