Notes by Sarah Chieng | Reference Video | Reference Slides

Note:

Hi there! It’s Sarah. This page was originally my personal notes on Karpathy’s 1hr “Intro To Large Language Models” video. I thought it was a great, thorough, and beginner-friendly video on LLMs, so I wanted to compile and polish my notes to share :) It’s about a 5 minute read, and of course if you have time, I’d encourage watching the actual video!

A couple of important things, I have:

  1. Re-organized significant portions of the content in a way that I believe is easier to follow. For example, Karpathy presents the steps of creating an LLM out of order.
  2. Enriched the content with details he did not mention that I think are extremely important.
  3. Generalized a lot of examples to provide a better overall understanding. He presents several examples as if they are the “only way to go,” which was confusing and sometimes factually incorrect.
  4. Added my own commentary to point out key points and important takeaways.
  5. Omitted small details that are less important, overcomplicate things, or are overly OpenAI/ChatGPT-biased (97% of video content preserved).

Also, big thanks to Kudzo Ahegbebu and Harper Carroll for providing feedback on the content, factual correctness, and structure of this page :) This document is a work in progress and ideally can become a very helpful resource for someone to learn more about LLMs. Any feedback is appreciated (💌 [email protected])

Table of Contents:

1. What is a large language model (LLM)?


An LLM is a type of neural network that specializes in processing, understanding, and generating human language.

There are two main components of an LLM:

  1. the parameters: the giant set of floating point numbers that make up the model’s weights. (Strictly speaking, choices like model size, activation functions, and training setup are *hyperparameters* — configuration decisions that define the model rather than values learned during training — but they are often lumped in when people talk loosely about what defines a language model.)
  2. the code to run the parameters

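To make the “parameters + code” split concrete, here is a toy sketch (not a real LLM — the real thing has billions of weights and a transformer architecture): the parameters are just a blob of numbers, and the code is a small function that runs them.

```python
# Toy illustration of the two components of an LLM:
# (1) a set of numeric parameters, and (2) the code that runs them.

# (1) The "parameters": here a tiny 2x2 weight matrix and a bias vector,
# standing in for the billions of floats in a real model.
parameters = {
    "weights": [[0.5, -0.2], [0.1, 0.8]],
    "bias": [0.0, 0.1],
}

# (2) The "code to run the parameters": a minimal forward pass that
# multiplies an input vector by the weight matrix and adds the bias.
def forward(params, x):
    w, b = params["weights"], params["bias"]
    return [
        sum(w[i][j] * x[j] for j in range(len(x))) + b[i]
        for i in range(len(w))
    ]

print(forward(parameters, [1.0, 2.0]))
```

The key point this illustrates: the code is short and generic, while all of the model’s “knowledge” lives in the parameter values.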
What does an LLM do?

Given a sequence of words, the LLM predicts the next word.
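In practice, “predicts the next word” means the model assigns a probability to every word in its vocabulary and then picks (or samples) from that distribution. A minimal sketch, using a made-up three-word vocabulary and hypothetical scores the model might produce for the prompt “the cat sat on the”:

```python
import math

def softmax(scores):
    """Turn raw scores (logits) into probabilities that sum to 1."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical vocabulary and the scores a model might assign to each
# candidate next word after the prompt "the cat sat on the".
vocab = ["mat", "dog", "moon"]
logits = [3.0, 1.0, 0.5]

probs = softmax(logits)
# Greedy decoding: take the highest-probability word.
next_word = vocab[probs.index(max(probs))]
print(next_word)
```

Real LLMs do this over a vocabulary of tens of thousands of tokens, and generate longer text by appending the chosen token and repeating the prediction step.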