What does LLM stand for?
Large Language Model
Name three foundation models (2025)
GPT-4o, Claude Sonnet 4.6, Gemini 2.5 Pro
How are LLM APIs typically priced?
Per token — separate rates for input tokens and output tokens.
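The per-token billing model above can be sketched in a few lines. The rates below are made-up placeholders for illustration, not any provider's actual prices:

```python
# Sketch of per-token API cost accounting.
# INPUT_RATE_PER_M / OUTPUT_RATE_PER_M are hypothetical rates,
# not real prices from any provider.
INPUT_RATE_PER_M = 3.00    # USD per 1M input tokens (assumed)
OUTPUT_RATE_PER_M = 15.00  # USD per 1M output tokens (assumed)

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request under the rates above."""
    return (input_tokens * INPUT_RATE_PER_M
            + output_tokens * OUTPUT_RATE_PER_M) / 1_000_000

print(request_cost(10_000, 2_000))  # 0.06
```

Note that output tokens are typically billed at a higher rate than input tokens, which is why long generations dominate cost.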
What is a token in the context of LLMs?
A sub-word unit of text. Roughly 1 token ≈ 0.75 English words (or ~4 characters).
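The ~4-characters-per-token heuristic on the card makes for a quick back-of-the-envelope estimator. This is a rough approximation only; real token counts depend on the model's tokenizer:

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate using the ~4 characters per token heuristic.
    Real tokenizers (BPE variants) will give different counts."""
    return max(1, round(len(text) / 4))

print(estimate_tokens("hello world"))  # 3
```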
What is the context window of an LLM?
The maximum number of tokens (input + output combined) the model can handle in a single request. Ranges from roughly 4K to over 1M tokens depending on the model.
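Because input and output share one window, a request must budget for both. A minimal sketch of that check, assuming a hypothetical 128K window (real limits vary by model):

```python
def fits_context(input_tokens: int, max_output_tokens: int,
                 context_window: int = 128_000) -> bool:
    """Check whether a request fits the window: input and the reserved
    output budget share the same token limit. 128K is an illustrative
    default, not a specific model's limit."""
    return input_tokens + max_output_tokens <= context_window

print(fits_context(120_000, 4_000))  # True
print(fits_context(127_000, 4_000))  # False
```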
What is the difference between open-source and closed-source LLMs?
Closed-source (e.g. GPT-4, Claude): API-only access, typically the most capable, no infrastructure to run.
Open-source (e.g. Llama, Mistral): self-hostable, customizable, data stays on your servers.
Name two popular open-source LLMs
Llama 4 (Meta) and Mistral Large (Mistral AI)
What is temperature in LLM generation?
Controls sampling randomness. Low (0–0.3) = focused, near-deterministic. High (0.7–1.0) = creative, varied.
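Mechanically, temperature divides the logits before the softmax: low temperature sharpens the distribution toward the top token, high temperature flattens it. A toy illustration with made-up logits:

```python
import math

def softmax_with_temperature(logits, temperature):
    """Scale logits by 1/temperature, then softmax.
    Low T concentrates probability on the top logit; high T
    spreads it out. T must be > 0 (T=0 is handled as greedy
    argmax in real samplers)."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)                        # subtract max for stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]                   # hypothetical logits
print(softmax_with_temperature(logits, 0.2))  # nearly all mass on top token
print(softmax_with_temperature(logits, 1.0))  # softer spread
```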
What paper introduced the transformer architecture?
"Attention Is All You Need" (2017) by Vaswani et al.
What is the key innovation of the transformer?
The self-attention mechanism — lets the model weigh relationships between all positions in a sequence in parallel, replacing sequential processing.
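The core computation can be shown in a toy, pure-Python sketch of scaled dot-product attention (single head, no learned projections): each query scores every key at once, softmaxes the scores, and takes a weighted sum of the values.

```python
import math

def attention(queries, keys, values):
    """Scaled dot-product attention over a tiny sequence.
    Every query attends to every position simultaneously — no
    step-by-step recurrence, which is the parallelism transformers
    exploit."""
    d = len(keys[0])
    out = []
    for q in queries:
        # score q against every key, scaled by sqrt(d)
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        m = max(scores)                    # subtract max for stability
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]  # attention weights, sum to 1
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out

q = [[1.0, 0.0]]
k = v = [[1.0, 0.0], [0.0, 1.0]]
print(attention(q, k, v))  # query attends mostly to the matching first key
```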
Why did transformers replace RNNs/LSTMs?
Transformers process all tokens in parallel rather than sequentially, enabling much faster training on GPUs and better handling of long-range dependencies.
What are embeddings?
Dense vector representations of data (text, images) in continuous space. Similar items have nearby vectors.
What is cosine similarity?
Measures the angle between two vectors. 1.0 = identical direction, 0 = orthogonal, -1.0 = opposite. Used to compare embeddings.
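Cosine similarity is short enough to write out directly; a minimal pure-Python version applied to toy 2-D "embedding" vectors:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; the standard way to
    compare embeddings. 1.0 = same direction, 0 = orthogonal,
    -1.0 = opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))   # 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))   # 0.0
print(cosine_similarity([1.0, 0.0], [-1.0, 0.0]))  # -1.0
```

Real embedding vectors have hundreds or thousands of dimensions, but the computation is identical.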