████████╗ ██████╗ ██████╗  ██████╗██╗  ██╗██╗     ███████╗███████╗████████╗
╚══██╔══╝██╔═══██╗██╔══██╗██╔════╝██║  ██║██║     ██╔════╝██╔════╝╚══██╔══╝
   ██║   ██║   ██║██████╔╝██║     ███████║██║     █████╗  █████╗     ██║
   ██║   ██║   ██║██╔══██╗██║     ██╔══██║██║     ██╔══╝  ██╔══╝     ██║
   ██║   ╚██████╔╝██║  ██║╚██████╗██║  ██║███████╗███████╗███████╗   ██║
   ╚═╝    ╚═════╝ ╚═╝  ╚═╝ ╚═════╝╚═╝  ╚═╝╚══════╝╚══════╝╚══════╝   ╚═╝

Welcome to TorchLeet, a collection of PyTorch practice problems for ML/AI interviews, based on real engineering interviews. New: an LLM Learning Path (build from scratch), plus Basics and Advanced lists. Filter by company with the 'company' command or in the web UI. Type 'help' to get started or 'exit' for the web view. Track your progress with 'done <id>' and 'progress'.

Grind Attention for ML/AI Interviews

Follow the guided LLM Learning Path to implement a model from scratch, browse curated Basics and Advanced lists, or set up the AI Tutor for interactive coaching. Many exercises are tagged with the real companies that ask them in interviews.

Solved 0 of 75 (0% complete)

Easy 0/22 | Medium 0/21 | Hard 0/32

Questions from real interviews at

Google, Anthropic, Meta, OpenAI, DeepMind, NVIDIA, Apple, Tesla, Mistral, xAI, Perplexity, Midjourney, Databricks, Cohere

Filter by company

Guided Curriculum

Implement LLM from Scratch

23 / 23 shown

1. Foundations (Tokenization & Position)

5 exercises

Implement Byte Pair Encoding from Scratch

Build the BPE tokenizer algorithm that iteratively merges frequent character pairs to build a subword vocabulary.

Easy
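The merge loop described above can be sketched in plain Python. This is a minimal illustration, not the exercise's reference solution: the function name `learn_bpe`, the word-list input, and the tie-breaking rule (first pair seen wins) are assumptions made for the example.

```python
from collections import Counter


def learn_bpe(words, num_merges):
    """Learn BPE merges from a list of words (hypothetical helper)."""
    # Represent each word as a tuple of symbols, weighted by its frequency.
    vocab = Counter(tuple(w) for w in words)
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs across the whole corpus.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        # Pick the most frequent pair (ties go to the first pair seen).
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Rewrite every word, fusing each occurrence of the best pair.
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            out, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    out.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    out.append(symbols[i])
                    i += 1
            new_vocab[tuple(out)] += freq
        vocab = new_vocab
    return merges, vocab
```

Calling `learn_bpe(["low", "low", "lower", "lowest"], 2)` returns the learned merges in order plus the re-segmented vocabulary.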

#10

Implement Sinusoidal Embeddings

Build the fixed sinusoidal positional encoding from 'Attention Is All You Need' using sin/cos at different frequencies.

Medium
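As a rough sketch of the sin/cos construction, the encoding can be built as a fixed table. The function name `sinusoidal_encoding` is hypothetical, and the code assumes an even `d_model` and the base of 10000 used in the paper.

```python
import math

import torch


def sinusoidal_encoding(seq_len, d_model):
    """Return a (seq_len, d_model) table of fixed positional encodings."""
    # Assumption: d_model is even, so sin and cos columns pair up exactly.
    position = torch.arange(seq_len, dtype=torch.float32).unsqueeze(1)
    # One frequency per sin/cos pair: 1 / 10000^(2i / d_model).
    div_term = torch.exp(
        torch.arange(0, d_model, 2, dtype=torch.float32)
        * (-math.log(10000.0) / d_model)
    )
    pe = torch.zeros(seq_len, d_model)
    pe[:, 0::2] = torch.sin(position * div_term)  # even columns
    pe[:, 1::2] = torch.cos(position * div_term)  # odd columns
    return pe
```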

#11

Implement RoPE Embeddings

Build Rotary Position Embeddings that encode relative positions by rotating query and key vectors in complex space.

Medium

Coming Soon

Implement RMS Norm

Build Root Mean Square Layer Normalization used in LLaMA and modern transformers — simpler and faster than LayerNorm.

Easy
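A minimal sketch of the idea, assuming normalization over the last dimension with a learned per-feature gain and an `eps` of 1e-6 (both are illustrative choices, not taken from the exercise):

```python
import torch
import torch.nn as nn


class RMSNorm(nn.Module):
    """Scale activations by their root mean square; no mean subtraction."""

    def __init__(self, dim, eps=1e-6):
        super().__init__()
        self.eps = eps  # assumed value, guards against division by zero
        self.weight = nn.Parameter(torch.ones(dim))  # learned gain

    def forward(self, x):
        # Root mean square over the feature dimension.
        rms = torch.sqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return self.weight * (x / rms)
```

Unlike LayerNorm, there is no mean subtraction and no bias term, which is where the savings come from.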

Implement Attention from Scratch

Build scaled dot-product attention from raw matrix operations — queries, keys, values, scaling, and softmax.

Medium
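The steps named above (scores, scaling, softmax, weighted sum) might look like this in PyTorch. The optional boolean `mask` argument, where True means "may attend", is an assumption added for the example.

```python
import math

import torch


def scaled_dot_product_attention(q, k, v, mask=None):
    """q, k, v: (..., seq_len, d_k). Returns (output, attention weights)."""
    d_k = q.size(-1)
    # Similarity of every query with every key, scaled by sqrt(d_k).
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)
    if mask is not None:
        # Assumed convention: mask is boolean, True where attention is allowed.
        scores = scores.masked_fill(~mask, float("-inf"))
    weights = torch.softmax(scores, dim=-1)
    return weights @ v, weights
```

Passing `torch.tril(torch.ones(n, n, dtype=torch.bool))` as `mask` gives the causal variant used in decoder-only models.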

2. Core Transformer Building Blocks

4 exercises

Implement Multi-Head Attention from Scratch

Split attention into multiple heads with independent projections, compute attention per head, and concatenate results.

Medium

Implement Grouped Query Attention from Scratch

Build GQA where multiple query heads share key-value heads, reducing KV cache memory while preserving quality.

Medium

#12

Implement KV Cache for Autoregressive Generation

Cache key and value tensors from previous timesteps so each generation step computes keys and values only for the new token and attends over the cached history.

Medium

Anthropic, OpenAI, Meta, Perplexity, Together AI

#13

Implement Sliding Window Attention

Restrict attention to a fixed local window around each token, reducing memory from O(n²) to O(n·w) for long sequences.

Medium

Mistral, Anthropic, Google, DeepMind
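One way to express the window restriction is as a boolean mask applied to the attention scores. The sketch below assumes the causal variant, where each token sees itself and the previous `window - 1` tokens; the function name is hypothetical.

```python
import torch


def sliding_window_mask(seq_len, window):
    """Boolean (seq_len, seq_len) mask; True where query i may attend to key j."""
    i = torch.arange(seq_len).unsqueeze(1)  # query positions (rows)
    j = torch.arange(seq_len).unsqueeze(0)  # key positions (columns)
    # Assumption: causal window, so j <= i and at most `window` keys per query.
    return (j <= i) & (i - j < window)
```

Each row has at most `window` True entries, which is where the O(n·w) cost comes from.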

3. Full Small Language Model

1 exercise

#12

Implement SmolLM from Scratch

Build a complete small language model end-to-end: tokenizer integration, transformer blocks, and autoregressive text generation.

Hard

4. Alignment & Efficient Fine-Tuning

6 exercises

Coming Soon

Implement KL Divergence Loss

Compute KL divergence between two probability distributions from scratch, essential for VAEs and knowledge distillation.

Easy
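A small sketch of KL(P || Q) computed from logits, working in log space for numerical stability. Taking logits rather than probabilities as input is an assumption of this example.

```python
import torch


def kl_divergence(p_logits, q_logits):
    """KL(P || Q) over the last dimension, from unnormalized logits."""
    log_p = torch.log_softmax(p_logits, dim=-1)
    log_q = torch.log_softmax(q_logits, dim=-1)
    # sum_x P(x) * (log P(x) - log Q(x))
    return (log_p.exp() * (log_p - log_q)).sum(dim=-1)
```

The result is zero when the two distributions match and is not symmetric in P and Q.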

#11

Implement LoRA on a Linear Layer

Inject trainable low-rank decomposition matrices (A, B) into a frozen linear layer for parameter-efficient fine-tuning.

Medium

Meta, Google, Anthropic, OpenAI, Databricks
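A minimal sketch of the wrapping pattern, assuming the common initialization (small random A, zero B) and an `alpha / r` scaling factor. The class name `LoRALinear` and the default `r` and `alpha` values are illustrative.

```python
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """Frozen nn.Linear plus a trainable low-rank update B @ A."""

    def __init__(self, base: nn.Linear, r=4, alpha=8.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)  # freeze the pretrained weights
        # Assumed init: A small random, B zero, so the update starts at zero.
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scale = alpha / r

    def forward(self, x):
        # Base output plus the scaled low-rank correction x A^T B^T.
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scale
```

Because B starts at zero, the wrapped layer initially reproduces the frozen layer exactly, and only A and B receive gradients.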

#20

Coming Soon

Apply SFT on SmolLM

Fine-tune a language model on instruction-following data using supervised fine-tuning with cross-entropy loss.

Hard

#14

Implement DPO Loss from Scratch

Compute the Direct Preference Optimization loss that trains a policy directly from preference pairs without a reward model.

Hard

Anthropic, OpenAI, DeepMind, Meta
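The loss itself is compact once per-sequence log-probabilities are available. The sketch below assumes those log-probabilities have already been summed over tokens for the policy and a frozen reference model; the argument names and the default `beta` are illustrative.

```python
import torch
import torch.nn.functional as F


def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """DPO loss from per-sequence log-probs, each of shape (batch,)."""
    # How much more the policy likes each response than the reference does.
    chosen_logratio = policy_chosen_logps - ref_chosen_logps
    rejected_logratio = policy_rejected_logps - ref_rejected_logps
    # Reward margin between chosen and rejected, scaled by beta (assumed 0.1).
    logits = beta * (chosen_logratio - rejected_logratio)
    return -F.logsigmoid(logits).mean()
```

When the policy equals the reference, the margin is zero and the loss is log 2; it falls as the policy shifts probability toward the chosen responses.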

#15

Implement PPO for RLHF

Build Proximal Policy Optimization with clipped surrogate objective, value function baseline, and KL penalty for RLHF.

Hard

Anthropic, OpenAI, DeepMind, Meta

#27

Implement GRPO (DeepSeek-R1 Algorithm)

Build Group Relative Policy Optimization that scores multiple completions per prompt and uses group-relative advantages.

Expert

DeepMind, Anthropic, OpenAI

5. Decoding & Efficient Inference

6 exercises

#10

Implement Temperature Sampling

Divide logits by a temperature scalar before softmax to sharpen or flatten the token probability distribution.

Easy

OpenAI, Anthropic, Cohere, Perplexity
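A short sketch of the scaling step followed by sampling. The function names are hypothetical, and rejecting non-positive temperatures is a choice made for this example.

```python
import torch


def temperature_probs(logits, temperature=1.0):
    """Softmax over logits divided by a positive temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # T < 1 sharpens the distribution, T > 1 flattens it.
    return torch.softmax(logits / temperature, dim=-1)


def sample_with_temperature(logits, temperature=1.0, generator=None):
    """Draw one token id per row from the temperature-scaled distribution."""
    probs = temperature_probs(logits, temperature)
    return torch.multinomial(probs, num_samples=1, generator=generator)
```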

Implement Top-k Sampling

Select the k highest-probability tokens, zero out the rest, renormalize, and sample for controlled text generation.

Medium

Anthropic, OpenAI, DeepMind, Cohere
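The filter-then-renormalize step can be sketched by masking logits below the k-th largest value before the softmax. The function name is hypothetical; note that exact ties at the threshold would keep more than k tokens in this version.

```python
import torch


def top_k_probs(logits, k):
    """Probabilities with everything outside the top k set to zero."""
    topk_vals, _ = torch.topk(logits, k, dim=-1)
    # Smallest logit that is still inside the top k, kept broadcastable.
    threshold = topk_vals[..., -1, None]
    # Masking with -inf makes softmax assign exactly zero probability.
    filtered = logits.masked_fill(logits < threshold, float("-inf"))
    return torch.softmax(filtered, dim=-1)
```

Sampling then reduces to `torch.multinomial(top_k_probs(logits, k), num_samples=1)`.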

Implement Top-p (Nucleus) Sampling

Sort logits, compute cumulative probabilities, mask tokens outside the nucleus (the smallest set whose cumulative probability reaches p), and sample from the filtered distribution.

Medium

Anthropic, OpenAI, DeepMind, Perplexity, Cohere
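A sketch of the nucleus filter, assuming the convention that a token is kept if the probability mass ranked ahead of it is still below p (so the top token always survives). The function name is hypothetical.

```python
import torch


def top_p_probs(logits, p):
    """Probabilities restricted to the smallest set with cumulative mass >= p."""
    probs = torch.softmax(logits, dim=-1)
    sorted_probs, sorted_idx = torch.sort(probs, descending=True, dim=-1)
    cumulative = torch.cumsum(sorted_probs, dim=-1)
    # Drop a token when the mass before it already reaches p.
    remove = (cumulative - sorted_probs) >= p
    sorted_probs = sorted_probs.masked_fill(remove, 0.0)
    # Undo the sort so probabilities line up with the original token ids.
    filtered = torch.zeros_like(probs).scatter(-1, sorted_idx, sorted_probs)
    return filtered / filtered.sum(dim=-1, keepdim=True)
```

As with top-k, the returned tensor can be passed directly to `torch.multinomial` to draw the next token.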

#18

Implement Speculative Decoding

Draft tokens with a fast model, verify in parallel with the target model, and accept/reject to guarantee identical output distribution.

Hard

Google, DeepMind, Anthropic, Apple

#19

Implement Continuous Batching for LLM Inference

Dynamically slot sequences in and out of a running batch as they finish, maximizing throughput without padding waste.

Hard

Perplexity, Together AI, Anyscale, Meta

#28

Build a Complete LLM Inference Engine

Combine KV caching, continuous batching, and memory management into a production-grade inference server.

Expert

Perplexity, Together AI, Anyscale, Fireworks AI

6. Systems & Scaling

1 exercise

#17

Implement Mixture of Experts Layer

Build a gated MoE with top-k routing, load balancing loss, and expert capacity constraints for sparse computation.

Hard

Google, DeepMind, Mistral, Databricks, xAI

Follow the stages in order for the best “build an LLM from scratch” experience. Each exercise has a question notebook and a solution. Mark items complete to track progress.
