Grind PyTorch for ML/AI Interviews
Practice problems from basic PyTorch to advanced LLM systems. Tagged with real interview companies. TorchLeet V3 is now out with 30 new questions and company-wise filtering!
Implement Softmax from Scratch
Build a numerically stable softmax by subtracting the row maximum before exponentiating (the log-sum-exp trick), handling overflow and underflow in raw tensor math.
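A minimal sketch of the max-subtraction idea described above. The function names `stable_softmax` and `stable_logsumexp` are illustrative, not part of any library.

```python
import torch

def stable_softmax(x: torch.Tensor, dim: int = -1) -> torch.Tensor:
    # Subtracting the max does not change the result, but keeps exp() from overflowing.
    shifted = x - x.max(dim=dim, keepdim=True).values
    exp = shifted.exp()
    return exp / exp.sum(dim=dim, keepdim=True)

def stable_logsumexp(x: torch.Tensor, dim: int = -1) -> torch.Tensor:
    # log(sum(exp(x))) = m + log(sum(exp(x - m))) with m = max(x).
    m = x.max(dim=dim, keepdim=True).values
    return (m + (x - m).exp().sum(dim=dim, keepdim=True).log()).squeeze(dim)

# Logits this large would overflow a naive exp(x) / sum(exp(x)).
logits = torch.tensor([[1000.0, 1001.0, 1002.0], [-1000.0, 0.0, 1000.0]])
probs = stable_softmax(logits)
lse = stable_logsumexp(logits)
```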
Implement K-Means Clustering in PyTorch
Code Lloyd's algorithm with PyTorch tensors: random init, distance computation, centroid updates, and convergence check.
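The four steps listed above could look roughly like this. The name `kmeans`, the defaults, and the choice to initialize from randomly picked data points are assumptions made for the sketch.

```python
import torch

def kmeans(x, k, iters=100, tol=1e-6, seed=0):
    g = torch.Generator().manual_seed(seed)
    # Random init: pick k distinct data points as the starting centroids.
    centroids = x[torch.randperm(x.shape[0], generator=g)[:k]].clone()
    for _ in range(iters):
        # Distance computation and assignment to the nearest centroid.
        assign = torch.cdist(x, centroids).argmin(dim=1)
        new_centroids = centroids.clone()
        for j in range(k):
            members = x[assign == j]
            if len(members) > 0:  # keep the old centroid if a cluster is empty
                new_centroids[j] = members.mean(dim=0)
        # Convergence check: stop once the centroids barely move.
        shift = (new_centroids - centroids).norm()
        centroids = new_centroids
        if shift < tol:
            break
    return centroids, assign

torch.manual_seed(0)
points = torch.cat([torch.randn(20, 2) * 0.1, torch.randn(20, 2) * 0.1 + 10.0])
centers, labels = kmeans(points, k=2)
```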
Implement KNN in PyTorch
Build k-nearest neighbors classification using vectorized distance computation and top-k selection on tensors.
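One way to vectorize this, assuming Euclidean distance and a simple majority vote; `knn_predict` is an illustrative name.

```python
import torch

def knn_predict(train_x, train_y, query_x, k=3):
    # Pairwise distances between every query and every training point: (Q, N).
    dists = torch.cdist(query_x, train_x)
    # Indices of the k smallest distances per query.
    idx = dists.topk(k, dim=1, largest=False).indices
    # Gather neighbor labels (Q, k) and take the most frequent one per row.
    return train_y[idx].mode(dim=1).values

train_x = torch.tensor([[0., 0.], [0., 1.], [1., 0.], [5., 5.], [5., 6.], [6., 5.]])
train_y = torch.tensor([0, 0, 0, 1, 1, 1])
queries = torch.tensor([[0.2, 0.2], [5.5, 5.5]])
pred = knn_predict(train_x, train_y, queries, k=3)
```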
Implement Logistic Regression with Gradient Descent
Build binary logistic regression from scratch with sigmoid, binary cross-entropy, and manual gradient descent updates.
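A sketch of the manual update, using the closed-form gradient of mean binary cross-entropy with respect to the logits, which is `p - y`. The name `fit_logreg`, the learning rate, and the step count are arbitrary choices for illustration.

```python
import torch

def fit_logreg(x, y, lr=0.1, steps=500):
    w = torch.zeros(x.shape[1])
    b = torch.zeros(())
    for _ in range(steps):
        p = torch.sigmoid(x @ w + b)          # predicted probabilities
        grad_w = x.T @ (p - y) / len(y)       # d(mean BCE)/dw
        grad_b = (p - y).mean()               # d(mean BCE)/db
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

x = torch.tensor([[-2.0], [-1.0], [1.0], [2.0]])
y = torch.tensor([0.0, 0.0, 1.0, 1.0])
w, b = fit_logreg(x, y)
pred = torch.sigmoid(x @ w + b) > 0.5
```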
Implement Contrastive Loss (InfoNCE) + CLIP Training Loop
Build the InfoNCE contrastive loss and a CLIP-style training loop that aligns image and text embeddings.
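The loss half of this problem can be sketched as a symmetric cross-entropy over the image-text similarity matrix. The fixed temperature of 0.07 and the name `clip_loss` are assumptions; a full training loop would also need encoders and an optimizer, which are omitted here.

```python
import torch
import torch.nn.functional as F

def clip_loss(img_emb, txt_emb, temperature=0.07):
    # Cosine similarity between every image and every text in the batch.
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    logits = img_emb @ txt_emb.T / temperature
    # The matching pair for row i sits on the diagonal.
    targets = torch.arange(img_emb.shape[0])
    loss_i2t = F.cross_entropy(logits, targets)
    loss_t2i = F.cross_entropy(logits.T, targets)
    return (loss_i2t + loss_t2i) / 2

emb = torch.eye(4)
aligned = clip_loss(emb, emb)
shuffled = clip_loss(emb, emb.roll(1, dims=0))
```

Correctly paired embeddings should give a much lower loss than mismatched ones.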
Implement 2D Positional Embeddings
Build 2D sinusoidal position encodings for vision transformers, encoding both row and column positions in patch grids.
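One common construction, shown here as a sketch, spends half the channels on the row index and half on the column index. The function names and the base of 10000 are assumptions carried over from the usual 1D formulation.

```python
import torch

def sincos_1d(pos, dim):
    # Standard sinusoidal encoding of integer positions into `dim` channels.
    half = dim // 2
    omega = 1.0 / (10000 ** (torch.arange(half, dtype=torch.float32) / half))
    angles = pos[:, None].float() * omega[None, :]
    return torch.cat([angles.sin(), angles.cos()], dim=1)

def sincos_2d(height, width, dim):
    assert dim % 4 == 0, "dim must split evenly into row/col sin/cos parts"
    rows, cols = torch.meshgrid(
        torch.arange(height), torch.arange(width), indexing="ij"
    )
    row_emb = sincos_1d(rows.reshape(-1), dim // 2)
    col_emb = sincos_1d(cols.reshape(-1), dim // 2)
    return torch.cat([row_emb, col_emb], dim=1)  # (height * width, dim)

pos_emb = sincos_2d(3, 4, 16)
```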
Implement Top-p (Nucleus) Sampling
Sort tokens by probability, compute cumulative probabilities, keep the smallest set of tokens whose cumulative mass reaches p (the nucleus), mask the rest, and sample from the filtered distribution.
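A sketch of the filtering step. A token is dropped when the probability mass ranked ahead of it already reaches `p`, so the top token is always kept. `top_p_filter` and `sample_top_p` are illustrative names.

```python
import torch

def top_p_filter(logits, p=0.9):
    sorted_logits, sorted_idx = logits.sort(dim=-1, descending=True)
    probs = sorted_logits.softmax(dim=-1)
    cum = probs.cumsum(dim=-1)
    # Mass strictly before each token; drop the token if that mass already covers p.
    remove = (cum - probs) >= p
    sorted_logits = sorted_logits.masked_fill(remove, float("-inf"))
    # Undo the sort so the output lines up with the original vocabulary order.
    out = torch.full_like(logits, float("-inf"))
    return out.scatter(-1, sorted_idx, sorted_logits)

def sample_top_p(logits, p=0.9, generator=None):
    probs = top_p_filter(logits, p).softmax(dim=-1)
    return torch.multinomial(probs, 1, generator=generator)

logits = torch.tensor([0.05, 0.5, 0.15, 0.3]).log()
filtered = top_p_filter(logits, p=0.7)
token = sample_top_p(logits, p=0.7, generator=torch.Generator().manual_seed(0))
```

With `p=0.7`, the tokens with probability 0.5 and 0.3 form the nucleus and the other two are masked.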
Implement Top-k Sampling
Select the k highest-probability tokens, zero out the rest, renormalize, and sample for controlled text generation.
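A compact sketch: taking the softmax over only the top-k logits is equivalent to zeroing out the rest and renormalizing. `top_k_sample` is an illustrative name.

```python
import torch

def top_k_sample(logits, k, generator=None):
    # Keep the k largest logits per row along with their vocabulary indices.
    values, indices = logits.topk(k, dim=-1)
    # Softmax over the survivors renormalizes their probabilities to sum to 1.
    probs = values.softmax(dim=-1)
    choice = torch.multinomial(probs, 1, generator=generator)
    # Map the position within the top-k back to a vocabulary index.
    return indices.gather(-1, choice)

logits = torch.tensor([[0.1, 2.0, -1.0, 3.0, 0.5]])
gen = torch.Generator().manual_seed(0)
samples = [top_k_sample(logits, k=2, generator=gen).item() for _ in range(50)]
```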
Implement Beam Search for LLM Decoding
Maintain and expand top-scoring partial sequences at each decoding step with length normalization and early stopping.
Implement Temperature Sampling
Divide logits by a temperature scalar before softmax to sharpen or flatten the token probability distribution.
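As a small illustration, with `temperature_probs` as a made-up helper name: temperatures below 1 sharpen the distribution and temperatures above 1 flatten it.

```python
import torch

def temperature_probs(logits, temperature=1.0):
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Scaling the logits before softmax changes how peaked the result is.
    return (logits / temperature).softmax(dim=-1)

logits = torch.tensor([2.0, 1.0, 0.0])
cold = temperature_probs(logits, 0.5)   # sharper
hot = temperature_probs(logits, 2.0)    # flatter
```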
Implement LoRA on a Linear Layer
Inject trainable low-rank decomposition matrices (A, B) into a frozen linear layer for parameter-efficient fine-tuning.
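A minimal sketch of such a wrapper. The class name `LoRALinear`, the rank and alpha defaults, and the init scheme (small random A, zero B so the wrapped layer starts out unchanged) are assumptions for illustration.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, r: int = 4, alpha: float = 8.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)  # freeze the original weights
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scale = alpha / r

    def forward(self, x):
        # Frozen path plus the scaled low-rank update x @ (B A)^T.
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

torch.manual_seed(0)
base = nn.Linear(6, 3)
layer = LoRALinear(base, r=2)
x = torch.randn(4, 6)
trainable = sorted(n for n, p in layer.named_parameters() if p.requires_grad)
```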
Implement KV Cache for Autoregressive Generation
Cache key and value tensors from previous timesteps so each new token only computes its own query, key, and value, then attends over the cached positions instead of recomputing them.
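The sketch below uses a single head and no batching, with a hypothetical `KVCache` class, and checks that step-by-step decoding with the cache reproduces full causal attention.

```python
import math
import torch

def attend(q, k, v):
    # Scaled dot-product attention without a mask.
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.shape[-1])
    return scores.softmax(dim=-1) @ v

class KVCache:
    """Hypothetical single-head cache; keys/values have shape (seq_len, d)."""
    def __init__(self):
        self.k = None
        self.v = None

    def append(self, k_new, v_new):
        self.k = k_new if self.k is None else torch.cat([self.k, k_new], dim=0)
        self.v = v_new if self.v is None else torch.cat([self.v, v_new], dim=0)
        return self.k, self.v

torch.manual_seed(0)
T, d = 5, 8
x = torch.randn(T, d)
Wq, Wk, Wv = (torch.randn(d, d) / math.sqrt(d) for _ in range(3))

# Reference: causal attention recomputed over the whole sequence.
q, k, v = x @ Wq, x @ Wk, x @ Wv
future = torch.triu(torch.ones(T, T, dtype=torch.bool), diagonal=1)
scores = (q @ k.T / math.sqrt(d)).masked_fill(future, float("-inf"))
full = scores.softmax(dim=-1) @ v

# Incremental: project only the newest token, reuse cached keys and values.
cache = KVCache()
steps = []
for t in range(T):
    x_t = x[t:t + 1]
    k_all, v_all = cache.append(x_t @ Wk, x_t @ Wv)
    steps.append(attend(x_t @ Wq, k_all, v_all))
cached = torch.cat(steps, dim=0)
```

No causal mask is needed in the incremental path because the cache only ever contains past and current positions.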
Implement Sliding Window Attention
Restrict attention to a fixed local window around each token, reducing attention cost from O(n²) to O(n·w) for long sequences.
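The window pattern itself can be illustrated with a boolean mask. Note that this dense mask is still n by n, so it only shows which positions may attend to which; an implementation that realizes the savings would compute only the banded entries. `sliding_window_mask` and its `causal` flag are illustrative.

```python
import torch

def sliding_window_mask(n, window, causal=False):
    i = torch.arange(n)[:, None]  # query positions
    j = torch.arange(n)[None, :]  # key positions
    mask = (i - j).abs() <= window  # True where attention is allowed
    if causal:
        mask = mask & (j <= i)      # additionally forbid looking ahead
    return mask

mask = sliding_window_mask(5, window=1)
causal_mask = sliding_window_mask(5, window=1, causal=True)

# Apply to raw attention scores before the softmax.
scores = torch.zeros(5, 5).masked_fill(~mask, float("-inf"))
weights = scores.softmax(dim=-1)
```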
Implement DPO Loss from Scratch
Compute the Direct Preference Optimization loss that trains a policy directly from preference pairs without a reward model.
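A sketch of the loss, assuming the inputs are summed log-probabilities of each full response under the policy and a frozen reference model. The argument names and `beta=0.1` are illustrative choices.

```python
import math
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    # Implicit reward margins: how much more the policy favors each response
    # than the reference model does.
    chosen_margin = policy_chosen - ref_chosen
    rejected_margin = policy_rejected - ref_rejected
    # Logistic loss on the difference of margins.
    return -F.logsigmoid(beta * (chosen_margin - rejected_margin)).mean()

ref = torch.tensor([-2.0])
neutral = dpo_loss(ref, ref, ref, ref)
improved = dpo_loss(torch.tensor([-1.0]), torch.tensor([-3.0]), ref, ref)
```

When the policy equals the reference the loss is log 2, and it falls as the policy shifts probability toward the chosen response.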
Implement PPO for RLHF
Build Proximal Policy Optimization with clipped surrogate objective, value function baseline, and KL penalty for RLHF.
Implement Gradient Checkpointing
Trade compute for memory by recomputing intermediate activations during backward instead of storing them all.
Implement Mixture of Experts Layer
Build a gated MoE with top-k routing, load balancing loss, and expert capacity constraints for sparse computation.
Implement Speculative Decoding
Draft tokens with a fast model, verify them in parallel with the target model, and accept or reject each drafted token so the output distribution matches the target model's.
Implement Continuous Batching for LLM Inference
Dynamically slot sequences in and out of a running batch as they finish, maximizing throughput without padding waste.
Implement DDPM from Scratch
Build the full denoising diffusion pipeline: forward noise schedule, U-Net denoiser, and reverse sampling to generate images.
Implement DDIM Sampling + Classifier-Free Guidance
Build deterministic DDIM sampling with classifier-free guidance to control image generation quality and conditioning adherence.
Implement Selective State Space Model (Mamba)
Build Mamba's selective scan mechanism with input-dependent parameters, achieving linear-time sequence modeling without attention.
Implement Vision Transformer + MAE Pretraining
Build ViT with masked autoencoder pretraining: randomly mask patches, encode visible ones, decode to reconstruct.
Write a Fused Softmax Kernel in Triton
Write a GPU kernel in Triton that fuses the softmax computation into a single pass, avoiding round trips to global memory for intermediate results.
Implement FlashAttention-2 in Triton
Write the tiled, fused attention kernel that computes exact attention in O(n) memory using online softmax in Triton.
Implement FSDP from Scratch
Build Fully Sharded Data Parallel: shard parameters across GPUs, all-gather before forward, reduce-scatter gradients after backward.
Implement GRPO (DeepSeek-R1 Algorithm)
Build Group Relative Policy Optimization that scores multiple completions per prompt and uses group-relative advantages.
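The group-relative advantage step can be sketched on its own, assuming rewards are arranged as one row per prompt and one column per sampled completion. The helper name and the `eps` term are illustrative; the policy-gradient update that consumes these advantages is omitted.

```python
import torch

def group_relative_advantages(rewards, eps=1e-6):
    # rewards: (num_prompts, group_size). Each completion is scored relative
    # to the other completions sampled for the same prompt.
    mean = rewards.mean(dim=1, keepdim=True)
    std = rewards.std(dim=1, keepdim=True)
    return (rewards - mean) / (std + eps)

rewards = torch.tensor([[1.0, 0.0, 0.0, 1.0],
                        [0.2, 0.9, 0.4, 0.5]])
adv = group_relative_advantages(rewards)
flat = group_relative_advantages(torch.ones(1, 4))  # identical rewards
```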
Build a Complete LLM Inference Engine
Combine KV caching, continuous batching, and memory management into a production-grade inference server.
Implement Knowledge Distillation
Train a smaller student model to match a larger teacher's soft predictions using temperature-scaled KL divergence loss.
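A sketch of the soft-target term. Multiplying by the squared temperature is a common convention that keeps gradient magnitudes comparable across temperatures; the function name and the default temperature of 2.0 are assumptions.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    # Soften both distributions with the same temperature.
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)
    # KL(teacher || student), averaged over the batch.
    kl = F.kl_div(student_log_probs, teacher_probs, reduction="batchmean")
    return kl * temperature ** 2

teacher = torch.tensor([[4.0, 1.0, 0.0], [0.5, 3.0, 0.2]])
matched = distillation_loss(teacher.clone(), teacher)
mismatched = distillation_loss(torch.zeros(2, 3), teacher)
```

In practice this term is usually mixed with ordinary cross-entropy on the hard labels.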
Implement Ring Attention for Long Contexts
Distribute attention computation across GPUs in a ring topology, enabling context lengths that exceed single-GPU memory.