████████╗ ██████╗ ██████╗  ██████╗██╗  ██╗██╗     ███████╗███████╗████████╗
╚══██╔══╝██╔═══██╗██╔══██╗██╔════╝██║  ██║██║     ██╔════╝██╔════╝╚══██╔══╝
   ██║   ██║   ██║██████╔╝██║     ███████║██║     █████╗  █████╗     ██║
   ██║   ██║   ██║██╔══██╗██║     ██╔══██║██║     ██╔══╝  ██╔══╝     ██║
   ██║   ╚██████╔╝██║  ██║╚██████╗██║  ██║███████╗███████╗███████╗   ██║
   ╚═╝    ╚═════╝ ╚═╝  ╚═╝ ╚═════╝╚═╝  ╚═╝╚══════╝╚══════╝╚══════╝   ╚═╝
Welcome to TorchLeet, a collection of PyTorch practice problems for ML/AI interviews, based on real engineering interviews. It has 90 questions across 3 sets, tagged with the companies that ask them. (Rumor has it there's more hidden in v3 if you know where to look.)

Grind PyTorch for ML/AI Interviews

Practice problems ranging from basic PyTorch to advanced LLM systems, tagged with the companies that ask them in interviews. TorchLeet V3 is now out with 30 new questions and company-wise filtering!

35 PyTorch Questions
25 LLM Questions
30 Advanced ML Questions
#1
v3

Implement Softmax from Scratch

Build numerically stable softmax using the log-sum-exp trick, handling overflow and underflow in raw tensor math.

Easy
Apple, Meta, Google, Amazon
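
For problem #1, a minimal sketch of the max-subtraction form of softmax. The function name `stable_softmax` is an illustrative choice, not part of the problem statement.

```python
import torch

def stable_softmax(x: torch.Tensor, dim: int = -1) -> torch.Tensor:
    # Subtracting the max makes the largest exponent exp(0) = 1, so nothing overflows.
    shifted = x - x.max(dim=dim, keepdim=True).values
    exp = torch.exp(shifted)
    return exp / exp.sum(dim=dim, keepdim=True)

# Logits this large would overflow a naive exp(x) / sum(exp(x)).
logits = torch.tensor([[1000.0, 1001.0, 1002.0], [-1000.0, 0.0, 1000.0]])
probs = stable_softmax(logits)
```

Softmax is invariant to adding a constant to every logit, which is why the shift changes nothing mathematically.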
#2
v3

Implement K-Means Clustering in PyTorch

Code Lloyd's algorithm with PyTorch tensors: random init, distance computation, centroid updates, and convergence check.

Easy
Uber, LinkedIn, Google, Amazon
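
For problem #2, one possible sketch of Lloyd's algorithm. It assumes initialization by picking k random data points and stops when the centroids move less than `tol`; the signature and defaults are illustrative.

```python
import torch

def kmeans(x, k, iters=100, tol=1e-6, seed=0):
    # Random init: choose k distinct data points as the starting centroids.
    g = torch.Generator().manual_seed(seed)
    centroids = x[torch.randperm(x.shape[0], generator=g)[:k]].clone()
    for _ in range(iters):
        labels = torch.cdist(x, centroids).argmin(dim=1)  # nearest centroid per point
        new_centroids = centroids.clone()
        for j in range(k):
            members = x[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster is empty
                new_centroids[j] = members.mean(dim=0)
        shift = (new_centroids - centroids).norm()
        centroids = new_centroids
        if shift < tol:  # convergence check
            break
    labels = torch.cdist(x, centroids).argmin(dim=1)
    return centroids, labels
```

Lloyd's algorithm only finds a local optimum, so results can depend on the initialization.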
#3
v3

Implement KNN in PyTorch

Build k-nearest neighbors classification using vectorized distance computation and top-k selection on tensors.

Easy
Uber, LinkedIn, Meta
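
For problem #3, a short sketch assuming Euclidean distance and majority voting over integer class labels. `knn_predict` is a hypothetical name.

```python
import torch

def knn_predict(train_x, train_y, query_x, k=3):
    dists = torch.cdist(query_x, train_x)              # (num_queries, num_train)
    idx = dists.topk(k, dim=1, largest=False).indices  # k smallest distances
    neighbor_labels = train_y[idx]                     # (num_queries, k)
    return torch.mode(neighbor_labels, dim=1).values   # majority vote

train_x = torch.tensor([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                        [5.0, 5.0], [5.0, 6.0], [6.0, 5.0]])
train_y = torch.tensor([0, 0, 0, 1, 1, 1])
preds = knn_predict(train_x, train_y, torch.tensor([[0.2, 0.2], [5.5, 5.5]]))
```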
#4
v3

Implement Logistic Regression with Gradient Descent

Build binary logistic regression from scratch with sigmoid, binary cross-entropy, and manual gradient descent updates.

Easy
Google, Meta, Amazon
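
For problem #4, a sketch with the gradients written out by hand rather than through autograd. It assumes full-batch gradient descent and float labels in {0, 1}; the learning rate and step count are arbitrary.

```python
import torch

def train_logreg(x, y, lr=0.1, steps=500):
    n, d = x.shape
    w = torch.zeros(d)
    b = torch.zeros(())
    for _ in range(steps):
        p = torch.sigmoid(x @ w + b)
        # For mean binary cross-entropy, d(loss)/d(logit) = (p - y) / n.
        err = p - y
        w -= lr * (x.T @ err) / n
        b -= lr * err.mean()
    return w, b

x = torch.tensor([[-2.0], [-1.0], [1.0], [2.0]])
y = torch.tensor([0.0, 0.0, 1.0, 1.0])
w, b = train_logreg(x, y)
```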
#5
v3

Implement Contrastive Loss (InfoNCE) + CLIP Training Loop

Build the InfoNCE contrastive loss and a CLIP-style training loop that aligns image and text embeddings.

Medium
OpenAI, Anthropic, DeepMind, Midjourney, Apple
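
For problem #5, a sketch of the symmetric InfoNCE loss only; the training loop is omitted. It assumes row i of each batch is a matching image/text pair, and the fixed temperature of 0.07 is an illustrative value.

```python
import torch
import torch.nn.functional as F

def clip_loss(image_emb, text_emb, temperature=0.07):
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = image_emb @ text_emb.T / temperature  # (B, B) cosine similarities
    targets = torch.arange(logits.shape[0], device=logits.device)
    loss_i2t = F.cross_entropy(logits, targets)    # image -> matching text
    loss_t2i = F.cross_entropy(logits.T, targets)  # text -> matching image
    return (loss_i2t + loss_t2i) / 2
```

Every other item in the batch acts as a negative, so larger batches give a harder contrastive task.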
#6
v3

Implement 2D Positional Embeddings

Build 2D sinusoidal position encodings for vision transformers, encoding both row and column positions in patch grids.

Medium
Anthropic, DeepMind, Midjourney, Runway
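
For problem #6, one common construction, sketched under the assumption that half the channels encode the row index and the other half the column index. The helper names are hypothetical.

```python
import torch

def sincos_1d(pos, dim):
    # Standard sinusoidal encoding of integer positions into `dim` channels (dim even).
    exponents = torch.arange(dim // 2, dtype=torch.float32) / (dim // 2)
    omega = 1.0 / (10000.0 ** exponents)
    angles = pos.float()[:, None] * omega[None, :]
    return torch.cat([angles.sin(), angles.cos()], dim=1)

def sincos_2d(h, w, dim):
    assert dim % 4 == 0, "dim must split evenly into row/col sin/cos parts"
    rows, cols = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    row_emb = sincos_1d(rows.reshape(-1), dim // 2)
    col_emb = sincos_1d(cols.reshape(-1), dim // 2)
    return torch.cat([row_emb, col_emb], dim=1)  # (h * w, dim), row-major patch order

emb = sincos_2d(2, 3, 8)
```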
#7
v3

Implement Top-p (Nucleus) Sampling

Sort logits in descending order, compute cumulative probabilities, mask tokens outside the smallest set whose cumulative probability reaches p, and sample from the renormalized distribution.

Medium
Anthropic, OpenAI, DeepMind, Perplexity, Cohere
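
For problem #7, a sketch of nucleus sampling. It assumes the convention that a token is dropped when the probability mass ranked ahead of it already reaches p, so the top token always survives.

```python
import torch

def top_p_sample(logits, p=0.9, generator=None):
    probs = torch.softmax(logits, dim=-1)
    sorted_probs, sorted_idx = probs.sort(dim=-1, descending=True)
    cumulative = sorted_probs.cumsum(dim=-1)
    # Mass strictly before each token; remove the token once that mass reaches p.
    remove = (cumulative - sorted_probs) >= p
    sorted_probs = sorted_probs.masked_fill(remove, 0.0)
    sorted_probs = sorted_probs / sorted_probs.sum(dim=-1, keepdim=True)
    choice = torch.multinomial(sorted_probs, num_samples=1, generator=generator)
    return sorted_idx.gather(-1, choice).squeeze(-1)  # map back to vocabulary ids
```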
#8
v3

Implement Top-k Sampling

Select the k highest-probability tokens, zero out the rest, renormalize, and sample for controlled text generation.

Medium
Anthropic, OpenAI, DeepMind, Cohere
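
For problem #8, a compact sketch. Taking a softmax over only the k surviving logits is equivalent to zeroing the rest and renormalizing.

```python
import torch

def top_k_sample(logits, k, generator=None):
    topk_vals, topk_idx = logits.topk(k, dim=-1)
    probs = torch.softmax(topk_vals, dim=-1)  # renormalized over the k survivors
    choice = torch.multinomial(probs, num_samples=1, generator=generator)
    return topk_idx.gather(-1, choice).squeeze(-1)
```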
#9
v3

Implement Beam Search for LLM Decoding

Maintain and expand top-scoring partial sequences at each decoding step with length normalization and early stopping.

Medium
Google, DeepMind, Meta, Apple
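
For problem #9, a simplified sketch. It assumes a hypothetical `step_fn(tokens)` callback that returns next-token log-probabilities, divides the final score by `len ** alpha` as one possible length normalization, and stops early once no live beams remain.

```python
import torch

def beam_search(step_fn, bos, eos, beam_width=3, max_len=10, alpha=0.7):
    beams = [([bos], 0.0)]  # (token list, summed log-probability)
    finished = []
    for _ in range(max_len):
        candidates = []
        for tokens, score in beams:
            log_probs = step_fn(tokens)  # (vocab,)
            top_lp, top_ix = log_probs.topk(beam_width)
            for lp, ix in zip(top_lp.tolist(), top_ix.tolist()):
                candidates.append((tokens + [ix], score + lp))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for tokens, score in candidates[:beam_width]:
            (finished if tokens[-1] == eos else beams).append((tokens, score))
        if not beams:  # early stop: every surviving hypothesis has ended
            break
    finished.extend(beams)  # fall back to unfinished beams at max_len
    return max(finished, key=lambda c: c[1] / (len(c[0]) ** alpha))
```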
#10
v3

Implement Temperature Sampling

Divide logits by a temperature scalar before softmax to sharpen or flatten the token probability distribution.

Easy
OpenAI, Anthropic, Cohere, Perplexity
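
For problem #10, a minimal sketch. Temperatures below 1 sharpen the distribution and temperatures above 1 flatten it; the guard against non-positive values is an added assumption.

```python
import torch

def temperature_probs(logits, temperature=1.0):
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    return torch.softmax(logits / temperature, dim=-1)

def temperature_sample(logits, temperature=1.0, generator=None):
    probs = temperature_probs(logits, temperature)
    return torch.multinomial(probs, num_samples=1, generator=generator).squeeze(-1)
```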
#11
v3

Implement LoRA on a Linear Layer

Inject trainable low-rank decomposition matrices (A, B) into a frozen linear layer for parameter-efficient fine-tuning.

Medium
Meta, Google, Anthropic, OpenAI, Databricks
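
For problem #11, a sketch of a LoRA wrapper. It assumes the usual convention of initializing B to zero so the wrapped layer starts out identical to the frozen base; the class name, rank, and alpha are illustrative.

```python
import math
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 4, alpha: float = 8.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)  # freeze the pretrained weights
        self.A = nn.Parameter(torch.empty(rank, base.in_features))
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))
        nn.init.kaiming_uniform_(self.A, a=math.sqrt(5))
        self.scale = alpha / rank

    def forward(self, x):
        # Frozen path plus the low-rank update x A^T B^T, scaled by alpha / rank.
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scale
```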
#12
v3

Implement KV Cache for Autoregressive Generation

Cache key and value tensors from previous timesteps so each decoding step computes keys and values only for the new token and attends over the cached ones.

Medium
Anthropic, OpenAI, Meta, Perplexity, Together AI
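
For problem #12, a single-head sketch. `KVCache` and `attend_step` are hypothetical names, and the query/key/value projections are assumed to be computed by the caller.

```python
import torch

class KVCache:
    def __init__(self):
        self.k = None
        self.v = None

    def append(self, k_new, v_new):
        # k_new, v_new: (batch, 1, d) for the single new position.
        self.k = k_new if self.k is None else torch.cat([self.k, k_new], dim=1)
        self.v = v_new if self.v is None else torch.cat([self.v, v_new], dim=1)
        return self.k, self.v

def attend_step(q_new, k_new, v_new, cache):
    k, v = cache.append(k_new, v_new)
    # The new query attends over every cached position, including its own.
    scores = q_new @ k.transpose(-2, -1) / k.shape[-1] ** 0.5  # (batch, 1, t)
    return torch.softmax(scores, dim=-1) @ v
```

No causal mask is needed at decode time because the cache only ever contains past and current positions.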
#13
v3

Implement Sliding Window Attention

Restrict attention to a fixed local window around each token, reducing memory from O(n²) to O(n·w) for long sequences.

Medium
Mistral, Anthropic, Google, DeepMind
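
For problem #13, a reference sketch that assumes a causal window (each token sees itself and the previous `window - 1` tokens). It builds a dense mask for clarity, so it does not itself achieve the O(n·w) memory; a banded or blocked computation would be needed for that.

```python
import torch

def sliding_window_mask(n, window):
    i = torch.arange(n)[:, None]  # query positions
    j = torch.arange(n)[None, :]  # key positions
    return (j <= i) & (i - j < window)  # True where attention is allowed

def sliding_window_attention(q, k, v, window):
    n = q.shape[-2]
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5
    scores = scores.masked_fill(~sliding_window_mask(n, window), float("-inf"))
    return torch.softmax(scores, dim=-1) @ v
```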
#14
v3

Implement DPO Loss from Scratch

Compute the Direct Preference Optimization loss that trains a policy directly from preference pairs without a reward model.

Hard
Anthropic, OpenAI, DeepMind, Meta
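
For problem #14, a sketch of the loss itself. It assumes the inputs are summed sequence log-probabilities of the chosen and rejected responses under the policy and a frozen reference model; `beta = 0.1` is an illustrative value.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    # Implicit rewards are log-ratios of policy to reference.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    return -F.logsigmoid(margin).mean()
```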
#15
v3

Implement PPO for RLHF

Build Proximal Policy Optimization with clipped surrogate objective, value function baseline, and KL penalty for RLHF.

Hard
Anthropic, OpenAI, DeepMind, Meta
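
For problem #15, a sketch of the combined loss for one batch of sampled tokens. Advantage estimation and rollout collection are omitted, the KL term uses a simple sample-based estimate against a reference policy, and all coefficients are illustrative.

```python
import torch
import torch.nn.functional as F

def ppo_loss(new_logp, old_logp, advantages, values, returns, ref_logp,
             clip_eps=0.2, vf_coef=0.5, kl_coef=0.1):
    ratio = torch.exp(new_logp - old_logp)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    policy_loss = -torch.min(unclipped, clipped).mean()  # clipped surrogate objective
    value_loss = F.mse_loss(values, returns)             # value function baseline
    kl = (new_logp - ref_logp).mean()                    # estimate of KL(policy || reference)
    return policy_loss + vf_coef * value_loss + kl_coef * kl
```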
#16
v3

Implement Gradient Checkpointing

Trade compute for memory by recomputing intermediate activations during backward instead of storing them all.

Hard
Meta, Google, NVIDIA, Tesla
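
For problem #16, a sketch that leans on PyTorch's built-in `torch.utils.checkpoint.checkpoint` rather than a from-scratch implementation, to show the behavior a custom version should reproduce. The `CheckpointedMLP` module is a made-up example.

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

class CheckpointedMLP(nn.Module):
    def __init__(self, dim, depth, use_checkpoint=True):
        super().__init__()
        self.blocks = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, dim), nn.GELU()) for _ in range(depth)
        )
        self.use_checkpoint = use_checkpoint

    def forward(self, x):
        for block in self.blocks:
            if self.use_checkpoint:
                # Activations inside `block` are dropped and recomputed during backward.
                x = checkpoint(block, x, use_reentrant=False)
            else:
                x = block(x)
        return x
```

Gradients should match the non-checkpointed model; only peak activation memory and compute time differ.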
#17
v3

Implement Mixture of Experts Layer

Build a gated MoE with top-k routing, load balancing loss, and expert capacity constraints for sparse computation.

Hard
Google, DeepMind, Mistral, Databricks, xAI
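
For problem #17, a simplified sketch with top-k routing and a Switch-Transformer-style auxiliary load-balancing loss. Expert capacity constraints are left out, the experts are single linear layers, and the class name `TopKMoE` is hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    def __init__(self, dim, num_experts=4, k=2):
        super().__init__()
        self.gate = nn.Linear(dim, num_experts)
        self.experts = nn.ModuleList(nn.Linear(dim, dim) for _ in range(num_experts))
        self.k = k

    def forward(self, x):  # x: (tokens, dim)
        probs = torch.softmax(self.gate(x), dim=-1)
        top_p, top_i = probs.topk(self.k, dim=-1)
        top_p = top_p / top_p.sum(dim=-1, keepdim=True)  # renormalize over chosen experts
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            token_idx, slot = (top_i == e).nonzero(as_tuple=True)
            if token_idx.numel() > 0:
                weight = top_p[token_idx, slot].unsqueeze(-1)
                out = out.index_add(0, token_idx, weight * expert(x[token_idx]))
        # Aux loss: fraction of tokens whose top choice is each expert, times mean gate prob.
        n = len(self.experts)
        frac_tokens = F.one_hot(top_i[:, 0], n).float().mean(dim=0)
        aux_loss = n * (frac_tokens * probs.mean(dim=0)).sum()
        return out, aux_loss
```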
#18
v3

Implement Speculative Decoding

Draft tokens with a fast model, verify in parallel with the target model, and accept/reject to guarantee identical output distribution.

Hard
Google, DeepMind, Anthropic, Apple
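
For problem #18, a sketch of the accept/reject rule for a single drafted token, assuming `p_target` and `q_draft` are the two models' probability vectors at that position. The surrounding draft and verify loops are omitted.

```python
import torch

def speculative_accept(draft_token, p_target, q_draft, generator=None):
    # Accept the drafted token with probability min(1, p / q).
    ratio = p_target[draft_token] / q_draft[draft_token]
    if torch.rand((), generator=generator) < ratio:
        return draft_token, True
    # On rejection, resample from the normalized residual max(p - q, 0).
    residual = torch.clamp(p_target - q_draft, min=0.0)
    residual = residual / residual.sum()
    token = int(torch.multinomial(residual, num_samples=1, generator=generator))
    return token, False
```

Resampling from the residual is what keeps the overall output distributed according to the target model.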
#19
v3

Implement Continuous Batching for LLM Inference

Dynamically slot sequences in and out of a running batch as they finish, maximizing throughput without padding waste.

Hard
Perplexity, Together AI, Anyscale, Meta
#20
v3

Implement DDPM from Scratch

Build the full denoising diffusion pipeline: forward noise schedule, U-Net denoiser, and reverse sampling to generate images.

Hard
Midjourney, Runway, Stability AI, Adobe, Google
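
For problem #20, a sketch of the forward (noising) half only, assuming a linear beta schedule with commonly used endpoints. The U-Net denoiser and reverse sampling loop are not shown.

```python
import torch

def linear_beta_schedule(T, beta_start=1e-4, beta_end=0.02):
    return torch.linspace(beta_start, beta_end, T)

def q_sample(x0, t, alphas_cumprod, noise):
    # Closed form: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * noise
    shape = (-1,) + (1,) * (x0.dim() - 1)  # broadcast per-sample scalars
    sqrt_abar = alphas_cumprod[t].sqrt().view(shape)
    sqrt_one_minus = (1 - alphas_cumprod[t]).sqrt().view(shape)
    return sqrt_abar * x0 + sqrt_one_minus * noise

betas = linear_beta_schedule(1000)
alphas_cumprod = torch.cumprod(1 - betas, dim=0)
```

A denoiser would then be trained to predict `noise` from `q_sample(x0, t, ...)` and `t`.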
#21
v3

Implement DDIM Sampling + Classifier-Free Guidance

Build deterministic DDIM sampling with classifier-free guidance to control image generation quality and conditioning adherence.

Hard
Midjourney, Runway, Stability AI, Adobe
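
For problem #21, a sketch of the two core formulas: the guidance mix of conditional and unconditional noise predictions, and one deterministic DDIM update (eta = 0). The noise-prediction model itself is assumed to exist elsewhere.

```python
import torch

def cfg_eps(eps_cond, eps_uncond, guidance_scale):
    # Extrapolate from the unconditional prediction toward the conditional one.
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)

def ddim_step(x_t, eps, abar_t, abar_prev):
    # Predict x0 from the current sample, then re-noise it to the previous level.
    x0_pred = (x_t - (1 - abar_t).sqrt() * eps) / abar_t.sqrt()
    return abar_prev.sqrt() * x0_pred + (1 - abar_prev).sqrt() * eps
```

A guidance scale of 1 recovers the conditional prediction; larger values trade diversity for conditioning adherence.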
#22
v3

Implement Selective State Space Model (Mamba)

Build Mamba's selective scan mechanism with input-dependent parameters, achieving linear-time sequence modeling without attention.

Hard
DeepMind, Google, Anthropic
#23
v3

Implement Vision Transformer + MAE Pretraining

Build ViT with masked autoencoder pretraining: randomly mask patches, encode only the visible ones, and decode to reconstruct the masked patches.

Hard
Meta, Google, Apple, Tesla, Waymo
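
For problem #23, a sketch of just the random patch-masking step, assuming patch embeddings of shape (batch, num_patches, dim). The encoder, decoder, and reconstruction loss are omitted, and the 0.75 mask ratio is an illustrative default.

```python
import torch

def random_masking(patches, mask_ratio=0.75, generator=None):
    b, n, d = patches.shape
    n_keep = int(n * (1 - mask_ratio))
    noise = torch.rand(b, n, generator=generator)
    ids_keep = noise.argsort(dim=1)[:, :n_keep]  # random subset per sample
    visible = patches.gather(1, ids_keep[:, :, None].expand(-1, -1, d))
    mask = torch.ones(b, n)
    mask.scatter_(1, ids_keep, 0.0)  # 0 = visible, 1 = masked
    return visible, mask.bool(), ids_keep
```

Only `visible` would be fed to the encoder, which is where the pretraining speedup comes from.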
#24
v3

Write a Fused Softmax Kernel in Triton

Write a GPU kernel in Triton that fuses the softmax computation into a single kernel, eliminating intermediate reads and writes to global memory.

Expert
NVIDIA, Meta, Google, xAI, Tesla
#25
v3

Implement FlashAttention-2 in Triton

Write the tiled, fused attention kernel that computes exact attention in O(n) memory using online softmax in Triton.

Expert
NVIDIA, Meta, Together AI, xAI
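
For problem #25, a plain PyTorch reference for the online-softmax recurrence, not a Triton kernel. It processes keys and values in blocks and never materializes the full score matrix; the block size and single-head 2D shapes are simplifying assumptions.

```python
import torch

def tiled_attention(q, k, v, block=2):
    n, d = q.shape
    scale = d ** -0.5
    out = torch.zeros(n, v.shape[1])
    row_max = torch.full((n, 1), float("-inf"))
    row_sum = torch.zeros(n, 1)
    for start in range(0, k.shape[0], block):
        kb, vb = k[start:start + block], v[start:start + block]
        s = (q @ kb.T) * scale  # scores for this key block only
        new_max = torch.maximum(row_max, s.max(dim=1, keepdim=True).values)
        correction = torch.exp(row_max - new_max)  # rescale earlier partial results
        p = torch.exp(s - new_max)
        row_sum = row_sum * correction + p.sum(dim=1, keepdim=True)
        out = out * correction + p @ vb
        row_max = new_max
    return out / row_sum
```

A Triton version would additionally tile over queries and keep these running statistics in on-chip memory.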
#26
v3

Implement FSDP from Scratch

Build Fully Sharded Data Parallel: shard parameters across GPUs, all-gather before forward, reduce-scatter gradients after backward.

Expert
Meta, Google, NVIDIA, Anthropic, xAI
#27
v3

Implement GRPO (DeepSeek-R1 Algorithm)

Build Group Relative Policy Optimization that scores multiple completions per prompt and uses group-relative advantages.

Expert
DeepMind, Anthropic, OpenAI
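
For problem #27, a sketch of only the group-relative advantage computation, assuming rewards arranged as (num_prompts, completions_per_prompt). The policy loss that consumes these advantages is omitted.

```python
import torch

def group_relative_advantages(rewards, eps=1e-6):
    # Normalize each completion's reward against the other completions for the same prompt.
    mean = rewards.mean(dim=1, keepdim=True)
    std = rewards.std(dim=1, keepdim=True)
    return (rewards - mean) / (std + eps)

rewards = torch.tensor([[1.0, 0.0, 0.0, 1.0], [0.0, 0.0, 0.0, 1.0]])
advantages = group_relative_advantages(rewards)
```

Using the group mean as the baseline removes the need for a separate learned value function.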
#28
v3

Build a Complete LLM Inference Engine

Combine KV caching, continuous batching, and memory management into a production-grade inference server.

Expert
Perplexity, Together AI, Anyscale, Fireworks AI
#29
v3

Implement Knowledge Distillation

Train a smaller student model to match a larger teacher's soft predictions using temperature-scaled KL divergence loss.

Medium
Google, Apple, Meta, Qualcomm, Tesla
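
For problem #29, a sketch of the soft-target term only. It assumes the common practice of multiplying by the squared temperature to keep gradient magnitudes comparable; the hard-label cross-entropy term is left out.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)
    # F.kl_div expects log-probabilities as input and probabilities as target.
    kl = F.kl_div(student_log_probs, teacher_probs, reduction="batchmean")
    return kl * temperature ** 2
```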
#30
v3

Implement Ring Attention for Long Contexts

Distribute attention computation across GPUs in a ring topology, enabling context lengths that exceed single-GPU memory.

Expert
Anthropic, Google, Meta, xAI