Claude teaches you PyTorch
Turn any AI assistant into your PyTorch interview coach. 75 problems, progressive hints, company prep, and learning paths. It won't spoil the answers either.
How It Works
Clone the Repo
You'll work through problems as Jupyter notebooks. Clone first so the AI tutor can read the actual problem files.
git clone https://github.com/Exorust/TorchLeet.git cd TorchLeet
Connect the AI Tutor
Run this from inside the TorchLeet folder. npx handles the rest.
- Run this single command:
claude mcp add torchleet -- npx -y torchleet-mcp
Learning Guides
Example Workflows
I want to learn how attention works in transformers
Great choice! Let's start with the fundamentals. Before we dive into the code, tell me: what do you think "attention" means intuitively? What problem does it solve that simpler approaches can't?
It lets the model focus on relevant parts of the input?
Exactly! Now let's make that concrete. I've pulled up the Scaled Dot-Product Attention problem (#V3-8). Here's your Level 1 hint, just the problem statement and requirements. Try to sketch out the function signature before I show you more.
I'm interviewing at Anthropic next week, help me prepare
Let's simulate a real interview. I've found 8 problems tagged with Anthropic, spanning DPO Loss, GRPO, KV Cache, and more. We'll start with a 20-minute timed problem. Ready?
Yes, let's go!
Your first problem: Implement DPO Loss from scratch. You need to compute the Direct Preference Optimization loss given policy and reference model log-probabilities. Timer starts now. What's your approach?
Can you review my FlashAttention implementation?
I'll review this as a senior ML engineer. Three things I notice right away:
1. **Numerical stability**: your softmax isn't using the log-sum-exp trick
2. **Memory**: you're materializing the full N×N attention matrix, which defeats the purpose of FlashAttention
3. **Tiling**: the block size should be tuned to SRAM capacity
Let's fix #2 first since it's the core idea...