| | Writing an LLM from scratch, part 32g – Interventions: weight tying (gilesthomas.com) |
| 1 point by gpjt 2 minutes ago | past | discuss |
|
| | Writing an LLM from scratch, part 32f – Interventions: weight decay (gilesthomas.com) |
| 6 points by gpjt 19 hours ago | past | discuss |
|
| | Writing an LLM from scratch, part 32e – Interventions: the learning rate (gilesthomas.com) |
| 3 points by ibobev 8 days ago | past | discuss |
|
| | Writing an LLM from scratch, part 32e – Interventions: the learning rate (gilesthomas.com) |
| 3 points by gpjt 14 days ago | past |
|
| | Writing an LLM from scratch, part 32a – Interventions: training a baseline model (gilesthomas.com) |
| 3 points by ibobev 43 days ago | past |
|
| | Writing an LLM from scratch, part 32b – Interventions: gradient clipping (gilesthomas.com) |
| 1 point by ibobev 43 days ago | past |
|
| | Writing an LLM from scratch, part 32c – Interventions: removing dropout (gilesthomas.com) |
| 1 point by ibobev 43 days ago | past |
|
| | Writing an LLM from scratch, part 32d – Interventions: adding attention bias (gilesthomas.com) |
| 1 point by ibobev 43 days ago | past |
|
| | Writing an LLM from scratch, part 32d – Interventions: adding attention bias (gilesthomas.com) |
| 6 points by gpjt 45 days ago | past |
|
| | Writing an LLM from scratch, part 32c – Interventions: removing dropout (gilesthomas.com) |
| 1 point by gpjt 46 days ago | past |
|
| | Writing an LLM from scratch, part 32b – Interventions: gradient clipping (gilesthomas.com) |
| 2 points by gpjt 47 days ago | past |
|
| | Writing an LLM from scratch, part 32a – Interventions: training a baseline model (gilesthomas.com) |
| 1 point by gpjt 48 days ago | past |
|
| | Getting a Custom PyTorch LLM onto the Hugging Face Hub (gilesthomas.com) |
| 1 point by ibobev 54 days ago | past |
|
| | Getting a Custom PyTorch LLM onto the Hugging Face Hub (gilesthomas.com) |
| 1 point by gpjt 54 days ago | past |
|
| | Writing an LLM from scratch, part 31 – the models are now on Hugging Face (gilesthomas.com) |
| 1 point by ibobev 64 days ago | past |
|
| | Writing an LLM from scratch, part 31 – the models are now on Hugging Face (gilesthomas.com) |
| 2 points by gpjt 65 days ago | past |
|
| | Digging into the LLM-as-a-Judge Results (gilesthomas.com) |
| 1 point by ibobev 73 days ago | past |
|
| | Digging into the LLM-as-a-Judge Results (gilesthomas.com) |
| 1 point by ibobev 74 days ago | past |
|
| | Writing an LLM from scratch, part 30 – digging into the LLM-as-a-judge results (gilesthomas.com) |
| 1 point by gpjt 74 days ago | past |
|
| | Using DistributedDataParallel to train a base model from scratch in the cloud (gilesthomas.com) |
| 10 points by ibobev 75 days ago | past |
|
| | LLM from scratch, part 29 – using DDP to train a base model in the cloud (gilesthomas.com) |
| 2 points by gpjt 75 days ago | past |
|
| | LLM from scratch, part 28 – training a base model from scratch on an RTX 3090 (gilesthomas.com) |
| 540 points by gpjt 3 months ago | past | 121 comments |
|
| | Why smart instruction-following makes prompt injection easier (gilesthomas.com) |
| 2 points by ibobev 4 months ago | past |
|
| | Writing an LLM from scratch, part 27 – what's left, and what's next? (gilesthomas.com) |
| 1 point by gpjt 4 months ago | past |
|
| | Writing an LLM from scratch, part 26 – evaluating the fine-tuned model (gilesthomas.com) |
| 4 points by gpjt 4 months ago | past |
|
| | Writing an LLM from scratch, part 25 – instruction fine-tuning (gilesthomas.com) |
| 2 points by gpjt 4 months ago | past |
|
| | Writing an LLM from scratch, part 24 – the transcript hack (gilesthomas.com) |
| 1 point by gpjt 4 months ago | past |
|
| | Retro Language Models: Rebuilding Karpathy's RNN in PyTorch (gilesthomas.com) |
| 1 point by ibobev 4 months ago | past |
|
| | Writing an LLM from scratch, part 23 – fine-tuning for classification (gilesthomas.com) |
| 1 point by ibobev 4 months ago | past |
|
| | Retro Language Models: Rebuilding Karpathy's RNN in PyTorch (gilesthomas.com) |
| 3 points by gpjt 5 months ago | past |
|