| | A Single 'Super Weight' Can Break Your Billion-Parameter Model (gonzoml.substack.com) |
|
2 points by che_shr_cat 17 days ago | past
|
| | JAX Things to Watch for in 2025 (gonzoml.substack.com) |
|
1 point by che_shr_cat 20 days ago | past
|
| | Diffusion models are evolutionary algorithms (gonzoml.substack.com) |
|
126 points by che_shr_cat 37 days ago | past | 27 comments
|
| | Make Softmax Great Again (gonzoml.substack.com) |
|
2 points by che_shr_cat 40 days ago | past
|
| | Deep Learning Frameworks: The Fourth Pillar of Deep Learning Revolution (gonzoml.substack.com) |
|
1 point by che_shr_cat 41 days ago | past
|
| | TextGrad: Automatic "Differentiation" via Text (gonzoml.substack.com) |
|
3 points by che_shr_cat 5 months ago | past
|
| | Superconducting Supercomputers (gonzoml.substack.com) |
|
1 point by che_shr_cat 5 months ago | past
|
| | Decoder-decoder architecture is coming (gonzoml.substack.com) |
|
2 points by che_shr_cat 6 months ago | past
|
| | Chronos: Using Pretrained LLMs for Probabilistic Time Series Forecasting (gonzoml.substack.com) |
|
2 points by che_shr_cat 7 months ago | past
|
| | Big Post About Big Context (gonzoml.substack.com) |
|
49 points by che_shr_cat 9 months ago | past | 19 comments
|
| | Neural Network Diffusion (gonzoml.substack.com) |
|
1 point by che_shr_cat 9 months ago | past
|
| | Thermodynamic AI is getting hotter (gonzoml.substack.com) |
|
51 points by che_shr_cat 10 months ago | past | 5 comments
|
| | Training LLMs with AMD GPUs on Frontier Supercomputer (gonzoml.substack.com) |
|
1 point by che_shr_cat 11 months ago | past
|
| | Beyond Chinchilla-Optimal: Accounting for Inference in Language Model Scaling Laws (gonzoml.substack.com) |
|
1 point by che_shr_cat 11 months ago | past
|
| | Project CETI (gonzoml.substack.com) |
|
2 points by che_shr_cat 12 months ago | past
|
| | GonzoML on Mamba and S6 (+previous post on S4) (gonzoml.substack.com) |
|
1 point by che_shr_cat on Dec 13, 2023 | past
|
| | Conway's Game of Life Is Omniperiodic (gonzoml.substack.com) |
|
2 points by che_shr_cat on Dec 9, 2023 | past | 1 comment
|
| | GonzoML on Gemini (gonzoml.substack.com) |
|
2 points by che_shr_cat on Dec 7, 2023 | past
|
| | Matryoshka Representation Learning (gonzoml.substack.com) |
|
2 points by che_shr_cat on Nov 3, 2023 | past
|
| | Mindstorms in Natural Language-Based Societies of Mind (gonzoml.substack.com) |
|
2 points by che_shr_cat on Oct 29, 2023 | past
|
| | The convolution empire strikes back (gonzoml.substack.com) |
|
132 points by che_shr_cat on Oct 27, 2023 | past | 56 comments
|
| | Sparse Universal Transformer (gonzoml.substack.com) |
|
3 points by che_shr_cat on Oct 23, 2023 | past
|
| | MemWalker: An alternative way for working with long documents using transformers (gonzoml.substack.com) |
|
1 point by che_shr_cat on Oct 17, 2023 | past
|
| | "Building Machines That Learn and Think Like People", 7 Years Later (gonzoml.substack.com) |
|
106 points by che_shr_cat on Oct 13, 2023 | past | 40 comments
|
| | Chain-of-Thought → Tree-of-Thought (gonzoml.substack.com) |
|
1 point by che_shr_cat on Oct 10, 2023 | past
|
| | Mortal Computers (gonzoml.substack.com) |
|
31 points by che_shr_cat on Oct 9, 2023 | past | 1 comment
|