Ask HN: What's your favorite computer science podcasts?
==========
Ask HN: How do I convince a non-technical exercise to keep a journal
==========
Ask HN: Is it just me or not?
==========
Ask HN: What do I do with my MVP?
==========
Ask HN: How to sell?
==========
Ask HN: How do you use HackerNews?
==========
Ask HN: Best way to make a B2B startup?
==========
Ask HN: Why do I have to live in San Francisco?
==========
Ask HN: How to tell my heart changes?
==========
Ask HN: How to deal with the difference between a job interview and a product?
==========
Ask HN: What is your favorite open-source sytem?
==========
Ask HN: What are your favorite blogs and resources?
==========
Ask HN: What are the best books for learning a new language/frameworks?
==========
Ask HN: What's your favorite HN post?
==========
Ask HN: What is your favorite RSS reader
==========
Ask HN: Is the SE not a mistake like a safe space business?
==========
Ask HN: How do I start programming in a job?
==========
I've been following minimaxir's work with GPT-2 for a while - I've tried building things on https://github.com/minimaxir/gpt-2-simple for example - and this looks like a HUGE leap forward in terms of developer usability. The old stuff was pretty good on that front, but this looks absolutely amazing. Really exciting project.
This is just brilliant. As someone who has little working knowledge but a massive interest in this field, I found your guide exceptionally well written and newbie-friendly (the way you've explained how to set everything up and left so many tips throughout is really useful).
I'm going to have a lot of fun with this, and it's going to be my starting point for learning more about Colab notebooks and AI (I've always loved doing practical things rather than reading theory to learn something new).
Kudos to you for all this amazing work.
P.S. Sorry if this is a lame question, but can this be used the way Gmail has recently started autocompleting my email sentences?
Awesome work! Whenever people tell me they want to get started with NLP I tell them to play around with your libraries as they're the easiest way to immediately start doing cool things.
> Generates text faster than gpt-2-simple and with better memory efficiency! (even from the 1.5B GPT-2 model!)
This is exciting news. One of the very few drawbacks of gpt-2-simple is the inability to fine-tune a model with more than ~355M parameters. Do these memory management improvements make it possible to fine-tune a larger one?
> Do these memory management improvements make it possible to fine-tune a larger one?
Unfortunately not yet; I need to implement gradient checkpointing first. Memory-wise, the results for fine-tuning 124M are promising (<8 GB VRAM, where it used to take about 12 GB VRAM with gpt-2-simple).
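Roughly, the fine-tuning workflow looks like this. It's a sketch based on the README-style API; the file name and hyperparameters are placeholders, not recommendations:

    from aitextgen import aitextgen

    # Download/cache OpenAI's 124M GPT-2 checkpoint and move it to the GPU.
    ai = aitextgen(tf_gpt2="124M", to_gpu=True)

    # Fine-tune on a plain text file; checkpoints are saved periodically.
    # "my_corpus.txt" and the step counts below are placeholders.
    ai.train("my_corpus.txt",
             batch_size=1,
             num_steps=5000,
             generate_every=1000,
             save_every=1000)

    # Sample from the fine-tuned model.
    ai.generate(n=3, prompt="Ask HN:", max_length=60)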
If I want to fine-tune this on some text data, are there obvious constraints to be aware of? I've got a reasonable amount of text (~50-100 GB), but seeing that a JSON file gets created makes me think that's probably too much. gpt-2-simple seems to describe 100 MB as 'massive', so what's a reasonable amount to aim for?
Or should I be training from scratch? (edit: having looked into training from scratch, I'm guessing that's a 'no' since I don't have thousands to throw at this.)
Oh, that's less than I was expecting. I'm used to having significantly less data to play with than the big players; I guess I still do, but in this case a fairly modest amount of data was enough for very impressive results.
> I'm not 100% sure you can encode and store that much data in memory with the current implementation, even with the fast tokenizers.
That makes sense. I wasn't too sure what sensible sizes would be; there are probably some interesting subsets of the data I could take and use for fine-tuning (or sample it down), maybe to around 100 MB, since that sounded like a large-but-workable amount to use.
I'm looking forward to seeing what I can get out of this. Thanks for making something simple enough that I can use it for an 'I wonder if' kind of problem!
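For anyone else with a too-big corpus, the kind of sampling I have in mind is nothing fancy, just randomly keeping lines until the output is around the target size (the file names and keep probability below are placeholders):

    import random

    # Randomly sample lines from a huge corpus down to roughly 100 MB before
    # encoding/fine-tuning. Adjust KEEP_PROB to your corpus size
    # (e.g. ~0.002 keeps ~100 MB out of ~50 GB).
    TARGET_BYTES = 100 * 1024 * 1024
    KEEP_PROB = 0.002

    written = 0
    with open("huge_corpus.txt", encoding="utf-8", errors="ignore") as src, \
         open("sampled_corpus.txt", "w", encoding="utf-8") as dst:
        for line in src:
            if random.random() < KEEP_PROB:
                dst.write(line)
                written += len(line.encode("utf-8"))
                if written >= TARGET_BYTES:
                    break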
Model I/O: aitextgen abstracts away some of the boilerplate, supports custom GPT-2 models, and does a better job of importing the old TensorFlow models.
Training: Completely different from Transformers: different file processing and encoding, and the training loop leverages pytorch-lightning.
Generation: Abstracts boilerplate, allowing the addition of more utility functions (e.g. bolding when printing to the console, printing bulk text to a file). Generation is admittedly not that much different from Transformers, but future iterations will increasingly diverge.
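For a concrete picture of the generation side, a quick sketch (the prompt, counts, and temperature are arbitrary examples):

    from aitextgen import aitextgen

    # Without arguments this downloads and caches the default 124M GPT-2.
    ai = aitextgen()

    # Print a few samples to the console.
    ai.generate(n=3, prompt="The meaning of life is", max_length=60)

    # Write a larger batch to a text file instead of the console.
    ai.generate_to_file(n=50, prompt="The meaning of life is",
                        max_length=60, temperature=0.9)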
Does anyone know an efficient way to "embed" models like this? I'm currently working on a Tamagotchi-style RPi toy, and I use GPT-2 to generate answers in the chat. I wrote a simple API that returns the responses from a server. If I could embed the model, it would save me from having to run a server.
The hard part of embedding is that the smallest 124M GPT-2 model itself is huge at 500MB, which would be unreasonable for performance/storage on the user end (and quantization/tracing can't save that much space).
That's why I'm looking into smaller models, which has been difficult, but releasing aitextgen was a necessary first step.
The size of the model you need to get good enough generation with something like GPT-2 is going to be pretty impractical on a raspberry pi.
You might be able to fit a 3-layer distilled GPT-2 in RAM (not quite sure what the latest RPis have in terms of RAM, 4 GB?), but the latency is going to be pretty horrible (multiple seconds).
Why not put it on a server and just use an API to communicate and get the results? The embedded code that interfaces with the API would be much smaller, and the server can be as big as you need.
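Something like this hypothetical Flask wrapper (the endpoint name and parameters are made up, not the parent's actual API) keeps the on-device code down to a single HTTP call:

    from flask import Flask, request, jsonify
    from aitextgen import aitextgen

    app = Flask(__name__)

    # Load the model once at startup; point this at a fine-tuned chat model
    # folder instead of the default 124M GPT-2 if you have one.
    ai = aitextgen()

    @app.route("/reply", methods=["POST"])
    def reply():
        prompt = request.json.get("prompt", "")
        # generate_one returns a single string instead of printing to the console.
        text = ai.generate_one(prompt=prompt, max_length=60)
        return jsonify({"reply": text})

    if __name__ == "__main__":
        app.run(host="0.0.0.0", port=8000)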
I had to do something similar (not this library, but I wish I had known about it) just last week. I'm building out a product demo and I wanted to fill it with books. I didn't want to go searching for out of print books, so I created fake authors, book titles, descriptions, and reviews. The longer text was sometimes great, and sometimes had to be redone but overall it worked really well.
I intend to productionize text generation, and this is a necessary intermediate step. (gpt-2-simple had too many issues in this area so I needed to start from scratch)
The next step is architecting an infrastructure for scalable generation; that depends on a few fixes for both aitextgen and the base Transformers. No ETA.
First install aitextgen:
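    pip3 install aitextgen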
Then you can download and generate from a custom Hacker News GPT-2 model I made (only 30 MB, compared to 500 MB for the 124M GPT-2) using the CLI! Want to create Show HN titles? You can do that.
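If you'd rather do it in Python than the CLI, the rough equivalent looks like this (assuming the model is published on the Hugging Face hub as minimaxir/hacker-news; the prompt and counts are arbitrary):

    from aitextgen import aitextgen

    # Hub model ID assumed here; check the repo/docs for the exact name.
    ai = aitextgen(model="minimaxir/hacker-news")

    # Generate a few Show HN-style titles.
    ai.generate(n=5, prompt="Show HN:", max_length=40)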