Ask HN: What's your favorite computer science podcasts?
==========
Ask HN: How do I convince a non-technical exercise to keep a journal
==========
Ask HN: Is it just me or not?
==========
Ask HN: What do I do with my MVP?
==========
Ask HN: How to sell?
==========
Ask HN: How do you use HackerNews?
==========
Ask HN: Best way to make a B2B startup?
==========
Ask HN: Why do I have to live in San Francisco?
==========
Ask HN: How to tell my heart changes?
==========
Ask HN: How to deal with the difference between a job interview and a product?
==========
Ask HN: What is your favorite open-source sytem?
==========
Ask HN: What are your favorite blogs and resources?
==========
Ask HN: What are the best books for learning a new language/frameworks?
==========
Ask HN: What's your favorite HN post?
==========
Ask HN: What is your favorite RSS reader
==========
Ask HN: Is the SE not a mistake like a safe space business?
==========
Ask HN: How do I start programming in a job?
==========
I've been following minimaxir's work with GPT-2 for a while - I've tried building things on https://github.com/minimaxir/gpt-2-simple for example - and this looks like a HUGE leap forward in terms of developer usability. The old stuff was pretty good on that front, but this looks absolutely amazing. Really exciting project.
This is just brilliant. As someone who has little working knowledge but a massive interest in this field, I found your guide exceptionally well written and newbie-friendly (the way you've explained how to set everything up and left so many tips throughout is really useful).
I'm going to have a lot of fun with this, and it's going to be my starting point for learning more about Colab notebooks and AI (I've always loved doing practical things rather than reading theory to learn something new).
Kudos to you for all this amazing work.
P.S. Sorry if this is a lame question, but can this be used the way Gmail has recently started autocompleting my email sentences?
Awesome work! Whenever people tell me they want to get started with NLP I tell them to play around with your libraries as they're the easiest way to immediately start doing cool things.
> Generates text faster than gpt-2-simple and with better memory efficiency! (even from the 1.5B GPT-2 model!)
This is exciting news. One of the very few drawbacks of gpt-2-simple is the inability to fine-tune a model with more than ~355M parameters. Do these memory management improvements make it possible to fine-tune a larger one?
> Do these memory management improvements make it possible to fine-tune a larger one?
Unfortunately not yet; I need to implement gradient checkpointing first. Memory-wise, the results for fine-tuning 124M are promising (<8 GB VRAM, where it used to take about 12 GB VRAM with gpt-2-simple).
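Roughly, the fine-tuning workflow looks like this. It's a sketch based on the README-style API; the file name and hyperparameters are placeholders, not recommendations:

    from aitextgen import aitextgen

    # Download/cache OpenAI's 124M GPT-2 checkpoint and move it to the GPU.
    ai = aitextgen(tf_gpt2="124M", to_gpu=True)

    # Fine-tune on a plain text file; checkpoints are saved periodically.
    # "my_corpus.txt" and the step counts below are placeholders.
    ai.train("my_corpus.txt",
             batch_size=1,
             num_steps=5000,
             generate_every=1000,
             save_every=1000)

    # Sample from the fine-tuned model.
    ai.generate(n=3, prompt="Ask HN:", max_length=60)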
If I want to fine-tune this on some text data, are there obvious constraints to be aware of? I've got a reasonable amount of text (~50-100 GB), but seeing that a JSON file gets created makes me think that's probably too much. gpt-2-simple seems to describe 100 MB as 'massive', so what's a reasonable amount to aim for?
Or should I be training from scratch? (edit: having looked into training from scratch, I'm guessing that's a 'no' since I don't have thousands to throw at this.)
Oh, that's less than I was expecting. I'm used to having significantly less data to play with than the big players; I guess I still do, but in this case a fairly modest amount of data was enough for very impressive results.
> I'm not 100% sure you can encode and store that much data in memory with the current implementation, even with the fast tokenizers.
That makes sense. I wasn't too sure what sensible sizes would be; there are probably some interesting subsets of the data I could take and use for fine-tuning (or sample it down), maybe to around 100 MB, since that sounded like a large-but-workable amount to use.
I'm looking forward to seeing what I can get out of this. Thanks for making something simple enough that I can use it for an 'I wonder if' kind of problem!
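For anyone else with a too-big corpus, the kind of sampling I have in mind is nothing fancy, just randomly keeping lines until the output is around the target size (the file names and keep probability below are placeholders):

    import random

    # Randomly sample lines from a huge corpus down to roughly 100 MB before
    # encoding/fine-tuning. Adjust KEEP_PROB to your corpus size
    # (e.g. ~0.002 keeps ~100 MB out of ~50 GB).
    TARGET_BYTES = 100 * 1024 * 1024
    KEEP_PROB = 0.002

    written = 0
    with open("huge_corpus.txt", encoding="utf-8", errors="ignore") as src, \
         open("sampled_corpus.txt", "w", encoding="utf-8") as dst:
        for line in src:
            if random.random() < KEEP_PROB:
                dst.write(line)
                written += len(line.encode("utf-8"))
                if written >= TARGET_BYTES:
                    break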
Model I/O: aitextgen abstracts away some of the boilerplate, supports custom GPT-2 models, and does a better job of importing the old TensorFlow models.
Training: Completely different from Transformers: different file processing and encoding, and the training loop leverages pytorch-lightning.
Generation: Abstracts boilerplate, allowing the addition of more utility functions (e.g. bolding when printing to the console, printing bulk text to a file). Generation is admittedly not that much different from Transformers, but future iterations will increasingly diverge.
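For a concrete picture of the generation side, a quick sketch (the prompt, counts, and temperature are arbitrary examples):

    from aitextgen import aitextgen

    # Without arguments this downloads and caches the default 124M GPT-2.
    ai = aitextgen()

    # Print a few samples to the console.
    ai.generate(n=3, prompt="The meaning of life is", max_length=60)

    # Write a larger batch to a text file instead of the console.
    ai.generate_to_file(n=50, prompt="The meaning of life is",
                        max_length=60, temperature=0.9)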
Does anyone know an efficient way to "embed" models like this? I'm currently working on a Tamagotchi-style RPi toy, and I use GPT-2 to generate answers in the chat. I wrote a simple API that returns the responses from a server. If I could embed the model, it would save me from having to run a server.
The hard part of embedding is that the smallest 124M GPT-2 model itself is huge at 500MB, which would be unreasonable for performance/storage on the user end (and quantization/tracing can't save that much space).
That's why I'm looking into smaller models, which has been difficult, but releasing aitextgen was a necessary first step.
The size of the model you need to get good enough generation with something like GPT-2 is going to be pretty impractical on a raspberry pi.
You might be able to fit a 3-layer distilled GPT-2 in RAM (not quite sure what the latest RPis have in terms of RAM, 4 GB?), but the latency is going to be pretty horrible (multiple seconds).
Why not put it on a server and just use an API to communicate and get the results? The embedded code that interfaces with the API would be much smaller, and the server can be as big as you need.
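Something like this hypothetical Flask wrapper (the endpoint name and parameters are made up, not the parent's actual API) keeps the on-device code down to a single HTTP call:

    from flask import Flask, request, jsonify
    from aitextgen import aitextgen

    app = Flask(__name__)

    # Load the model once at startup; point this at a fine-tuned chat model
    # folder instead of the default 124M GPT-2 if you have one.
    ai = aitextgen()

    @app.route("/reply", methods=["POST"])
    def reply():
        prompt = request.json.get("prompt", "")
        # generate_one returns a single string instead of printing to the console.
        text = ai.generate_one(prompt=prompt, max_length=60)
        return jsonify({"reply": text})

    if __name__ == "__main__":
        app.run(host="0.0.0.0", port=8000)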
I had to do something similar (not this library, but I wish I had known about it) just last week. I'm building out a product demo and I wanted to fill it with books. I didn't want to go searching for out of print books, so I created fake authors, book titles, descriptions, and reviews. The longer text was sometimes great, and sometimes had to be redone but overall it worked really well.
I intend to productionize text generation, and this is a necessary intermediate step. (gpt-2-simple had too many issues in this area so I needed to start from scratch)
The next step is architecting an infrastructure for scalable generation; that depends on a few fixes for both aitextgen and the base Transformers. No ETA.
First install aitextgen:
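    pip3 install aitextgen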
Then you can download and generate from a custom Hacker News GPT-2 model I made (only 30 MB, compared to 500 MB for the 124M GPT-2) using the CLI! Want to create Show HN titles? You can do that.
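If you'd rather do it in Python than the CLI, the rough equivalent looks like this (assuming the model is published on the Hugging Face hub as minimaxir/hacker-news; the prompt and counts are arbitrary):

    from aitextgen import aitextgen

    # Hub model ID assumed here; check the repo/docs for the exact name.
    ai = aitextgen(model="minimaxir/hacker-news")

    # Generate a few Show HN-style titles.
    ai.generate(n=5, prompt="Show HN:", max_length=40)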