
Check out Turn Base (implement and play board games using a DSL): https://turn-base.com/

Here's an original game implemented using this system: https://turn-base.com/games/lobby/22/ (buy cards to summon and move chess pieces in a chess game). You'll need to grab a friend to play (no AI). Ping me on Discord if the game doesn't make sense.


I made a website for people to play and prototype/playtest board games digitally

  - online, multiplayer gameplay
  - rules are enforced (using a DSL)
  - customize the look and feel using WYSIWYG editors
Proof of concept implemented using the system: https://turn-base.com/games/lobby/22/


I made a website for people to prototype and playtest/play board games

Quick summary:

  - online, multiplayer gameplay
  - rules are enforced (using a DSL)
  - customize the look and feel using WYSIWYG editors
  - modular: mix and match tokensets and boards across different games
  - bonus: videochat while you play
Deck Chess is a proof of concept for the system: https://turn-base.com/games/lobby/22/


Can you share how large your dataset was (how many tokens), what batch size you used, and how many epochs you trained for? By training slowly, do you mean that you used a small learning rate? If so, what was it?

I've been reading up on batch size and people are all over the place. Some say smaller is better and some say larger is better. Mostly, when it comes to GPT-2, people say larger is better, but there must come a point where increasing the batch size is no longer beneficial (or is it just that you use as large a batch as your memory will allow)?
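
My (possibly wrong) understanding is that memory isn't a hard ceiling anyway, since you can simulate a bigger batch with gradient accumulation. A toy PyTorch-style sketch of what I mean (the tiny model and random data are just placeholders, not from any actual GPT-2 setup):

  import torch
  import torch.nn as nn

  # Gradient accumulation: effective batch = micro_batch * accum_steps,
  # but peak memory only scales with micro_batch.
  model = nn.Linear(16, 1)                      # placeholder model
  optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
  loss_fn = nn.MSELoss()

  micro_batch, accum_steps = 4, 8               # effective batch size of 32
  data = [(torch.randn(micro_batch, 16), torch.randn(micro_batch, 1))
          for _ in range(64)]                   # placeholder data

  optimizer.zero_grad()
  for i, (x, y) in enumerate(data):
      loss = loss_fn(model(x), y) / accum_steps   # scale so grads average out
      loss.backward()                             # grads accumulate in .grad
      if (i + 1) % accum_steps == 0:
          optimizer.step()                        # one update per accum_steps micro-batches
          optimizer.zero_grad()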


In fact, it’s an open question whether larger batch sizes are better. https://twitter.com/jeremyphoward/status/1189643170377658369...

Seconding all of your questions! Details about successful 1.5B training are really hard to come by.

In case it’s helpful, here are some details of how a Chinese 1.5B GPT-2 was trained: https://github.com/imcaspar/gpt2-ml

It looks like they used a batch size of 2 on a TPUv3-256 pod. It took 50 hours and 99,000 training steps, which works out to roughly 1.1 examples per second.
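
(That figure is just the stated numbers divided out:)

  examples = 2 * 99_000       # batch size * training steps
  seconds = 50 * 3_600        # 50 hours of wall-clock time
  print(examples / seconds)   # ≈ 1.1 examples per second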


Agreed, there doesn’t seem to be a consensus. Thanks for the links.


Had to go check my training file to remember.

Data size: around 30 MB, so roughly ~8,000,000 tokens? Can't remember exactly. Learning rate: 1e-4, so I guess not that slow. I trained for around 1,000 steps, but ended up liking the model from step 550, which I think came out to around 2 full passes through my data.
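
For reference, the rough math behind that (a back-of-envelope sketch assuming the standard 1024-token GPT-2 context; the other numbers are the ones above):

  # Rough estimate of how many passes over the data the run made.
  dataset_tokens = 8_000_000      # ~30 MB of text
  batch_size = 32                 # sequences per optimizer step
  seq_len = 1024                  # GPT-2 context window (assumed)
  steps = 550                     # the checkpoint I ended up keeping

  tokens_seen = steps * batch_size * seq_len
  print(tokens_seen / dataset_tokens)   # ~2.25 passes over the data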

There probably is a point where increasing batch size is no longer helpful; my batch size was 32. When I had it lower, I had issues with memorization/bias towards the parts of the training data it had most recently trained on.


Thanks, good to have this data point. I’ve been training on a roughly similarly sized dataset for many tens of thousands of steps (but on the 355M model). Wondering if I need that many steps.


Only 30 MB? If it's based on text adventures, can't you get way more data than that?


I scraped a bunch of stories from chooseyourstory.com, but I did curate them to make sure they had the right second-person format. I couldn't really find anywhere else with a consistent enough format to make scraping easy.

