Spinning Up in Deep RL (spinningup.openai.com)
205 points by samrohn on Aug 17, 2020 | 50 comments



Plug for the RL specialization out of the University of Alberta, hosted on Coursera: https://www.coursera.org/specializations/reinforcement-learn... All courses in the specialization are free to audit.

For those unaware, the University of Alberta is Rich Sutton's home institution, and he approves of and promotes the course.


Currently on course 2/4 in the series and it's great. Every week starts with a reading assignment from the RL book, followed by a series of videos (re-)explaining the material. The videos themselves are very nicely structured, with a clear outline at the start and a summary at the end. Sutton himself appears in a couple of videos, and there are other great guest lectures with interesting insights about applications.

Definitely recommended!


If you're interested in RL but would rather learn the concepts on simpler algorithms first, keeping the "deep" part for later, I maintain a library with most of the same design goals:

https://github.com/Svalorzen/AI-Toolbox

Each algorithm is extensively commented and self-contained (aside from general utilities), and the interfaces are as similar as I could make them. One of my goals is specifically to help people try out simple algorithms so they can inspect and understand what is happening before moving on to more powerful but less transparent ones.
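To give a flavor of the kind of small, inspectable algorithm meant here, below is a minimal value-iteration sketch on a made-up two-state MDP. It's written in Python for brevity (AI-Toolbox itself is C++), and none of the names or numbers come from the library's actual API; it just shows how transparent these tabular methods are compared to deep RL:

```python
import numpy as np

# Hypothetical toy MDP (not AI-Toolbox's API): 2 states, 2 actions.
# P[s, a, s'] = transition probability, R[s, a] = expected reward.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.0, 1.0], [0.5, 0.5]]])
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])
gamma = 0.9  # discount factor

def value_iteration(P, R, gamma, tol=1e-8):
    V = np.zeros(P.shape[0])
    while True:
        # Bellman backup: Q[s, a] = R[s, a] + gamma * sum_s' P[s, a, s'] V[s']
        Q = R + gamma * (P @ V)
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=1)
        V = V_new

V, policy = value_iteration(P, R, gamma)
```

Every intermediate quantity (Q, V, the greedy policy) can be printed and checked by hand, which is exactly what becomes impossible once a neural network replaces the table.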

I'd be happy to receive feedback on accessibility, presentation, docs or even more algorithms that you'd like to see implemented (or even general questions on how things work).


Asking for my own benefit and others', since this is on the front page now: are there any resources this comprehensive for other fields of study? This guide is amazing and I've failed to find anything else like it. I was specifically interested in biotech (from the perspective of a software developer, i.e. with practically zero biology background), but I'll take what I can get.


> Are there any resources this comprehensive for any other field of study? ... I was specifically interested in biotech

I recommend the fast.ai course on deep learning. Several of their lectures relate to things their students have done in biotech and medicine. The main lecturer, Jeremy Howard, has worked for years at the intersection of medical technology and AI, and routinely discusses this.

The full fast.ai course is here[1] and free. Here is a blog post and associated video[2] as an example of fast.ai incorporating biotech into their work; in this example they use AI to upsample the resolution and quality of microscope images.

[1] https://www.fast.ai/

[2] https://www.fast.ai/2019/05/03/decrappify/


If you want to play around with Spinning Up in a Docker container, make sure you git clone the repository and then run pip install -e on the cloned directory. For whatever reason, installing it directly from pip doesn't work, at least it didn't the last time I tried. Here's a Dockerfile and docker-compose.yaml I created some time ago: https://github.com/joosephook/spinningup-dockerfile


Can anyone recommend some less opinionated introductory resources to learn reinforcement learning that focus on first principles?


I would highly recommend Sergey Levine's course:

http://rail.eecs.berkeley.edu/deeprlcourse/

For a more mathematical treatment, there's a beautiful book by Puterman:

https://www.amazon.com/Markov-Decision-Processes-Stochastic-...



RL, including contextual bandits, is becoming more popular for personalization, i.e. adapting some system to the preferences of (groups of) individuals.
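As a rough illustration of the contextual-bandit setting (the class name and toy reward model are mine, not from the survey), an epsilon-greedy contextual bandit for personalization can be sketched in a few lines: each context (say, a user segment) keeps its own running reward estimates per action, exploring with probability epsilon and otherwise exploiting the best estimate.

```python
import random

class EpsilonGreedyBandit:
    """Minimal epsilon-greedy contextual bandit (illustrative sketch)."""

    def __init__(self, n_actions, epsilon=0.1):
        self.n_actions = n_actions
        self.epsilon = epsilon
        self.counts = {}   # (context, action) -> number of pulls
        self.values = {}   # (context, action) -> running mean reward

    def select(self, context):
        # Explore uniformly with probability epsilon, else act greedily.
        if random.random() < self.epsilon:
            return random.randrange(self.n_actions)
        est = [self.values.get((context, a), 0.0) for a in range(self.n_actions)]
        return max(range(self.n_actions), key=lambda a: est[a])

    def update(self, context, action, reward):
        # Incremental mean update for this (context, action) pair.
        key = (context, action)
        n = self.counts.get(key, 0) + 1
        self.counts[key] = n
        mean = self.values.get(key, 0.0)
        self.values[key] = mean + (reward - mean) / n

# Toy simulation: segment "A" prefers action 0, segment "B" prefers action 1.
random.seed(0)
bandit = EpsilonGreedyBandit(n_actions=2, epsilon=0.2)
for _ in range(500):
    for ctx, best in [("A", 0), ("B", 1)]:
        a = bandit.select(ctx)
        bandit.update(ctx, a, 1.0 if a == best else 0.0)
```

After enough interactions, the greedy choice per context converges to each segment's preferred action; real systems replace the per-context table with a learned model over context features.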

Plug/Source: I did a lit. review on this topic https://doi.org/10.3233/DS-200028


I enormously appreciate the resources OpenAI provides for starting out in DRL, such as this one. However, OpenAI has (purposely?) glossed over the brittleness of its algorithms with respect to parameter choices and code-level optimizations [1] in the past. As a researcher myself, I would be more than surprised to hear that OpenAI did not explore this behaviour themselves. Instead, my guess is that exposing these "inconveniences" would hurt the marketing of OpenAI and its algorithms. Such omissions do far more harm to a proper understanding of DRL and its applications than a nice UI does good, imo.

[1] https://gradientscience.org/policy_gradients_pt1/


There was a discussion on r/datascience this weekend about whether anyone uses RL. Almost no one does.

https://www.reddit.com/r/datascience/comments/iav3lv/how_oft...


Why would you ask data scientists about RL? I bet no one there uses deep learning either. That does not mean much.


"Pray, who is the candidate's tailor?" -Hilbert

Who is responsible for OpenAI's UI/UX design? It is immaculate and should be the standard for the community. I'm always dazzled by the impeccable standards of OpenAI with regards to tone, presentation, accessibility.

The documentation is both familiar but distinct, an impressive achievement!

I have my own personal qualms on OpenAI's ethics and virtues but am nevertheless impressed by their aesthetics and regard for their publicity. It's always delightful to look at their work.

OpenAI has in my opinion, the most appropriate presentation for their ideas with marketing and branding. It feels exquisitely simple to grasp what goes on here.

I feel comfortable saying that the biggest obstacle in progress for AI is UI but projects such as this give me hope.


I assume this comment is generated? The link is a standard Sphinx doc.


If that comment is generated, I will quit my current job and work full time on AI. I don't believe it.


All the blogs posted by e.g. this user [0] were generated by GPT-3. [1] Some of those reached the top of HN.

That comment indeed looks a lot like it is generated. It has correlated a bunch of words, but it did not understand that the link between UI and AI is tenuous. It is probably one of the few comments where it is so glaringly obvious. There are likely a lot more comments around which are generated but which went unnoticed.

This comment is not generated, as the links below are dated after the GPT-3 dataset was scraped.

[0] https://news.ycombinator.com/submitted?id=adolos

[1] https://adolos.substack.com/p/what-i-would-do-with-gpt-3-if-...


That story is bogus. See https://news.ycombinator.com/item?id=24165040 and https://news.ycombinator.com/item?id=24063832.

In a way the real story is that people are so eager to believe it that it didn't matter that it was untrue. Like Voltaire's God, if it didn't exist it was necessary to invent it.


Isn't GPT-3-generated prose always seeded by some words or sentences? I'm curious what the poster used as a seed, and why/if they chose to focus on UI?

Would it be possible to use gpt3 to "beautify" existing prose without changing its meaning? Now that would be useful!


The prompt design likely has as input the HN title, and potentially some other metadata, but not necessarily the first few words of the comment.


Incredible. People in those threads are accusing the "author" of being GPT-3, and other commenters are chastising them for being so disrespectful as to suggest such a thing.



To be clear, I'm not saying that person was wrong to criticize the other post necessarily. I just think it's a testament to the abilities of GPT-3 that it could fool a reasonable person.


That would be true if it had actually happened, but it didn't. What happened is that someone faked it to exploit people's desire for such a thing to happen, and then used the claim to get a lot of attention, including getting some journalists to write about it—whether because they actually believed it, or simply because it was such an attention-getter.



AI is now good enough to generate comments like that so you probably should :)


It's definitely generated. But excellent nonetheless.


And now we're going to need a Turing test captcha.


I'd be more worried about bots gaming the voting. I'm perfectly happy to share this website with intelligent machines if they make insightful comments.

Whether odomojuli's post was written by a human, robot or dog is rather immaterial. It's the content of the post that makes it good or bad. It can be evaluated without knowing the author.


>I'm perfectly happy to share this website with intelligent machines if they make insightful comments.

Not me. Not without full attribution, so I know it's a bot and whose. There is no AI-generated text without an agenda - implicit or explicit, benign or sinister.

Advancing the conversation is one thing, but when you can't tell the difference between a Russian GRU bot let loose to promote trigger words and some 4chan teenager who had a bad day at school, the online forum as a mode of expression is dead.

A 4channer is entitled to their opinion, however wrong. A bot acting like it's human needs to be hunted, killed, and erased.


Perhaps this will keep us on our toes. I tend to find something more believable if it's presented in nice prose (the AI's forte) -- which is really not how I should be evaluating information.


And/or add a new bullet to the guidelines: "Please don't assume that a comment is auto generated without solid proof".


Add another bullet next to that: don't submit generated comments.


I wonder whether the fact that we now can't trust anything we read is good or bad. Maybe it'll lead to thinking critically about everything.


I suspect it's bad. Any cognitive dissonance and disagreement can be hand waved away as bots, rather than approached critically.

Ever since the 2016 foreign-meddling-in-the-election news, I've seen people commenting that there must be tons of Russian bots/shills/astroturfers/etc. in comments where I see genuine disagreement. I'm sure both exist, but I suspect the dismissal of "'people/opinions' that disagree with me aren't real people/opinions" is more common than the actual act of fake commenting.


Could you ever?


I meant in the specific sense of trusting the basic humanity of HN, not the general veracity of the internet at large.


I would want to say that comments should be judged solely on their content, but yes, "redis works for us" is a different message from an account that you know is CTO at some big corp instead of some random one.

Web of trust is inevitable either way. I just hope it won't be owned by any huge company. Most likely it will be practically shared by a few of them, like the Internet or web standards.

After playing with GPT-3 a bit, though, I noticed I work a lot like it. Unrelated comments about web of trust are a great example.

Also, dude, I see you in almost every comment thread, how do you consume HN? Some clever scripts or lots of tabs and lots of refreshing?


Haha, I think I'm just more likely to comment than anyone else. I just read HN a few times during the day, nothing extravagant.

It might be confirmation bias (or maybe we just like the same things), if you look at my history I only leave a few comments a day.


An AI having their own personal qualms on OpenAI's ethics and virtues? I doubt it. It would be hilarious if this was generated.


> feel comfortable saying that the biggest obstacle in progress for AI is UI

Yep.


Do you have a source for this? Not nitpicking, i actually need it for a project...


I believe he's referring to the absurdity of that statement, especially in relation to Spinning Up (it doesn't have any UI, as it's an RL library, and the docs site is generated using the standard Sphinx doc generator).


OP might have just clicked through to the main OpenAI site and mixed the UI part in. The bulk of the comment is about documentation and presentation in general and viewing that in the context of the general great UX of OpenAI.

I know that GPT-3 is impressive, but I'm not as convinced as some of the other commenters here that it's a generated comment. If a similar comment were posted on a non-AI-related post, nobody would bat an eye.

Is there an equivalent of Hanlon's razor for AI? "Never attribute to a text-generation AI that which can be adequately explained by slightly nonsensical speech."


That's exactly right; I should have been more astute and pointed out that it was Sphinx in my comment, and dwelt more on the general demeanor of the documentation, with more examples.


I got an email asking what HN's policy on GPT-3-generated comments is. I think it's covered by the fact that we don't allow bots on HN (https://hn.algolia.com/?dateRange=all&page=0&prefix=false&qu...). The purpose of the threads is human conversation.

Obviously there's an infatuation right now with GPT-3. That's normal. If people keep posting these without disclosing them, I imagine there will be two consequences. One (good for HN) is readers scrutinizing comments more closely and raising the bar for what counts as a high-quality comment. The other (bad for HN) is readers accusing each other of posting generated comments.

Accusing other commenters of being bots is not a new phenomenon ("this sounds like it was written by a Markov chain" has long been an internet swipe) but if it gets bigger, we might have to figure something out. But first we should wait for the original wave of novelty to die down.


Ben Barry used to do a lot of the design, looks like he left to start his own firm now: https://nonlinear.co/openai


Some of you are commenting on the fact that this might be a generated comment. I get that a lot. My sporadic thought feels generative, I get that, fair observation. But please be more kind to other users. I'll work on my own diction to provide more insightful and constructive comments above light praise.

Note: I know this is generated by Sphinx. I'm commenting more on the actual content and their overall work towards presentation. Again, I should be providing more concrete examples to highlight my points.


It seems that the greatest achievement of GPT-3 will be to destroy HN, as more and more nerds "hilariously" cherry-pick generated content and post it, gradually drowning all signal in a swamp of ever so slightly incoherent noise.


This comment made me click the link in hopes of seeing some great UI/UX. But, it's just a doc.



