Spinning Up in Deep RL (spinningup.openai.com)
205 points by samrohn on Aug 17, 2020 | 50 comments



Plug for the RL specialization out of the University of Alberta, hosted on Coursera: https://www.coursera.org/specializations/reinforcement-learn... All courses in the specialization are free to audit.

For those unaware, the University of Alberta is Rich Sutton's home institution, and he approves of and promotes the course.


Currently on course 2/4 in the series and it's great. Every week starts with a reading assignment from the RL book, followed by a series of videos (re-)explaining the material. The videos themselves are very nicely structured, with a clear outline at the start and a summary at the end. Sutton himself appears in a couple of videos, and there are other great guest lectures with interesting insights about applications.

Definitely recommended!


If you're interested in RL but would rather learn the concepts on simpler algorithms first, keeping the "deep" part for later, I maintain a library with most of the same design goals:

https://github.com/Svalorzen/AI-Toolbox

Each algorithm is extensively commented and self-contained (aside from general utilities), and the interfaces are as similar as I could make them. One of my goals is specifically to help people try out simple algorithms so they can inspect and understand what is happening before moving on to more powerful but less transparent ones.
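To give a flavor of the kind of small, inspectable algorithm meant here, below is a minimal value-iteration sketch on a made-up two-state MDP. It's written in Python for brevity (AI-Toolbox itself is C++), and none of the names or numbers come from the library's actual API; it just shows how transparent these tabular methods are compared to deep RL:

```python
import numpy as np

# Hypothetical toy MDP (not AI-Toolbox's API): 2 states, 2 actions.
# P[s, a, s'] = transition probability, R[s, a] = expected reward.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.0, 1.0], [0.5, 0.5]]])
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])
gamma = 0.9  # discount factor

def value_iteration(P, R, gamma, tol=1e-8):
    V = np.zeros(P.shape[0])
    while True:
        # Bellman backup: Q[s, a] = R[s, a] + gamma * sum_s' P[s, a, s'] V[s']
        Q = R + gamma * (P @ V)
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=1)
        V = V_new

V, policy = value_iteration(P, R, gamma)
```

Every intermediate quantity (Q, V, the greedy policy) can be printed and checked by hand, which is exactly what becomes impossible once a neural network replaces the table.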

I'd be happy to receive feedback on accessibility, presentation, docs or even more algorithms that you'd like to see implemented (or even general questions on how things work).


Asking for my own benefit and others', since this is on the front page now: are there any resources this comprehensive for other fields of study? This guide is amazing and I've failed to find anything else like it. I was specifically interested in biotech (from the perspective of a software developer, i.e. with practically zero biology background), but I'll take what I can get.


> Are there any resources this comprehensive for any other field of study? ... I was specifically interested in biotech

I recommend the fast.ai course on deep learning. Several of their lectures relate to things their students have done in biotech and medicine. The main lecturer, Jeremy Howard, has worked for years at the intersection of medical technology and AI, and routinely discusses this.

The full fast.ai course is here[1] and free. Here is a blog post and associated video[2] as an example of fast.ai incorporating biotech into their work; in this example they use AI to upsample the resolution and quality of microscope images.

[1] https://www.fast.ai/

[2] https://www.fast.ai/2019/05/03/decrappify/


If you want to play around with Spinning Up in a Docker container, make sure you git clone the repository and then run pip install -e on the cloned directory. For whatever reason, installing it directly from pip doesn't work, at least it didn't the last time I tried. Here's a Dockerfile and docker-compose.yaml I created some time ago: https://github.com/joosephook/spinningup-dockerfile


Can anyone recommend some less opinionated introductory resources to learn reinforcement learning that focus on first principles?


I would highly recommend Sergey Levine's course:

http://rail.eecs.berkeley.edu/deeprlcourse/

For a more mathematical treatment, there's a beautiful book by Puterman:

https://www.amazon.com/Markov-Decision-Processes-Stochastic-...



RL, including contextual bandits, is becoming more popular for personalization, i.e. adapting some system to the preferences of (groups of) individuals.
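As a rough illustration of the contextual-bandit setting (the class name and toy reward model are mine, not from the survey), an epsilon-greedy contextual bandit for personalization can be sketched in a few lines: each context (say, a user segment) keeps its own running reward estimates per action, exploring with probability epsilon and otherwise exploiting the best estimate.

```python
import random

class EpsilonGreedyBandit:
    """Minimal epsilon-greedy contextual bandit (illustrative sketch)."""

    def __init__(self, n_actions, epsilon=0.1):
        self.n_actions = n_actions
        self.epsilon = epsilon
        self.counts = {}   # (context, action) -> number of pulls
        self.values = {}   # (context, action) -> running mean reward

    def select(self, context):
        # Explore uniformly with probability epsilon, else act greedily.
        if random.random() < self.epsilon:
            return random.randrange(self.n_actions)
        est = [self.values.get((context, a), 0.0) for a in range(self.n_actions)]
        return max(range(self.n_actions), key=lambda a: est[a])

    def update(self, context, action, reward):
        # Incremental mean update for this (context, action) pair.
        key = (context, action)
        n = self.counts.get(key, 0) + 1
        self.counts[key] = n
        mean = self.values.get(key, 0.0)
        self.values[key] = mean + (reward - mean) / n

# Toy simulation: segment "A" prefers action 0, segment "B" prefers action 1.
random.seed(0)
bandit = EpsilonGreedyBandit(n_actions=2, epsilon=0.2)
for _ in range(500):
    for ctx, best in [("A", 0), ("B", 1)]:
        a = bandit.select(ctx)
        bandit.update(ctx, a, 1.0 if a == best else 0.0)
```

After enough interactions, the greedy choice per context converges to each segment's preferred action; real systems replace the per-context table with a learned model over context features.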

Plug/Source: I did a lit. review on this topic https://doi.org/10.3233/DS-200028


I enormously appreciate the resources OpenAI provides for starting out in DRL, such as this one. However, OpenAI has (purposely?) glossed over the brittleness of its algorithms with respect to parameter choices and code-level optimizations [1] in the past. As a researcher myself, I would be more than surprised to hear that OpenAI did not explore this behaviour themselves. Instead, my guess is that exposing these "inconveniences" would hurt the marketing of OpenAI and its algorithms. Such omissions do far more harm to a proper understanding of DRL and its applications than a nice UI does good, imo.

[1] https://gradientscience.org/policy_gradients_pt1/


There was a discussion on r/datascience this weekend about whether anyone uses RL. Almost no one does.

https://www.reddit.com/r/datascience/comments/iav3lv/how_oft...


Why would you ask data scientists about RL? I bet no one there uses deep learning either. That does not mean much.


"Pray, who is the candidate's tailor?" -Hilbert

Who is responsible for OpenAI's UI/UX design? It is immaculate and should be the standard for the community. I'm always dazzled by the impeccable standards of OpenAI with regards to tone, presentation, accessibility.

The documentation is both familiar but distinct, an impressive achievement!

I have my own personal qualms on OpenAI's ethics and virtues but am nevertheless impressed by their aesthetics and regard for their publicity. It's always delightful to look at their work.

OpenAI has in my opinion, the most appropriate presentation for their ideas with marketing and branding. It feels exquisitely simple to grasp what goes on here.

I feel comfortable saying that the biggest obstacle in progress for AI is UI but projects such as this give me hope.


I assume this comment is generated? The link is a standard Sphinx doc.


If that comment is generated, I will quit my current job and work full time on AI. I don't believe it.


All the blogs posted by e.g. this user [0] were generated by GPT-3. [1] Some of those reached the top of HN.

That comment indeed looks a lot like it is generated. It has correlated a bunch of words, but it did not understand that the link between UI and AI is tenuous. It is probably one of the few comments where it is so glaringly obvious. There are likely a lot more comments around which are generated but which went unnoticed.

This comment is not generated, as the links below are dated after the GPT-3 dataset was scraped.

[0] https://news.ycombinator.com/submitted?id=adolos

[1] https://adolos.substack.com/p/what-i-would-do-with-gpt-3-if-...


That story is bogus. See https://news.ycombinator.com/item?id=24165040 and https://news.ycombinator.com/item?id=24063832.

In a way the real story is that people are so eager to believe it that it didn't matter that it was untrue. Like Voltaire's God, if it didn't exist it was necessary to invent it.


Isn't GPT-3-generated prose always seeded by some words or sentences? I'm curious what the poster used as a seed, and why/if they chose to focus on UI?

Would it be possible to use gpt3 to "beautify" existing prose without changing its meaning? Now that would be useful!


The prompt design likely has as input the HN title, and potentially some other metadata, but not necessarily the first few words of the comment.


Incredible. People in those threads are accusing the "author" of being GPT-3, and other commenters are chastising them for being so disrespectful as to suggest such a thing.



To be clear, I'm not saying that person was wrong to criticize the other post necessarily. I just think it's a testament to the abilities of GPT-3 that it could fool a reasonable person.


That would be true if it had actually happened, but it didn't. What happened is that someone faked it to exploit people's desire for such a thing to happen, and then used the claim to get a lot of attention, including getting some journalists to write about it—whether because they actually believed it, or simply because it was such an attention-getter.



AI is now good enough to generate comments like that so you probably should :)


It's definitely generated. But excellent nonetheless.


And now we're going to need a Turing test captcha.


I'd be more worried about bots gaming the voting. I'm perfectly happy to share this website with intelligent machines if they make insightful comments.

Whether odomojuli's post was written by a human, robot or dog is rather immaterial. It's the content of the post that makes it good or bad. It can be evaluated without knowing the author.


>I'm perfectly happy to share this website with intelligent machines if they make insightful comments.

Not me. Not without full attribution, so I know it's a bot and whose. There is no AI-generated text without an agenda - implicit or explicit, benign or sinister.

Advancing the conversation is one thing, but when you can't tell the difference between a Russian GRU bot let loose to promote trigger words and some 4chan teenager who had a bad day at school, the online forum as a mode of expression is dead.

A 4channer is entitled to their opinion, however wrong. A bot acting like it's human needs to be hunted, killed, and erased.


Perhaps this will keep us on our toes. I tend to find something more believable if it's presented in nice prose (the AI's forte) -- which is really not how I should be evaluating information.


And/or add a new bullet to the guidelines: "Please don't assume that a comment is auto generated without solid proof".


Add another bullet next to that: don't submit generated comments.


I wonder whether the fact that we now can't trust anything we read is good or bad. Maybe it'll lead to thinking critically about everything.


I suspect it's bad. Any cognitive dissonance and disagreement can be hand waved away as bots, rather than approached critically.

Ever since the 2016 foreign-meddling-in-the-election news, I've seen people commenting that there must be tons of Russian bots/shills/astroturfers/etc. in comments where I see genuine disagreement. I'm sure both exist, but I suspect the dismissal of "'people/opinions' that disagree with me aren't real people/opinions" is more common than the actual act of fake commenting.


Could you ever?


I meant in the specific sense of trusting the basic humanity of HN, not the general veracity of the internet at large.


I would want to say that comments should be judged solely on their content, but yes, "redis works for us" is a different message from an account that you know is CTO at some big corp instead of some random one.

Web of trust is inevitable either way. I just hope it won't be owned by any huge company. Most likely it will be practically shared by a few of them, like the Internet or web standards.

After playing with GPT-3 a bit, though, I noticed I work a lot like it. Unrelated comments about web of trust are a great example.

Also, dude, I see you in almost every comment thread, how do you consume HN? Some clever scripts or lots of tabs and lots of refreshing?


Haha, I think I'm just more likely to comment than anyone else. I just read HN a few times during the day, nothing extravagant.

It might be confirmation bias (or maybe we just like the same things), if you look at my history I only leave a few comments a day.


An AI having their own personal qualms on OpenAI's ethics and virtues? I doubt it. It would be hilarious if this was generated.


> feel comfortable saying that the biggest obstacle in progress for AI is UI

Yep.


Do you have a source for this? Not nitpicking, i actually need it for a project...


I believe he's referring to the absurdity of that statement, especially in relation to Spinning Up (it doesn't have any UI, as it's an RL library, and the docs site is generated using the standard Sphinx doc generator).


OP might have just clicked through to the main OpenAI site and mixed the UI part in. The bulk of the comment is about documentation and presentation in general and viewing that in the context of the general great UX of OpenAI.

I know that GPT-3 is impressive, but I'm not as convinced as some of the other commenters here that it's a generated comment. If a similar comment were posted on a non-AI-related post, nobody would bat an eye.

Is there an equivalent of Hanlon's razor for AI? "Never attribute to a text-generation AI that which can be adequately explained by slightly nonsensical speech."


That's exactly right; I should have been more astute and pointed out that it was Sphinx in my comment, and dwelt more on the general demeanor of the documentation, with more examples.


I got an email asking what HN's policy on GPT-3-generated comments is. I think it's covered by the fact that we don't allow bots on HN (https://hn.algolia.com/?dateRange=all&page=0&prefix=false&qu...). The purpose of the threads is human conversation.

Obviously there's an infatuation right now with GPT-3. That's normal. If people keep posting these without disclosing them, I imagine there will be two consequences. One (good for HN) is readers scrutinizing comments more closely and raising the bar for what counts as a high-quality comment. The other (bad for HN) is readers accusing each other of posting generated comments.

Accusing other commenters of being bots is not a new phenomenon ("this sounds like it was written by a Markov chain" has long been an internet swipe) but if it gets bigger, we might have to figure something out. But first we should wait for the original wave of novelty to die down.


Ben Barry used to do a lot of the design, looks like he left to start his own firm now: https://nonlinear.co/openai


Some of you are commenting on the fact that this might be a generated comment. I get that a lot. My sporadic thought feels generative, I get that, fair observation. But please be more kind to other users. I'll work on my own diction to provide more insightful and constructive comments above light praise.

Note: I know this is generated by Sphinx. I'm commenting more on the actual content and their overall work towards presentation. Again, I should be providing more concrete examples to highlight my points.


It seems that the greatest achievement of GPT-3 will be to destroy HN, as more and more nerds "hilariously" cherry-pick generated content and post it, gradually drowning all signal in a swamp of ever so slightly incoherent noise.


This comment made me click the link in hopes of seeing some great UI/UX. But, it's just a doc.



