sbpayne's comments

I find it interesting that you said: “Only an idiot would say the output of a model has anything to do with the moral question of how things should be.”

My understanding of the article is that people often share this sentiment, but believe this property implies that an optimization procedure will only reflect bias rather than amplify it. The author then goes on to show that this perceived implication is incorrect.

In other words, your algorithm not understanding an existing bias does not mean it won't amplify that bias. It means it won't care, which is precisely what allows it to amplify the bias. Because the bias exists in the underlying data, your algorithm can discover it and overfit to it. Without a “regularizer” to control for this, it's probably a bad idea to assume the algorithm won't amplify bias.
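To make that concrete, here is a rough toy sketch (nothing to do with the article's actual model; the numbers are invented): a score with only a small group shift in the data can produce a much larger relative gap in decisions once a selective cutoff is optimized on top of it.

    # Toy sketch: a small group difference in a learned score becomes a much
    # larger relative disparity in decisions once a high cutoff is applied.
    # Numbers are invented for illustration, not taken from the article.
    import numpy as np

    rng = np.random.default_rng(0)
    score_a = rng.normal(loc=0.0, scale=1.0, size=100_000)   # group A scores
    score_b = rng.normal(loc=-0.3, scale=1.0, size=100_000)  # group B, small shift

    cutoff = 2.0  # e.g. a selective hiring threshold

    rate_a = (score_a > cutoff).mean()
    rate_b = (score_b > cutoff).mean()

    print(f"mean gap in the data:  {score_a.mean() - score_b.mean():.2f}")
    print(f"selection rate A:      {rate_a:.4f}")
    print(f"selection rate B:      {rate_b:.4f}")
    print(f"A selected ~{rate_a / rate_b:.1f}x as often as B")

A 0.3 standard-deviation shift in the scores ends up roughly doubling one group's selection rate over the other at that cutoff, which is the kind of thing I mean by the procedure amplifying rather than merely reflecting the bias.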

So if my understanding of your comment and the author's post is right, I think you would both agree that, at the end of the day, something needs to explicitly control for the moral question of how things should be, because the general optimization procedures we use do not and will not. Is that a fair statement?


How is the existence of a model amplifying any bias? The example the author gives is that conditioning a linear model on some "biased" variable gives a more accurate model that predicts large differences between these groups. But the large difference in the groups is right there in the data. It's not amplified in any way. And then somehow this is the modeller's fault for choosing a naughty variable, as if the output of a model has anything to do with the world the modeller wishes existed.

The author's beef should be with the idiots who would take this model and then say "yep looks like since these two groups have different regression lines it's totally great that we see large differences between these groups"
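To be concrete about what I mean by "right there in the data", here is a toy sketch (made-up data, not the author's example): if you bake a fixed salary gap into the data, a per-group regression simply recovers that gap rather than inventing a bigger one.

    # Toy sketch (invented data, not the article's): a per-group OLS fit
    # recovers the salary gap that was put into the data in the first place.
    import numpy as np

    rng = np.random.default_rng(1)
    n = 5_000
    skill = rng.uniform(0, 10, size=n)
    group = rng.integers(0, 2, size=n)    # 0 or 1
    true_gap = 5.0                        # gap baked into the data
    salary = 3.0 * skill - true_gap * group + rng.normal(scale=2.0, size=n)

    # Fit salary ~ skill separately within each group.
    coefs = {}
    for g in (0, 1):
        mask = group == g
        X = np.column_stack([skill[mask], np.ones(mask.sum())])
        coefs[g], *_ = np.linalg.lstsq(X, salary[mask], rcond=None)

    pred_gap = np.dot(coefs[0], [7.0, 1.0]) - np.dot(coefs[1], [7.0, 1.0])
    print(f"gap in the data: {true_gap:.1f}, gap predicted at skill=7: {pred_gap:.2f}")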


I was mostly referring to this bit from the article: “Notice how the model that has the lowest mean squared error is also the model that causes the most bias between groups for higher skill levels. This increase in bias cannot be blamed merely on the data. It’s the choice of the model that increases this bias which is the responsibility of the algorithm designer.”

My reading of the plots is that the difference between the groups seems to grow beyond what is present in the data. Do you not think this is amplified?

As for conditioning on a biased variable: this post uses a contrived example, for sure. But the same thing happens with variables that correlate with the “naughty” variable (perhaps the author should have shown this explicitly to drive the point home?)

Removing all variables that correlate with “naughty” variables is a pretty difficult task in itself without having something to detect the correlation and tell you. At least that’s been my experience.
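For what it's worth, the crude first pass I usually reach for looks something like this (the file and column names here are hypothetical): rank the remaining features by their correlation with the protected attribute and scrutinize the top of the list.

    # Rough sketch of a proxy check; "applicants.csv", "gender" and "salary"
    # are hypothetical names. Plain correlation misses nonlinear or
    # multivariate proxies, so treat this as a first pass only.
    import pandas as pd

    df = pd.read_csv("applicants.csv")
    protected = df["gender"].map({"F": 1, "M": 0})

    candidates = df.drop(columns=["gender", "salary"]).select_dtypes("number")
    proxy_scores = candidates.corrwith(protected).abs().sort_values(ascending=False)

    print(proxy_scores.head(10))  # likely proxies to investigate by hand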


> This increase in bias cannot be blamed merely on the data.

This is debatable in this case. The grouped model might be a better fit to the data: it indicates a higher bias in salaries at higher skill levels.

It sounds completely plausible that the gender bias for unskilled workers is smaller than that for highly skilled workers.

Models can be under-fit as well as over-fit. Just using a dummy variable for gender might not properly capture interactions like this.

But of course it’s always possible that this is just an artefact of the training data.
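A quick sketch of what I mean by the dummy-variable point (toy file and column names, assuming a statsmodels-style formula interface): the dummy-only model forces parallel lines for the two groups, while an interaction term lets the gap widen with skill if that is what the data shows.

    # Sketch of dummy-only vs interaction model; "salaries.csv" and its
    # columns (skill, gender, salary) are hypothetical.
    import pandas as pd
    import statsmodels.formula.api as smf

    df = pd.read_csv("salaries.csv")

    dummy_only  = smf.ols("salary ~ skill + C(gender)", data=df).fit()
    interaction = smf.ols("salary ~ skill * C(gender)", data=df).fit()

    # A meaningful skill:gender coefficient suggests the constant-gap model
    # is under-fit, i.e. the gap in the data itself grows with skill.
    print(interaction.params)
    print("AIC:", dummy_only.aic, interaction.aic)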


I didn’t realize this is what I was looking for all my life


Did anyone one else read "SHA-RNN" as "SHHHAAAAARRRROOOOONNN" in Ozzy's voice?


Really want to put a +1 on this. I thought it was an excellent, practical treatment of many distributed systems concepts.


I'm 25 now, so can't really answer the advice question. As for what I'm doing:

* Just ended a 5 year relationship that I felt was ultimately going to hold me back in life
* Saying yes to more things that scare the hell out of me -- those things are often the most fun (a way to beat the anxiety out of me _shrug_)
* Working as a machine learning tech lead at a large tech company and considering the possibility of going into management soonish
* Working on being really intentional with my time and who I spend it with
* Working on health in all aspects (physical, mental, emotional, and financial)


Working on being intentional with your time is so important!

I find that if I don't actively try to be intentional, I become too reactive and time melts away -- I'll do some things I want to do, but also a decent number of things I never meant to do.


The open source alternatives you list seem to only provide experiment logging. MLflow seems to support more (such as model deployment).

Not to claim that the deployment processes are _good_, just that MLflow seems more general than the open source alternatives listed here.
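As a minimal sketch of what I mean by "more general" (the model and numbers here are made up, but the calls are standard MLflow tracking APIs): one run can log params and metrics like the pure experiment trackers do, and also package the model so it can later be served.

    # Minimal MLflow sketch: experiment tracking plus model packaging in one run.
    # The model and values are invented for illustration.
    import mlflow
    import mlflow.sklearn
    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression

    X, y = load_iris(return_X_y=True)
    model = LogisticRegression(max_iter=200).fit(X, y)

    with mlflow.start_run():
        mlflow.log_param("max_iter", 200)
        mlflow.log_metric("train_accuracy", model.score(X, y))
        # The logged model can later be served with `mlflow models serve`,
        # which is the deployment side the experiment-only trackers don't cover.
        mlflow.sklearn.log_model(model, "model")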


I can second the recommendation of Notion. It's worked very well for my organization / productivity needs.

However, I work at a large company that has adopted Confluence, so I'm stuck with that for organizational knowledge.


YES. 100x yes to this. I started a new job some time ago where Java / Scala were mainly used (prior C++ background).

I can logically step through control flow and figure most things out, except that the amount of "magic" for the sake of reducing boilerplate made debugging some issues really tricky.

Anytime I'm about to reduce boilerplate somehow, I try to ask myself whether it's going to introduce some type of tribal knowledge. That is, will someone without any knowledge of, e.g., this codegen be able to step through and understand what's going on?

I feel that people reduce boilerplate without constraints far too often.


Woebot is not for talk therapy; it applies cognitive behavioral therapy. Since it's a skill-based approach, I don't think it's crazy to think a chat bot could effectively approximate that type of therapy. It's much less open-ended than talk therapy.

I don't think we'll see effective talk therapy chat bots for quite some time.


I took courses with Eppstein at UCI. I was always impressed with his ability to construct clear example diagrams to explain any question.

Glad to see some of these great diagrams were shared with a wider audience :)


He's a legend. I remember in our Graph Algorithms class our "textbook" was mostly composed of articles that he had written. Glad to see UCI representing here.


The UCI campus ACM club is quite distinguished as well ;)


He's brilliant. After learning that he contributed to a massive number of CS and math articles on Wikipedia, I made Wikipedia my go-to for those types of articles. Before that, I did not know whether or not to trust academic articles on Wikipedia.


I wanted to take 161 with him at UCI; I heard he was great. Unfortunately I had to take it with Hirschberg (who has his picture on the Wikipedia page). I still get PTSD from those all-or-nothing multi-part algorithm problems.


I took the advanced data structures course. That guy is chill.

