Why is it that genetic algorithms never seem to be mentioned any more? Are they sub-standard, or just at a "higher level" than is typically talked about, i.e. you must implement them yourself?
Genetic algorithms are not really an off-the-shelf black box that you can just plug your data into and get results. They take a domain expert to use efficiently, and even then they aren't guaranteed to perform that well. The area that I've encountered where they are most effective is in approximation heuristics for NP-hard problems where you slowly assemble a solution from smaller pieces.
+1. I'd also add that genetic algorithms are for optimization, so they can't really be compared with most of the algorithms in that chart. They'd sit at a sub-level where different optimization techniques for finding model weights are compared, for each type of approach (classification, clustering, etc.).
Most (all?) of the algorithms on the chart iteratively optimize an objective. However, most of the objectives are convex or otherwise admit an optimization strategy that performs better than a genetic algorithm.
I believe you are repeating what I said (?). All of the algorithms have different methods of arriving at an objective function and leveraging its results. Yet most share the same problem in terms of optimizing it, and yes, most choose other routes.
> They are very prone to getting stuck in local minima.
That's quite a generalization. A GA's tendency to get stuck in local minima can be mitigated by adjusting population size, selection method/size and rate of mutation -- i.e. increase the randomness of the search.
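To make those knobs concrete, here's a minimal GA sketch (a toy illustration, not anyone's production code) where population size, tournament size and mutation rate are exactly the parameters you'd tune to keep the search from collapsing into a local optimum:

```python
# Toy generational GA over bit strings; the "fitness" here is a placeholder.
# population_size, tournament_size and mutation_rate control how greedy vs.
# exploratory the search is -- the knobs mentioned above.
import random

def fitness(bits):
    return sum(bits)  # stand-in objective: count of 1s

def tournament_select(population, k):
    # Smaller k = weaker selection pressure = more exploration.
    return max(random.sample(population, k), key=fitness)

def crossover(a, b):
    point = random.randrange(1, len(a))
    return a[:point] + b[point:]

def mutate(bits, rate):
    # Higher rate = more randomness injected each generation.
    return [1 - b if random.random() < rate else b for b in bits]

def run_ga(n_bits=32, population_size=100, tournament_size=3,
           mutation_rate=0.02, generations=200):
    population = [[random.randint(0, 1) for _ in range(n_bits)]
                  for _ in range(population_size)]
    for _ in range(generations):
        population = [
            mutate(crossover(tournament_select(population, tournament_size),
                             tournament_select(population, tournament_size)),
                   mutation_rate)
            for _ in range(population_size)
        ]
    return max(population, key=fitness)

print(fitness(run_ga()))
```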
This is not a good generalization. I've usually only seen this issue with optimization problems when:
1) You haven't played with parameters
2) Implementation is not correct (usually the case with genetic algos, since they require a reasonable amount of domain expertise vs., say, GD)
What evidence is informing your opinion that genetic algorithms are a bad search algorithm? What makes you say that they are very prone to getting stuck in local minima? Do you think they suffer from local minima more than, say, gradient descent?
I find it odd that Ordinary Least Squares is missing from the map, even though it's probably more popular than all the other methods in that entire map combined.
OLS is a special case of ElasticNet, Lasso, and ridge regression with the regularization parameters set to zero. (The latter two are also special cases of ElasticNet with one of the two regularization parameters set to zero.) In the presence of many predictors or multicollinearity among the predictors, OLS tends to overfit the data and regularized models usually provide better predictions, although OLS still has its place in exploratory data analysis.
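A quick way to see the special-case relationship (a sketch assuming scikit-learn, with made-up toy data): fit OLS and the regularized models with near-zero penalties and compare the coefficients.

```python
# OLS vs. ridge/lasso/elastic net as the regularization strength goes to zero.
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge, Lasso, ElasticNet

rng = np.random.RandomState(0)
X = rng.randn(100, 3)
y = X @ np.array([1.5, -2.0, 0.5]) + 0.1 * rng.randn(100)

models = {
    "OLS": LinearRegression(),
    "Ridge(alpha~0)": Ridge(alpha=1e-8),
    "Lasso(alpha~0)": Lasso(alpha=1e-8),
    "ElasticNet(alpha~0)": ElasticNet(alpha=1e-8),
}
for name, model in models.items():
    print(name, model.fit(X, y).coef_)  # all four should agree closely
```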
To add to simonster's comment [1]: confusingly, OLS is also morally equivalent to what the map calls "SGD regressor" with a squared loss function[2]. It is also nearly equivalent, with lots of caveats and many details aside, to SVR with a linear kernel and practically no regularization.
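For the curious, a rough sketch of that near-equivalence (assuming scikit-learn, toy data, and illustrative parameter choices; exact argument names can vary between versions): SGDRegressor's default loss is the squared loss, and a linear-kernel SVR with a huge C is barely regularized, so on well-behaved data both land near the OLS fit.

```python
# OLS vs. SGD with squared loss vs. weakly regularized linear SVR.
import numpy as np
from sklearn.linear_model import LinearRegression, SGDRegressor
from sklearn.svm import SVR

rng = np.random.RandomState(0)
X = rng.randn(200, 2)
y = X @ np.array([2.0, -1.0]) + 0.1 * rng.randn(200)

ols = LinearRegression().fit(X, y)
# Default loss is squared error; alpha~0 makes the L2 penalty negligible.
sgd = SGDRegressor(alpha=1e-10, max_iter=10000, tol=None,
                   random_state=0).fit(X, y)
# epsilon=0 turns the epsilon-insensitive loss into an absolute-error loss,
# so this matches OLS only approximately -- one of the "caveats" above.
svr = SVR(kernel='linear', C=1e6, epsilon=0.0).fit(X, y)

print(ols.coef_, sgd.coef_, svr.coef_.ravel(), sep='\n')
```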
So yeah, it is confusing. There is a lot of overlap between several disciplines and it's still an emerging field.
Yeah, the nomenclature is not very rigorous and there is some overlap depending on how you look at it, but roughly, and without being pedantic, the closest in that map would be SGD with a logistic loss function[1].
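For reference, a tiny sketch of what "SGD with a logistic loss" means in scikit-learn terms (toy data; the loss is spelled 'log_loss' in recent versions and 'log' in older ones): it's essentially logistic regression fitted by stochastic gradient descent, which is part of why the boxes on that map overlap.

```python
# Logistic regression vs. SGD with a logistic loss on the same toy data.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression, SGDClassifier

X, y = make_classification(n_samples=500, n_features=5, random_state=0)

logit = LogisticRegression(C=1e6).fit(X, y)          # effectively unregularized
sgd = SGDClassifier(loss='log_loss', alpha=1e-10,    # use loss='log' on old sklearn
                    max_iter=5000, random_state=0).fit(X, y)

print(logit.coef_)
print(sgd.coef_)  # should point in roughly the same direction
```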
But HN to me is a way to keep current on what people in tech are talking about. I don't want to live in a bubble. I want to discover new things that smart people think are cool.
Something that keeps bringing me back to HN specifically (over the likes of Reddit, Twitter, etc.) is the sheer intelligence of conversations.
More often than not I'll find myself skimming through the discussion here before exploring the linked material. The reasoning behind why people feel something deserves to be "front paged" and the insights that domain experts offer in the discussion are what (I feel) make HN valuable. Taking away the brainy aspect of how the community works would be an interesting experiment, alas one I wouldn't want to see _replace_ what we have today.
"Things smart people think are cool" is nearly an understatement.
This is exactly what people on reddit used to say about reddit six or seven years ago. I have to believe the decline in general quality on reddit, and in the quality of the front page most of all, could happen in some form to HN without vigilance on the part of users and mods.
There seems to be a remarkable difference between what people feel HN should be and what it actually is. "Quality" seems to be more of a sliding scale that correlates with personal bias, and perhaps a feeling of alienation brought about by a diverse community, than anything objective.
The community will drive toward a common form which is shaped by the community managers (mods and power users). I believe @pg gets that, and the people taking over also get it. Additionally, some users may simply outgrow the community even if the community stays exactly the same.
I think HN gets a lot of things right to foster what I enjoy, which is intelligent conversations about a wide variety of forward-looking topics. I think the biggest risk at the moment is spamming the site with what are essentially targeted ads/click bait because of who the user base is, but for the most part the front page is decently curated.
Interestingly, as long as the discussions continue to contain valuable — and often educated — insights, I personally think there's still enough value to keep coming back to HN for that alone. The subjective quality of the links themselves may shift as the community ages, but the well-thought-out discussions and insights within them are arguably often more worthwhile than the linked content itself.
It's worth mentioning that comments which don't seem to add much to the conversation are often downvoted lately, which is promising.
Agreed. This used to be the case with Slashdot, but that became a shithole with the new management. Sorry for the language, but as a long-time reader and commenter on Slashdot, I was very bitter when stuff started going downhill.
I've found HN to be a very good alternative. The comments/discussions were a bit iffy to get into at first, with some trigger-happy downmods, but overall I think the conversation is very constructive. And there is a wealth of information and helpful things being said.
I would be quite interested in similar (or not so similar) discussion forums for the European Enterprise galaxy. Would you be able to either submit a list of what you find notable, or post some starting points?
Maybe some version of this tool should be used as a way to filter past discussions instead of search? For example, for the creation of a portal about easy-to-use development tools, including discussions?
More likely the retrieved text from the page itself contains very little of technical interest. If there were a transcript, I expect it would fare better.
Great reminder - material in a video is undiscoverable.
Idea: browser extension that notices two things: when you follow links from the HN front page, and whether you upvote that story. If you read an article and don't upvote it, it labels that article "dreck". If you do upvote it, it labels that article non-"dreck". Maybe it has some subtle reminder that you should remember to upvote the article once you're done.
People who use this extension make HN better for themselves (because they're classifying articles according to their tastes as they go along) and they're also making HN better for others (by incentivizing people to upvote good material when they may otherwise have not upvoted).
If you have enough HN karma to downvote, maybe only downvotes count as dreck. Then you're still improving both your own and others' experience.
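A hypothetical sketch of the labeling rule the extension would apply (the function and flag names are made up for illustration):

```python
# Label a story from a user's own browsing/voting signals.
def label_story(visited, upvoted, downvoted, downvotes_only=False):
    if not visited:
        return None                       # never opened: no signal
    if downvotes_only:
        # Variant for users with downvote rights: only downvotes mark dreck.
        if downvoted:
            return "dreck"
        return "not dreck" if upvoted else None
    return "not dreck" if upvoted else "dreck"
```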
Yo dawg, I heard you like HN, so I proposed a browser extension that lets you improve HN while you improve HN.
Two general problems with this, and they're common to many content-recommendation / filtering systems.
• Explicit rating actions are only a small part of interactions with a site. Other implicit actions are often far richer in quantity and quality -- time spent on an article, interactions and discussion, the quality of that discussion (see pg's hierarchy of disagreement, for example), and other measures. As Robert Pirsig noted, defining quality is hard.
• Whose ratings you consider matters. The problem of topic and quality drift happens as general interests tend to subvert the initial focus of a site or venue. Those which can retain their initial focus will preserve their nature for a longer period of time, but even that is difficult. Increasingly, my sense is that you want to be somewhat judicious in who you provide an effective moderating voice to, but those who get that voice should be encouraged to use it copiously. Policing the moderators (to avoid collusion and other abuse) becomes a growing concern (see reddit and its recent downvote brigades against /r/technology and /r/worldnews).
Regarding the second part, the proposed scheme uses HN's built-in control of making users earn a bunch of karma before letting them downvote. I agree that topic drift happens; witness all of the bitcoin-related discussion over the past year or so.
So, there are two basic approaches you can make to this:
1. Delegate moderation powers only to a select group of individuals who know and will uphold the site's standards. Effectively: an editorial board.
2. Allow all users to moderate, but score each on how well the result of their moderation adheres to a specified goal -- that is, for a given submission, was it 1) appropriate to the site and 2) did it create high-level engagement? Users might correlate positively or negatively, strongly or weakly. That is: some people will vote up content that's not desirable and downvote content that is. Others simply can't tell ass from teakettle. In the first case you simply reverse the sign; in the second, you set a low correlation value. And of course, those who are good and accurate predictors get a higher correlation value (sketched below).
With the 2nd approach, everyone's "vote" counts, though it may not be as they expected. You've also got to re-normalize moderation against a target goal.
It's more computationally intensive, but I think it might actually be a better way of doing things.
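A rough sketch of the second approach (user names, the history format and the "target" signal here are invented for illustration): weight each user's vote by the correlation of their past votes with the site's target signal, which flips the sign for reliably-contrarian voters and shrinks noisy ones toward zero.

```python
# Score a submission with votes weighted by per-user reliability.
import numpy as np

def user_weight(past_votes, past_targets):
    """Correlation of a user's past votes with the target signal.
    Positive -> trust, negative -> reverse the sign, near zero -> mostly ignore."""
    v, t = np.asarray(past_votes, float), np.asarray(past_targets, float)
    if len(v) < 2 or v.std() == 0 or t.std() == 0:
        return 0.0
    return float(np.corrcoef(v, t)[0, 1])

def score_submission(votes_by_user, history):
    """votes_by_user: user -> +1/-1 vote on this submission.
    history: user -> (past votes, past target values)."""
    return sum(user_weight(*history.get(user, ([], []))) * vote
               for user, vote in votes_by_user.items())

# Toy usage: 'a' tracks the target, 'b' reliably opposes it, 'c' is pure noise.
history = {
    'a': ([1, 1, -1, 1], [1, 1, -1, 1]),
    'b': ([1, -1, 1, -1], [-1, 1, -1, 1]),
    'c': ([1, 1, 1, 1], [1, -1, 1, -1]),
}
print(score_submission({'a': 1, 'b': 1, 'c': -1}, history))
```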
Are people really serious when they talk about the fabled Hacker News of old? What could it possibly have been like? I'm imagining something like zombo.com, only with lower contrast text.
If you read a few rows down on that page, you'll see this:
"For the debate about MS being evil, you can head directly to HN where you'll also find an explanation of what bootstrapping a compiler means."
And that about sums it up. For a while I didn't even create an account because I didn't think I could add anything without sounding stupid compared to everyone else. Now I try to refrain from commenting for...different reasons.
> And that about sums it up. For a while I didn't even create an account because I didn't think I could add anything without sounding stupid compared to everyone else. Now I try to refrain from commenting for...different reasons.
Same here. Though I refrain less.
At some point, I'd like to go and find my first comment on here just to see what got me to make an account.
Slashdot was fun back in the day (ID 64578), but most sites go through that progression. I think HN has changed in that general social stories show up more. I do think the weekends are a bit weirder now.
I actually did something like this at some point. I took all the high ranking items, tokenized them to extract features, and ran them through a bayesian classifier to do some filtering. I was just using whatever information was available on the front page and did not do any further analysis with the actual content.
The results were OK. Maybe with a bit more power it could be more useful, but the results were still hit and miss, and I didn't have a long-term strategy for avoiding filtering myself into a bubble other than continuously re-training the model.
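For anyone curious, the gist of that pipeline looks roughly like this (a sketch assuming scikit-learn; the titles and labels are placeholders, not my actual data):

```python
# Bag-of-words features from front-page titles + a naive Bayes classifier
# trained on my own keep/skip history.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

titles = [
    "Show HN: A tiny Lisp interpreter in 500 lines",
    "10 productivity hacks every founder must know",
    "Understanding the Linux scheduler",
    "You won't believe this one weird startup trick",
]
labels = ["keep", "skip", "keep", "skip"]

model = make_pipeline(CountVectorizer(ngram_range=(1, 2)), MultinomialNB())
model.fit(titles, labels)
print(model.predict(["Show HN: understanding a tiny scheduler"]))
```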
As an econometrician I cannot believe how many times he said 'magic'. There is something very wrong when you put things in your model 'because, who knows, it might be helpful' (like he did with host names). Variable selection is a very hard problem and using 'magic' is asking for problems. It is so disappointing to see machine learning, statistics, econometrics deal with similar problems and fail to learn from one another.
I understand this is a toy project, but he is put in a position in which he educates people on how to use these methods, and he gives the wrong impression. The next guy might use this flawed logic while creating a tool for disease-prevalence prediction.
To be fair, he did explain that he thought the host name might be indicative of whether or not it would be dreck. If he knew exactly how that was the case (and if it was already known that it did have an effect), why bother with machine learning? Just write an explicit scoring mechanism.
It did seem to me that this was an interpretation that he came up with after he tried many different pipelines and "flipped all the switches". There are many sources of randomness that warrant using statistical methods. But it feels strange to me to see people use these tools without giving much thought to parameter stability, parameter significance, causality, or model selection in general.
That's exactly the recent criticism of 'big data': engineers and others getting correlations they don't understand from all the data they can collect, and attempting to use them for who knows what.
The presentation did a wonderful job of providing a high-level introduction to the idea of machine learning, but anyone who's strongly interested in ML should pick up some of the books he mentioned.
Yes. So are compilers. And web frameworks. And editors. And memory-management tools. Progress is made by no longer re-implementing and re-inventing the things that many, many people have invented and implemented in the past, and building on their work.
This doesn't mean that there is no value in learning about these things for yourself, but the packaging of knowledge in reusable tools is the only way programming progresses.
Nicely encapsulated doesn't have to imply a black-box implementation, though. I for one would like it if compilers were less black-boxy; ideally, I want to find out why my compiler does a particular thing by investigating its output, querying the API, going through the compilation steps, etc., rather than having to google some StackOverflow answer.
Isn't that the point of tools like scikit-learn? You don't need to know how to code, optimize, etc. all the algorithms, just understand how to use them.
Perhaps, but I feel like if you are trying to use a statistical tool, it would be best to know how it works. Think about what would happen if every scientist claimed a discovery whenever they found a result at a 90% confidence level. Machine learning (at least in this application) is different because the consequences are often testable and verifiable, but I still think it's better to know how it works than to treat it like a black box.
There is probably a large group that lacks the advanced linear algebra and statistics needed to learn the theory but would still be able to build useful applications using an ML library. I think the video is mainly directed at that group.
What makes you assert he is treating it like a black box? There isn't time in the presentation to go into detail, but actually linear models are inspectable, namely, you can obtain a list of features and how they are weighted. Also, as he said, the scikit-learn documentation is of high quality and explains how the models work. BTW you give an example of scientists, but like he stressed, machine learning as he applied it is a form of engineering.
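To make the inspectability point concrete, a small sketch (assuming scikit-learn and made-up titles; `get_feature_names_out` is the newer spelling, older versions use `get_feature_names`):

```python
# After fitting a linear model on bag-of-words features, pair each feature
# with the weight the model learned for it.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

titles = ["Show HN: my compiler", "one weird trick", "Linux kernel deep dive",
          "you won't believe this", "Show HN: a Forth in Rust", "celebrity gossip"]
labels = [1, 0, 1, 0, 1, 0]

vec = CountVectorizer()
X = vec.fit_transform(titles)
clf = LogisticRegression().fit(X, labels)

# Features sorted by the magnitude of their learned weights.
for name, weight in sorted(zip(vec.get_feature_names_out(), clf.coef_[0]),
                           key=lambda pair: -abs(pair[1])):
    print(f"{weight:+.3f}  {name}")
```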
I didn't even see that at first, although now that I look at it...
I kind of like the term "block box". It takes a black box, and defines it in terms of how it's used, not what it does. It is a block that can be implemented in a certain way. How does it work? Doesn't matter at this level. It's a building block for a differently-focused project. A block box.