Tell HN: The HN submission race made visual
67 points by jacquesm on Jan 25, 2010 | 39 comments
A lot of the people that frequent HN have remarked on the high volume of TechCrunch content and of links from other frequently submitted sites.

The reason for this is that there is a relatively small group of people that all 'race' to get articles from these outlets submitted.

The first one to make it is the one that will get points from all the other submitters, mostly because in their haste to be 'first' they forget to check whether the link has already been submitted.

It's the HN equivalent of the /. meme of 'first post', only with a karma boost as an incentive.

This leads to lots of borderline articles getting lots of time on the homepage, which in turn is a small but persistent factor in crowding out the more interesting stuff.

To protect the guilty and the innocent alike, I've removed the usernames from the following report. The sort was by number of points per domain per user, so every line reflects a single individual submitting a certain domain.

So, for instance the first individual has submitted 716(!) links from the same domain (and first!).




It would be a cool feature for HN if pg made it so that the number of times a domain is submitted directly correlates to its decay rate. Thus, articles from really popular sites would fall off the front page faster, opening up spots for articles from newer (to us) domains.
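
A rough sketch of what that could look like, building on the ranking formula quoted further down this thread, (p - 1) / (t + 2)^1.5. Everything here is hypothetical: the function name, the `penalty` knob, the log scaling, and the assumption that t is the age in hours are made up for illustration and are not how HN works.

```python
import math


def decayed_score(points: int, age_hours: float, domain_submissions: int,
                  base_gravity: float = 1.5, penalty: float = 0.1) -> float:
    """Hypothetical variant: the decay exponent grows with domain frequency."""
    # log10 keeps the penalty gentle: a never-seen domain adds nothing,
    # and each tenfold increase in prior submissions adds roughly `penalty`.
    gravity = base_gravity + penalty * math.log10(1 + domain_submissions)
    return (points - 1) / (age_hours + 2) ** gravity


# Same points and age, but one domain has been submitted 716 times before.
print(decayed_score(points=10, age_hours=3.0, domain_submissions=0))
print(decayed_score(points=10, age_hours=3.0, domain_submissions=716))
```

The second score comes out lower, so the heavily submitted domain would drop off sooner. How strong the penalty should be is exactly the kind of tuning that could introduce feedback loops.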


That's a really good plan.

It is very tricky to do such things without introducing subtle feedback loops though.


That initial minor boost is enough to throw a submission high up the frontpage - even 2 points, given close enough together in time, can do it.

simple solution: remove the automatic upvote given when an article is submitted that has already been submitted. If people want to upvote it, it's negligible effort to do so once they get to the story on HN. bonus: submitting an article is a quick hack to find it on HN. Often one would also want to upvote the article, but not always, e.g. one might be seeking the comments to help evaluate the article. Counting these "submissions" as "upvotes" is inaccurate.

summary: auto-upvoting of submissions is a needless, inaccurate and distorting convenience.


The submitter's vote isn't counted in the algorithm as far as I know. The original algorithm (since changed) I believe is (p - 1) / (t + 2)^1.5 with p being the number of points. So you can see the initial vote is subtracted.
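
To make that concrete, here is the formula as a small Python function. This is only a sketch of the formula as I remember it, not HN's actual code; the name `rank_score` and the assumption that t is the story's age in hours are mine.

```python
def rank_score(points: int, age_hours: float, gravity: float = 1.5) -> float:
    """Score from the formula quoted above: (p - 1) / (t + 2) ** gravity."""
    # Subtracting 1 removes the submitter's own automatic vote.
    return (points - 1) / (age_hours + 2) ** gravity


# A fresh story with only the submitter's vote scores zero ...
print(rank_score(points=1, age_hours=0.0))
# ... while two extra votes in the first minutes already lift it ...
print(rank_score(points=3, age_hours=0.1))
# ... and the same three points after six hours count for far less.
print(rank_score(points=3, age_hours=6.0))
```

Under these assumptions a couple of early votes matter a lot, because the denominator is still small when the story is new.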


It isn't with the first one, but it is with all subsequent submissions.


I wrote a scrappy script to process your report and find the "best" domains to consistently get "first post" on by multiplying total submissions by average points received per submission (i.e. total karma benefit). I'll ignore any sites that got fewer than 300 submissions. Result:

  techcrunch.com       - 22229
  nytimes.com          - 10178
  readwriteweb.com     - 2482
  thestandard.com      - 1888
  centernetworks.com   - 1510
  technologizer.com    - 1470
  alleyinsider.com     - 1158
  sfgate.com           - 1081
  treehugger.com       - 708
  itworld.com          - 698
  devcentral.f5.com    - 658
  markevanstech.com    - 523
  howtoforge.com       - 487
  news.com.com         - 431
TechCrunch is by far in the lead. But why not? They publish good stuff (usually) that HN readers like to vote up. Same for all the others in the list too.
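
For anyone who wants to redo this, a minimal sketch of the kind of aggregation described above. It is not the original script: the row format, the function name `karma_by_domain`, and the way the cutoff is applied (on summed submissions per domain) are my guesses, and it only uses a handful of rows from the report, so it will not reproduce the totals listed.

```python
from collections import defaultdict

# A few (domain, submissions, points_per_submission) rows copied from the report.
ROWS = [
    ("techcrunch.com", 716, 8.7067),
    ("techcrunch.com", 206, 12.1650),
    ("thestandard.com", 446, 4.2332),
    ("paulgraham.com", 16, 157.4375),
]


def karma_by_domain(rows, min_submissions=300):
    """Sum submissions * average points per domain, dropping small domains."""
    submissions = defaultdict(int)
    karma = defaultdict(float)
    for domain, count, avg_points in rows:
        submissions[domain] += count
        karma[domain] += count * avg_points
    ranked = [(d, round(k)) for d, k in karma.items()
              if submissions[d] >= min_submissions]
    # Highest total karma first.
    return sorted(ranked, key=lambda item: item[1], reverse=True)


for domain, total in karma_by_domain(ROWS):
    print(f"{domain:<20} - {total}")
```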


"... TechCrunch is by far in the lead. But why not? They publish good stuff ..."

Articles from TechCrunch span the topical or breaking news spectrum. On rare occasions, there are posts that really should make the headlines. I'm thinking of some posts Arrington made last year. Often I've tried reading TC articles a month or a week later and there is no substance to them. Marshmallow news. Not my cup of tea.


Not your cup of tea and that's cool but.. a lot of people like ephemeral news. TC posts aren't essays that'll make good book material, but in terms of a cutting insight or an exclusive scoop that matters to a lot of people right now, TC is pretty good.

Also, this site is called Hacker News, rather than Hacker Essays or Hacker Articles. While I also prefer the essays, articles, and deep blog posts, I also couldn't say that TC stuff isn't relevant here because it is usually "news."


"... a lot of people like ephemeral news. TC posts aren't essays that'll make good book material, but in terms of a cutting insight or an exclusive scoop that matters to a lot of people right now, TC is pretty good. ..."

I'm not really disagreeing but I'm wary of posts from TC & a few others because the posts tend towards sensation, scoops and controversy before fact.



I've put the list in a separate comment because the submission box really doesn't like this list for some reason.

   +------+----------------------------+-------------+---------------------+----------+                                                                                        
   | pts  | domain                     | submissions | pointspersubmission | redacted |
   +------+----------------------------+-------------+---------------------+----------+                                                                                        
   | 6234 | techcrunch.com             |         716 |              8.7067 | redacted |                                                                                        
   | 4453 | nytimes.com                |         628 |              7.0908 | redacted |                                                                                        
   | 3853 | ycombinator.com            |          42 |             91.7381 | redacted |                                                                                        
   | 2519 | paulgraham.com             |          16 |            157.4375 | redacted |                                                                                        
   | 2506 | techcrunch.com             |         206 |             12.1650 | redacted |                                                                                        
   | 1971 | techcrunch.com             |         363 |              5.4298 | redacted |                                                                                        
   | 1888 | thestandard.com            |         446 |              4.2332 | redacted |                                                                                        
   | 1692 | techcrunch.com             |         358 |              4.7263 | redacted |                                                                                        
   | 1471 | technologizer.com          |         543 |              2.7090 | redacted |                                                                                        
   | 1347 | sivers.org                 |          23 |             58.5652 | redacted |                                                                                        
   | 1315 | nytimes.com                |         139 |              9.4604 | redacted |                                                                                        
   | 1308 | techcrunch.com             |         207 |              6.3188 | redacted |                                                                                        
   | 1180 | jgc.org                    |          59 |             20.0000 | redacted |                                                                                        
   | 1178 | techcrunch.com             |         102 |             11.5490 | redacted |                                                                                        
   | 1158 | alleyinsider.com           |         413 |              2.8039 | redacted |                                                                                        
   | 1120 | catonmat.net               |          26 |             43.0769 | redacted |                                                                                        
   | 1090 | techcrunch.com             |         232 |              4.6983 | redacted |                                                                                        
   | 1082 | sfgate.com                 |         386 |              2.8031 | redacted |                                                                                        
   | 1076 | codinghorror.com           |          55 |             19.5636 | redacted |                                                                                        
   | 1036 | techcrunch.com             |         141 |              7.3475 | redacted |                                                                                        
   |  983 | singularityhub.com         |          83 |             11.8434 | redacted |                                                                                        
   |  921 | sethgodin.typepad.com      |          69 |             13.3478 | redacted |                                                                                        
   |  886 | nytimes.com                |          83 |             10.6747 | redacted |                                                                                        
   |  844 | paulbuchheit.blogspot.com  |          19 |             44.4211 | redacted |                                                                                        
   |  822 | gabrielweinberg.com        |          36 |             22.8333 | redacted |                                                                                        
   |  799 | zedshaw.com                |          14 |             57.0714 | redacted |                                                                                        
   |  784 | danieltenner.com           |           8 |             98.0000 | redacted |                                                                                        
   |  775 | centernetworks.com         |         195 |              3.9744 | redacted |                                                                                        
   |  757 | wired.com                  |         123 |              6.1545 | redacted |                                                                                        
   |  722 | antoniocangiano.com        |          95 |              7.6000 | redacted |                                                                                        
   |  718 | daemonology.net            |          22 |             32.6364 | redacted |                                                                                        
   |  708 | treehugger.com             |         280 |              2.5286 | redacted |                                                                                        
   |  699 | itworld.com                |         222 |              3.1486 | redacted |                                                                                        
   |  694 | readwriteweb.com           |         173 |              4.0116 | redacted |                                                                                        
   |  692 | igvita.com                 |          42 |             16.4762 | redacted |                                                                                        
   |  689 | wired.com                  |          45 |             15.3111 | redacted |                                                                                        
   |  685 | googleblog.blogspot.com    |          89 |              7.6966 | redacted |                                                                                        
   |  683 | whattofix.com              |          84 |              8.1310 | redacted |                                                                                        
   |  671 | asserttrue.blogspot.com    |         100 |              6.7100 | redacted |                                                                                        
   |  659 | devcentral.f5.com          |         303 |              2.1749 | redacted |                                                                                        
   |  652 | codinghorror.com           |          21 |             31.0476 | redacted |                                                                                        
   |  650 | 37signals.com              |          48 |             13.5417 | redacted |                                                                                        
   |  646 | david.weebly.com           |          21 |             30.7619 | redacted |                                                                                        
   |  637 | inc.com                    |           5 |            127.4000 | redacted |                                                                                        
   |  637 | joelonsoftware.com         |          13 |             49.0000 | redacted |                                                                                        
   |  632 | centernetworks.com         |         102 |              6.1961 | redacted |                                                                                        
   |  616 | andrewchen.typepad.com     |          46 |             13.3913 | redacted |                                                                                        
   |  615 | techcrunch.com             |          40 |             15.3750 | redacted |                                                                                        
   |  613 | techcrunch.com             |         120 |              5.1083 | redacted |                                                                                        
   |  611 | ajaxian.com                |         199 |              3.0704 | redacted |                                                                                        
   |  610 | steve-yegge.blogspot.com   |           7 |             87.1429 | redacted |                                                                                        
   |  599 | slash7.com                 |           9 |             66.5556 | redacted |                                                                                        
   |  592 | readwriteweb.com           |         157 |              3.7707 | redacted |                                                                                        
   |  587 | tom.preston-werner.com     |           5 |            117.4000 | redacted |                                                                                        
   |  586 | readwriteweb.com           |         107 |              5.4766 | redacted |                                                                                        
   |  553 | readwriteweb.com           |         130 |              4.2538 | redacted |                                                                                        
   |  539 | xconomy.com                |          94 |              5.7340 | redacted |                                                                                        
   |  538 | boston.com                 |          57 |              9.4386 | redacted |
   |  538 | googleblog.blogspot.com    |           8 |             67.2500 | redacted |
   |  526 | blog.asmartbear.com        |           8 |             65.7500 | redacted |
   |  523 | markevanstech.com          |         207 |              2.5266 | redacted |
   |  519 | datacenterknowledge.com    |          91 |              5.7033 | redacted |
   |  515 | economist.com              |          51 |             10.0980 | redacted |
   |  514 | 37signals.com              |          24 |             21.4167 | redacted |
   |  507 | infoworld.com              |         164 |              3.0915 | redacted |
   |  490 | linux-mag.com              |          90 |              5.4444 | redacted |
   |  487 | howtoforge.com             |         228 |              2.1360 | redacted |
   |  486 | esciencenews.com           |         103 |              4.7184 | redacted |
   |  484 | businessinsider.com        |         159 |              3.0440 | redacted |
   |  482 | particletree.com           |          15 |             32.1333 | redacted |
   |  474 | techcrunch.com             |          30 |             15.8000 | redacted |
   |  465 | github.com                 |          17 |             27.3529 | redacted |
   |  463 | paulgraham.com             |           3 |            154.3333 | redacted |
   |  462 | scripting.com              |         118 |              3.9153 | redacted |
   |  461 | mattmaroon.com             |          10 |             46.1000 | redacted |
   |  454 | nytimes.com                |          64 |              7.0938 | redacted |
   |  452 | 37signals.com              |          17 |             26.5882 | redacted |
   |  449 | tipjoys2cents.blogspot.com |          13 |             34.5385 | redacted |
   |  446 | blogs.zdnet.com            |         162 |              2.7531 | redacted |
   |  444 | blog.last.fm               |           2 |            222.0000 | redacted |
   |  440 | nytimes.com                |          37 |             11.8919 | redacted |
   |  435 | mattmazur.com              |          15 |             29.0000 | redacted |
   |  432 | blogs.harvardbusiness.org  |          47 |              9.1915 | redacted |
   |  431 | news.com.com               |         236 |              1.8263 | redacted |
   |  430 | arstechnica.com            |          77 |              5.5844 | redacted |
   |  426 | bits.blogs.nytimes.com     |          72 |              5.9167 | redacted |
   |  422 | redeye.firstround.com      |          41 |             10.2927 | redacted |
   |  422 | paulstamatiou.com          |          25 |             16.8800 | redacted |
   |  421 | 25hoursaday.com            |          48 |              8.7708 | redacted |
   |  420 | venturebeat.com            |         166 |              2.5301 | redacted |
   |  416 | howtosplitanatom.com       |          57 |              7.2982 | redacted |
   |  415 | rondam.blogspot.com        |          17 |             24.4118 | redacted |
   |  415 | foundread.com              |         135 |              3.0741 | redacted |
   |  414 | valleywag.com              |         161 |              2.5714 | redacted |
   |  410 | nytimes.com                |          35 |             11.7143 | redacted |
   |  408 | reynoldsftw.com            |          88 |              4.6364 | redacted |
   |  406 | economist.com              |          92 |              4.4130 | redacted |
   |  404 | nytimes.com                |          91 |              4.4396 | redacted |
   |  397 | nytimes.com                |          37 |             10.7297 | redacted |
   |  396 | adam.blog.heroku.com       |          17 |             23.2941 | redacted |
   +------+----------------------------+-------------+---------------------+----------+


This is also a good argument for splitting the submission votes from the comment votes. You can probably rack up decent submission points with a halfway intelligent script, which is a little bit harder in the case of commenting. It might reflect my own use of the site but I also see HN as more of a commenting place than a link-finding place. Most of the (non-meta) links that end up on HN are already making the rounds elsewhere. The chatter can still be interesting though, even when hanging off some insipid TC post.


Can you reveal where I rank among the nytimes rows? I'm looking through face palms hoping I'm not the 628 submissions. But it wouldn't surprise me either...it's been my daily morning read for over a decade and I've been coming around here for about three years.

Is that an excuse? :)

In my defense, I can't say I've ever had a "First!" urge. I assume it's just because I'm early to rise and on the East Coast.


So much for anonymization then. You're spot on.

Kudos :)

btw I really liked that one: http://news.ycombinator.com/item?id=1075857

thanks!


But I knew! So embarrassing...ok, not really. Still...I'll take solace in the fact that it averages out to 1 link every other day. Considering I usually read about 20 stories a morning there, 1/40 ain't so bad.

Ugh.

Feel free to out me elsewhere...I think I may be boston.com and economist too.


> Still...I'll take solace in the fact that it averages out to 1 link every other day.

No solace need be taken! One post every other day by someone actually reading the paper in depth is a great service to the community. So, thanks and cheers. I come here for my science/tech/business fix and comment, I wish I found more interesting links to submit!


Errr, the list is rather (a lot) longer.

But don't worry about it, as long as it is on topic no harm done.

The problem is with the off-topic stuff that gets submitted like this.


Well, I don't know about on-topic, but they were always interesting to me. I take solace in the fact that I've never read valleywag.

Did you scrape for the data or is it available somewhere as a download? For a while I've been curious about creating a more interactive approach to my links archive based on the content of each link.


I intend to release all the data on HN that I've collected over the last few months, I still plan to write one more article about this (about trending topics) and then I will make all the data available for download as a mysql dump.


Damn. You must read very quickly.

Or I spend way too much time in my programming bubble.


my first thought was that some of these could well be innocent "click bookmarklet" first thing in the morning.


Yeah, the bookmarklet made it very easy to submit with one hand on the mouse and the other on the coffee.


Early bird gets the worm.

That's why I wrote 'to protect the guilty and innocent alike'.

I figured that some of these are accidental and some are purposeful.


It would be interesting to show what percentage of a person's total rep is derived from their repeated postings from one site.

Also, I think you should show the names. I'm sure I'm in the list, and I want to know where :) Especially since I don't really post links like I used to (and I've always tried to only post quality links, no matter the source).


For the top poster that's roughly 30%.

And no, you're not in the top 100 in this case.

You're actually much more selective, you've only posted 4 techcrunch links that I'm aware of and scored 24.75 points on average with those.

That's about 3 times as good as the 'top' tc submitter so you are three times better at spotting what to submit and what not.

Showing that is exactly the point of this whole exercise.

If you blindly submit every tc article you'll score lots of points, but if you are selective you get fewer points but average much better.

The 'bad' behaviour is rewarded though: in absolute points there is no comparison, 6000+ for the top submitter and only about 100 for you.
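
As a quick check of that arithmetic, a sketch in Python using only the numbers quoted in this thread (the variable names are just for illustration):

```python
# Figures quoted in the thread.
top_submissions, top_average = 716, 8.7067      # top techcrunch.com row
selective_submissions, selective_average = 4, 24.75

top_total = top_submissions * top_average                    # about 6234
selective_total = selective_submissions * selective_average  # 99.0
ratio = selective_average / top_average                      # about 2.84

print(f"top submitter: {top_total:.0f} points")
print(f"selective submitter: {selective_total:.0f} points")
print(f"average is {ratio:.2f}x higher for the selective submitter")
```

So "about 3 times" is a slight round-up of roughly 2.8, and the gap in total points is around 60 to 1 in the other direction.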


I used to blind submit a lot of Joel on Software, Coding Horror (I feel slightly responsible for a lot of his posts on the site since there weren't many before I started posting them. I've stopped and I'm sorry.) and 37signals. So I was sort of expecting to be in the list for those. I must not have had the volume.

I would love to see my complete stats, but I'm just a stats junkie (hence my love for rep-based communities and sports).


How about a CSV or Google Doc version so we can play with these numbers a bit? (filter, sort, etc)


Can you tell us how many different posters this covers? The list length seems arbitrary.


71 individual accounts, but 100 entries, so some individuals are on the list more than once.


Very interesting, is it possible to check quickly if the points follow Benford's law? http://en.wikipedia.org/wiki/Benfords_law as a first approximation to see if people are gaming HN?


Is that how Benford's law works?

I thought that Benford's law would allow you to distinguish between a set of points that is made up and a real one, not between a real set made by real users and one by real users + users gaming the system.
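
Whatever it would or wouldn't prove, the check itself is easy to sketch: compare the share of each leading digit with the Benford expectation log10(1 + 1/d). The function names below are mine, and the sample is just the first ten totals from the report, which is far too few for a meaningful test; a top-N list cut off at a minimum score also skews the leading digits on its own.

```python
import math
from collections import Counter


def leading_digit(value: int) -> int:
    """Return the first decimal digit of a positive integer."""
    return int(str(value)[0])


def benford_comparison(values):
    """Return {digit: (observed_share, expected_share)} for digits 1-9."""
    counts = Counter(leading_digit(v) for v in values if v > 0)
    total = sum(counts.values())
    return {
        d: (counts[d] / total, math.log10(1 + 1 / d))
        for d in range(1, 10)
    }


# First ten totals from the report's first column, for illustration only.
sample = [6234, 4453, 3853, 2519, 2506, 1971, 1888, 1692, 1471, 1347]
for digit, (observed, expected) in benford_comparison(sample).items():
    print(f"{digit}: observed {observed:.2f}, expected {expected:.2f}")
```

Running it on the full data set rather than the truncated list would be the only fair way to try this.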


I bet I'm somewhere high for TechCrunch (possibly #1...although 716 seems high), I used to submit a ton of their stuff. Don't do that as often, since I'm busy with my own site now.

edit: apparently it's not me, since searchyc only shows 91 submissions...and a few of them are "Ask HN" types

edit #2: apparently it IS me, and searchyc just doesn't work that well. (thanks for the email Jacques). I'm actually surprised that only accounted for 6K votes, I figured it'd be closer to 15K


So much for the accuracy of searchyc then.


maybe I did something wrong

what I did was search for vaksel as a user, and then hit within results and entered techcrunch.


Your constant submission of TechCrunch led me to believe you were on their payroll. Yikes! I can't believe you were submitting them out of your own free will.

Glad you stopped though; TC is to launching what reviewing combat video games is to real combat.


You know, after reading this, it would be interesting to compile a list of the top averaging websites as sorted by points on HN. That would be a neat way to find new sites that we collectively find interesting.


You mean, like this?: http://top.searchyc.com/


I can only find a few that have more than 10 submissions:

dustincurtis.com (dcurtis)

danieltenner.com (wheels iirc)

blog.last.fm

balsamiq.com

sivers.org

netflixprize.com

steve-yegge.blogspot.com

Most of the other ones are either mainstream or have very few submissions.

If you want the list of one-offs and infrequent but high ranking sites let me know.


I've briefly considered writing a bot to just submit everything from reddit to HN.

And then I realized that would make y'all cry.



