Similar Hacker News Users

edw519 · on Jan 12, 2010

I love it! My results were very encouraging:

  S W G W                      einstein

                               newton
  +------------------------+ 
  | edw519                 |   liebniz
  +------------------------+ 
                               turing
  ===========||============= 
                               carnegie
  Co-Commenting..SEMANTICS.. 
  Word Choice                  tesla

                               godel
  =====||===================
                               escher
  Leaderboard.....KARMA.....
  Diamonds in the Rough.....   bach

                               edison
  This site has no
  affiliation with Hacker      galois
  News or Y Combinator
	                       patio11

bugs · on Jan 12, 2010

I think your leaderboard dial was set to 11.

csuper · on Jan 12, 2010

Yeah - he should just make 10 louder.

josefresco · on Jan 12, 2010

I don't understand why ... his goes to 11

brettnak · on Jan 12, 2010

For $2,000 I'll build you one that goes to 12.

riffer · on Jan 11, 2010

People asked for a similar users tool back in the early days of HN:

http://news.ycombinator.com/item?id=701

And recently, too:

http://news.ycombinator.com/item?id=1036247

How does this particular tool work? It's based on the threads a given person comments on, who else comments on those threads, and how the topics and terminology of threads and comments relate to each other. Karma on the relevant threads is used as a subtle authority metric. Incorporating voting relationship histories would almost certainly make the tool better. Particularly at finding interesting (and not merely similar) stuff.

That said, it'll be interesting to hear what folks think.

sarosh · on Jan 12, 2010

I commend you on an elegant UI. Would you mind explaining a bit how exactly you anticipate using voting relationship histories in future revisions? Also, am I correct in assuming that there is some sort of distance function involved in computing 'similarity'?

idoh · on Jan 11, 2010

I recently changed my user name from antiismist to idoh. When I moved the karma setting to Diamonds in the Rough, it listed idoh as the #2 most similar user to antiismist. Nice!

http://www.swimwithoutgettingwet.com/hnusers/?user=antiismis...

ambition · on Jan 12, 2010

This is awesome. Having computed a reasonable distance function between users, you should be able to use this distance function as edge weights in a big graph. Rendering this graph with a force-directed layout algorithm like Fruchtermann-Reingold might create visually appealing results by clustering related users.

I've done this before on different datasets and would love to cooperate with you on it...

llimllib · on Jan 12, 2010

excellent choice setting the defaults to give very flattering results... there's a lesson in that.

patio11 · on Jan 12, 2010

I could probably turn this into a blog post (and will later), but there are a lot of microoptimizations of default settings for B2C apps that make huge differences in conversion:

- Defaults should be almost sufficient to use the app. (i.e. if it has some workflow, you should be able to pretty much hit "Next next next" and get it to work. Bonus points for making the number of nexts as low as possible.)

- Pick something that results in visually impressive output rather than a blank page. See Balsamiq for inspiration here -- they start you with a mockup in progress that demonstrates most of the highlights of the software.

- Ever seen Firefly? I really like how they use the word "shiny". Ideally, your defaults should show the shiny in your app. In Hollywood they have a saying: make sure your budget makes it onto the screen. In B2C apps, make sure the stuff you did all the work on makes it into the user experience most of your users will see.

- Assume your user is a novice at both your software and the problem domain until you have evidence otherwise. A lot of people ask the user "Hey, are you a novice?" That is one way to do things, but it makes your core workflow one stage longer and every stage costs you conversion. I prefer "Assume they are and give them a discrete 'skip ahead' button" or "Assume they are and watch them for evidence that they are not".

- If your app is supposed to make the user feel like they just killed an effing lion, then your default settings better have a lion bound and gagged sitting under a forty-ton weight suspended by a weak string which passes through an open pair of scissors next to a sign saying "Snip this."

- (Do this if nothing else.) Track actual usage of the app and modify your defaults based on actual usage. Bonus points if you can do it dynamically, if that makes sense for your app. For example, if you pick A as the default and 25% of your users go out of their way to change it to B, then that probably should have been B. (You can split test and see how many people would have changed it to A if it had defaulted to B.)

llimllib · on Jan 12, 2010

While all of these are true, I find that they usually derive from a simpler rule: Have a laser focus on the users you're trying to reach, and everything you do should make their lives easier in some way.

When I find people breaking these sorts of rules, it's usually because they're thinking of themselves or some non-customer stakeholder.

edited to add a corollary: until you've done the sort of testing patio11 advocates, you don't know who that customer is.

patio11 · on Jan 12, 2010

I think what you are saying is good advice, but don't think it is necessarily co-extensive with what I'm suggesting.

For example, say you're targeting teachers. Keeping a laser focus on what teachers want is important. However, I think you need to put extra focus on making their first five minutes absolutely amazing. (And their first 30 seconds. And their first 5 seconds.) My reason for this is simple: almost all apps are going to leak an amazing number of their customers between first and second use. I don't have my report in front of me at the moment, but I think something like 40% of BCC users never complete their first bingo card and never log in again. Essentially none of these people buy the software. On the other hand, roughly 2.4% of trialers convert, or roughly 4% of users who succeed in their first interaction with the app. Increasing my bottom line by 5% requires converting 5% more of that second group -- which, let me tell you, is hard freaking work -- or, in the alternative, improving the first run experience of three out of every 40 users who fail to complete Task #1.

It is really hard to optimize your entire application, user experience, value proposition, etc to get +5% conversion. However, polishing your first five minutes until it freaking shines is not nearly as difficult. Do you present people with a blank page currently? Spend an hour and put something on it. I will put money on that helping. Does it take a critical mass of friends/input/lions slain to get fun? Do the work for them. Fake it if necessary.

And, yeah, instrument everything. Can I plug Mixpanel here? plug Every time I think "You know I should really build more instrumentation into my app..." I remember "Oh, wait, it takes a twentieth of the time to just throw it on Mixpanel -- no visualization code or complicated controls to refine the data range required, praise be."

TheSOB88 · on Jan 12, 2010

What does instrument mean?

profquail · on Jan 11, 2010

Interestingly, there seems to be a local minimum (or maximum) in your algorithm that I found when searching for myself:

Start here: http://www.swimwithoutgettingwet.com/hnusers/?user=profquail...

The next two 'clicks' to the left (towards co-commenting) don't have 'spolsky' in my list: http://www.swimwithoutgettingwet.com/hnusers/?user=profquail...

But one more click to the left, and he reappears in the list: http://www.swimwithoutgettingwet.com/hnusers/?user=profquail...

riffer · on Jan 12, 2010

This is interesting, thanks for pointing this out. The top result is actually getting filtered out from the displayed results (don't ask). spolsky is going from being the second result, to the first result, to the second result in your examples.

po · on Jan 12, 2010

You keep mentioning filtering… are you sure we can't ask? I'm curious.

alex_c · on Jan 12, 2010

Quite flattering - no matter what I set the sliders to, the names I recognize are people I respect.

I think that only really says something about HN, not about my comments :)

randallsquared · on Jan 12, 2010

I had the same result. Perhaps this is not measuring similarity after all. :)

kf · on Jan 11, 2010

Aww, I was hoping to get amichail.

amichail · on Jan 12, 2010

I think I was filtered out.

riffer · on Jan 12, 2010

You're correct. I feel terrible. There is no question that you're the posterchild for controversial-interesting. I will fix this.

kf · on Jan 12, 2010

LOL! There was an actual amichail filter?

dbz · on Jan 12, 2010

I just realized I have paid no attention at all to anyone's name. I know my brother's name when I see it, but that's it. Now I'm wondering if I am liked. Probably not >.<

chrischen · on Jan 12, 2010

Ditto. I never pay attention to the name either. Usually it takes 30 levels of nesting before I realize that I'm conversing with the same person.

The only name I recognize is pg, and since I save my username and password, sometimes I forget my own username too.

po · on Jan 12, 2010

Your username is chrischen, FYI…

chrischen · on Jan 12, 2010

I actually typed "chris" as my username in the form the first time, wondered why no results showed up, and then realized my username is actually chrischen.

greendestiny · on Jan 12, 2010

Ah rms right there in my list... how did I go so wrong... :)

Zak · on Jan 12, 2010

Does moving the karma slider to the right look for people with low karma, or does it ignore karma? I want the option to ignore karma.

riffer · on Jan 12, 2010

Excellent question. Middle ignores karma in the quantitative ranking. Far to the right gets lower karma users. Users who consistently get almost all 1s and/or negative points are separately filtered out since they generally don't make for interesting recommendations. Users who sometimes have negative comment scores but also make lots of 2+ comments are included (these users are rare but are arguably the most interesting).

apu · on Jan 12, 2010

Based on what other people here are reporting, I wonder if there's a bias towards matches with a large number of comments (e.g., pg, patio11, edw519, etc.). Perhaps there's some normalization needed?

silentbicycle · on Jan 12, 2010

There's a bias towards people on the leaderboard, though one of the knobs adjusts it.

mahmud · on Jan 12, 2010

I was happy to see you in my list, silentbicyle :-)

silentbicycle · on Jan 12, 2010

chaosmachine · on Jan 11, 2010

Based on the default settings, I'm up there with PG and Patio11. I like your algorithm ;)

riffer · on Jan 11, 2010

Everybody here has somewhat similar interests to pg. Otherwise we wouldn't be here. ;)

CrazedGeek · on Jan 11, 2010

I checked a few people, and it seems like patio11 is in all of their results. I wonder why?

riffer · on Jan 11, 2010

A big part of it is that the default settings reward comment karma fairly heavily. Try moving the sliders around. For example, with the defaults, patio11 is the top result for your username, but at the karma slider midpoint he's out of the top 12.

onewland · on Jan 11, 2010

I had the same reaction. A little flattering, but I think it has to do with brevity more than anything.

edit: just realized that by default it's only returning the rockstars of the site (e.g. people high up on the leaderboard).

run4yourlives · on Jan 12, 2010

I'm up there with patio11, but not pg. (I get nostradeamous)

Not sure what that means though.

randallsquared · on Jan 12, 2010

Interestingly, nostrademons and pg are the top two for me, so I wonder what would produce one but not the other. I'd love to see if comments could be separated into [user1]-like comments and [user2]-like comments.

riffer · on Jan 12, 2010

Yeah, one potential solution is lists of users.

Another route might be clustering ('technical', 'political', etc.)

mixmax · on Jan 12, 2010

Nostra who?

nostrademons · on Jan 12, 2010

Me, presumably.

mixmax · on Jan 12, 2010

Obscure pun, it's meant to be flattering.

http://news.ycombinator.com/item?id=181868

:-)

run4yourlives · on Jan 12, 2010

Yeah, sorry about the spelling.

silentbicycle · on Jan 12, 2010

A few weeks ago, I tried searching through old threads to find where PG had an archive of old comments and server stats. Does anybody have link(s)? Thanks in advance. (This post is meta enough that it's probably as good a time as any.)

I'm guessing either such a corpus was used here, or it's based on a cache of recent comments.

brk · on Jan 12, 2010

I got quite a few people I respect (mixmax, edw519, mattmaroon) and one I consider a personal friend, dennykmiu.

Neat little utility, thanks.

mixmax · on Jan 12, 2010

hey thanks!

dolinsky · on Jan 12, 2010

Am I missing something or does this not work for the majority of us who are casual commentators on here?

silentbicycle · on Jan 12, 2010

It probably needs more data points.

riffer · on Jan 12, 2010

Yeah, there's really two issues here.

Most of the techniques for this sort of thing don't work that well for sparse datasets. So given a choice between showing bad results for users with relatively few comments, and not showing any results ... especially when users can search not just for themselves but also for folks they know and enjoy ... Also, scraping the full history of HN is not cool.

silentbicycle · on Jan 12, 2010

A while ago, pg posted an archive of HN comments, specifically so that nobody would have to scrape them. (It was coupled with comments on how the arc server was holding up, IIRC.) I haven't had any luck searching for it, though - there are too many discussions about the ethics of scraping comments, archive file formats, and the like.

slackenerny · on Jan 12, 2010

I remember that too. It was, as you say, a while ago, meaning long time outdated. And I indeed recall it to be pg-posted, but instead all I can find is this non-pg release with all links broken: http://news.ycombinator.com/item?id=173045. I looked for in all links to tar and zip archives, could've missed something.

silentbicycle · on Jan 12, 2010

Yeah, they're 404. Thanks for looking!

pronoiac · on Jan 12, 2010

See boredguy8's comment near the top for a list of five datasets:

http://news.ycombinator.com/item?id=271066

I'd check the Internet Archive for the files & Google for the filenames.

silentbicycle · on Jan 12, 2010

That was the post I was looking for. They're 404 now, but thanks for looking.

mixmax · on Jan 12, 2010

Interesting. After the obligatory vanity search I tried searching for people I know have different (more technical, less startup/strategy(marketing) taste than me. And it seems to work quite well. I have nothing in common with those guys :-)

For instance the top pick for tptacek is cperciva, which seems natural. Doesn't work the other way around though, so there's still some work to be done...

riffer · on Jan 12, 2010

Thanks for the feedback, you bring up a very interesting point: reciprocity. If I am closer to you than anyone else, does it follow that you are closer to me than anybody else? I'm not sure the right answer to that is yes ...

sparky · on Jan 12, 2010

Definitely not in the general case of points in n-dimensional Cartesian space. For instance, in the 1-d case, consider points at 0, 10, and 12. 10 is the closest point to 0, but not vice-versa.

Put another way, it seems as though the algorithm considers tptacek more distinct (from all other HN users) than cperciva.

Xichekolas · on Jan 12, 2010

I was wondering the same thing. I don't show up in the lists of any of the people that show up in my list.

I take it that means I am the cheese.

mixmax · on Jan 12, 2010

which is in tune with tptacek being no. 3 on the leaderboard and cperciva being no.34.

SapphireSun · on Jan 12, 2010

Hmm, small bug(?) When I typed my name in all lowercase it didn't come up. That might be by design though. Cool tool ;-)

riffer · on Jan 12, 2010

That's a good point. Let me see what I can do. The catch is that HN is case sensitive as well.

works: http://news.ycombinator.com/user?id=SapphireSun

doesn't work: http://news.ycombinator.com/user?id=sapphiresun

Vivtek · on Jan 12, 2010

Ha, yes, that confused me, too.

tokenadult · on Jan 12, 2010

I like my company. I don't know if they feel the same way about me.

One newer participant whose posts make me think "I wish I had posted that" doesn't show up on my list of associated participants. But I show up on his. Maybe that is because of the karma setting in the default operation of the search. Interesting.

riffer · on Jan 12, 2010

If you don't mind telling me the user you're thinking of, I can take a look as well.

riffer · on Jan 12, 2010

Thanks for the positive response.

If three things were to get added to this, what should they be?

blasdel · on Jan 12, 2010

Could you provide a reverse search -- who's lists do I show up on?

How about displaying the values used to order the result set, so you could compare weightings across users? reply

adrianwaj · on Jan 12, 2010

- explanation about what it does and how

- username has period afterward, for self-linking back to site

- various ratios next to each user name in a table, maybe linking to searchyc ie http://searchyc.com/user/riffer

kf · on Jan 12, 2010

Least similar users

riffer · on Jan 12, 2010

This is fun to consider. Are you thinking of people who have completely different interests? Or folks who are interested in similar things but disagree?

kf · on Jan 12, 2010

I'm really not sure. I was naively thinking whatever the opposite of the existing algorithm is.

riffer · on Jan 12, 2010

These are the 12 'farthest' from you, rms:

osteele erikstarck zedshaw atarashi matthavener clemesha alrex021 cliff NateLawson jefffoster mrcharles fix3r

This was done with the karma effectively off (same as the karma slider in the middle). What do you think? Anybody there you think shouldn't be?

kf · on Jan 12, 2010

Thanks. Most of the comments of those users are about programming which I don't often discuss here. Erikstarck doesn't really comment in the programming discussions but his comment style is definitely different from mine.

ratsbane · on Jan 12, 2010

1) If I understand correctly the first slider ranks greater similarity in commenting on the same threads the farther left you go and similar word choice the farther right you go. Why not make this two separate sliders?

2) For each of the matches could you show some of the data used to compute the matches... e.g., for semantic matches, show the top X (maybe five?) common words or phrases you matched on. For threads, show the parent or OP of the five most recent threads... something like that.

3) Really neat. I like this. It's quick too. How did you do it? Did you replicate the entire HN database? Third suggestion: post the source or just post an explanation of how it works.

ratsbane · on Jan 12, 2010

... and 4: apply the same thing to twitter (!)

riffer · on Jan 12, 2010

Yeah, that's one of things to consider. All of the logic for this was originally developed for another purpose, and it got applied to HN for fun. The human connection element makes it very cool, and so do the possiblities for applying it to so many different applications.

jacquesm · on Jan 12, 2010

I think there is still a subtle bug in there somewhere.

When I set the sliders 'semantics' all the way to the right ('word choice') and leaderboard all the way to the left, then check I get this:

  - patio11
    - mahmud
    - nostrademons
    - tptacek

  - mahmud
    - edw519
    - swelljoe
    - davidw

Shouldn't the relationships be symmetrical, so 'edw519' would get 'mahmud' as the first match and 'mahmud' would get 'edw519' ?

edit: also, your 'match' is case sensitive, so 'riderofgiraffes' won't work but 'RiderOfGiraffes' does.

astine · on Jan 12, 2010

It filters on karma as well.

All of those guys showup in my default results, but I can't get any of them to pick me up. Clearly I don't post enough.

jacquesm · on Jan 12, 2010

If that were the case then I would show up, and I don't either.

I've received some email from David (the guy that built it), he's going to fix this and the lowercase issue as soon as things quiet down a bit.

ax0n · on Jan 11, 2010

Awesome! I knew SwellJoe and I seemed to end up in the same threads, and IIRC, agree on things. Even though I'm a relative neophyte. I love it!

matt1 · on Jan 12, 2010

Very cool.

My matches using the default settings: pg, swombat, mahmud, wheels, SwellJoe, edw519, mattmaroon, gojomo, davidw, mixmax, unalone, tptacek

Did you get permission to scrape the data? (I tried once without asking with mediocre results: http://www.mattmazur.com/2008/08/the-wrong-way-to-get-notice...)

jacquesm · on Jan 12, 2010

As someone in the comments on your pages also notes, you could use the google cache. But it would still be nicer to ask first.

ramidarigaz · on Jan 12, 2010

Really fun app! Great for boosting one's ego.

Slight note about the page formatting: My screen resolution is 800x480, and the text by the sliders wraps in a very confusing manner.

It looks like this to me

  ===============||===================
  co-commenting......SEMANTICS.....word
  choice

A little confusing at first, until I realized that it was wrapping. It's the same for the other slider.

Great app though!

NathanKP · on Jan 12, 2010

Very interesting. Depending on how I adjusted it I was judged similar to pg, unalone, and a couple other people on the leaderboard. I guess that means my comments are on the right track....

Are there any other details about the algorithm or how it works. I'm curious about what exactly the different weightings mean.

laktek · on Jan 12, 2010

Same story here..Is pg is added to all users by default?

alanthonyc · on Jan 12, 2010

Cool. Now we can play six degrees of hn.

jcl · on Jan 12, 2010

How is "word choice" similarity calculated? If you have a high similarity with someone, does that mean your range of words is the same (perhaps because you write on the same topics) or that your word frequency is the same (because you have similar patterns of speech)?

Are quoted sections filtered out? URLs?

andrewljohnson · on Jan 12, 2010

My girlfriend and I have almost the exact same lists... smokey_the_bear and andrewljohnson.

10ren · on Jan 12, 2010

So, some great users are on your similarity list... but are you on their similarity list?

trunnell · on Jan 12, 2010

For exactly zero of the twelve users listed, no.

rinich · on Jan 11, 2010

I approve of all the people I've been grouped with, except that bizarre swombat fellow.

adrianwaj · on Jan 12, 2010

How does colins_pride compare with riffer? Aren't they the same person?!?

riffer · on Jan 12, 2010

This is a cool question. Two things: the first is that I stopped using the colins_pride account and subsequently started using the riffer account, so the co-commenting commonality is not particularly high. The terminology score is substantially closer. The other thing is that because the dataset is tilted towards the recent, the colins_pride user is only lightly represented.

tlrobinson · on Jan 12, 2010

Interesting both of my co-founders (tolmasky and boucher) showed up on most of my lists. I guess that means it works, since we tend to talk about similar things.

Vivtek · on Jan 12, 2010

Apparently tumult is most similar to me no matter what I select, so clearly I can just stop posting entirely. (Thanks, tumult - more time in the day!)

MikeTLive · on Jan 12, 2010

websense doesnt like you.

Security risk blocked for your protection Reason: This Websense category is filtered: Potentially Damaging Content. URL: http://www.swimwithoutgettingwet.com/hnusers/

rdtsc · on Jan 12, 2010

Interesting, similar to:

    edw519

    btilly

    patio11

    pg

    tptacek

mgrouchy · on Jan 12, 2010

hrm, I seem to have gotten tptacek as my top result, if the results are so ordered.

Id say thats a good thing. In general the whole list is people I would happen to be even somewhat similar to.

auston · on Jan 12, 2010

http://www.swimwithoutgettingwet.com/hnusers/?user=auston...

tibbon · on Jan 12, 2010

It didn't return any results when I tried from the iPhone

Sukotto · on Jan 12, 2010

I don't show. Guess I'm not "one of us" yet :-/

PebblesRox · on Jan 15, 2010

Did you capitalize your name? It seems to be case-sensitive... http://www.swimwithoutgettingwet.com/hnusers/?user=Sukotto&#...

peregrine · on Jan 12, 2010

I got a great list. Neat.

pclark · on Jan 12, 2010

got swombat. lame. :)

whalesalad · on Jan 12, 2010

pg was number one for me :D