Similar Hacker News Users (swimwithoutgettingwet.com)
175 points by riffer on Jan 11, 2010 | 107 comments



I love it! My results were very encouraging:

  S W G W                      einstein

                               newton
  +------------------------+ 
  | edw519                 |   liebniz
  +------------------------+ 
                               turing
  ===========||============= 
                               carnegie
  Co-Commenting..SEMANTICS.. 
  Word Choice                  tesla

                               godel
  =====||===================
                               escher
  Leaderboard.....KARMA.....
  Diamonds in the Rough.....   bach

                               edison
  This site has no
  affiliation with Hacker      galois
  News or Y Combinator
                               patio11


I think your leaderboard dial was set to 11.


Yeah - he should just make 10 louder.


I don't understand why ... his goes to 11


For $2,000 I'll build you one that goes to 12.


People asked for a similar users tool back in the early days of HN:

http://news.ycombinator.com/item?id=701

And recently, too:

http://news.ycombinator.com/item?id=1036247

How does this particular tool work? It's based on the threads a given person comments on, who else comments on those threads, and how the topics and terminology of threads and comments relate to each other. Karma on the relevant threads is used as a subtle authority metric. Incorporating voting relationship histories would almost certainly make the tool better, particularly at finding interesting (and not merely similar) stuff.

That said, it'll be interesting to hear what folks think.
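
A rough sketch of how such a blend could look, not the tool's actual code. It assumes co-commenting is measured as Jaccard overlap of thread IDs, word choice as cosine similarity of word counts, and that a hypothetical `mix` value between 0 and 1 plays the role of the slider. All names and the toy data are made up for illustration.

```python
import math
from collections import Counter

def jaccard(a, b):
    """Overlap of two sets of thread IDs (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def cosine(c1, c2):
    """Cosine similarity between two word-count vectors."""
    dot = sum(c1[w] * c2[w] for w in c1.keys() & c2.keys())
    norm1 = math.sqrt(sum(v * v for v in c1.values()))
    norm2 = math.sqrt(sum(v * v for v in c2.values()))
    return dot / (norm1 * norm2) if norm1 and norm2 else 0.0

def similarity(u, v, mix=0.5):
    """mix=0.0 is pure co-commenting, mix=1.0 is pure word choice (assumed slider)."""
    return (1 - mix) * jaccard(u["threads"], v["threads"]) + mix * cosine(u["words"], v["words"])

# Hypothetical toy data: thread IDs commented on, and words used.
alice = {"threads": {1, 2, 3}, "words": Counter("startup revenue pricing startup".split())}
bob = {"threads": {2, 3, 4}, "words": Counter("startup pricing customers".split())}

print(similarity(alice, bob, mix=0.0))  # co-commenting only
print(similarity(alice, bob, mix=1.0))  # word choice only
```

Karma weighting and voting histories are left out here; they would be further terms in the same weighted sum.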


I commend you on an elegant UI. Would you mind explaining a bit how exactly you anticipate using voting relationship histories in future revisions? Also, am I correct in assuming that there is some sort of distance function involved in computing 'similarity'?


I recently changed my user name from antiismist to idoh. When I moved the karma setting to Diamonds in the Rough, it listed idoh as the #2 most similar user to antiismist. Nice!

http://www.swimwithoutgettingwet.com/hnusers/?user=antiismis...


This is awesome. Having computed a reasonable distance function between users, you should be able to use this distance function as edge weights in a big graph. Rendering this graph with a force-directed layout algorithm like Fruchterman-Reingold might create visually appealing results by clustering related users.

I've done this before on different datasets and would love to cooperate with you on it...
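
For readers unfamiliar with the idea, here is a simplified pure-Python sketch in the spirit of Fruchterman-Reingold: every pair of nodes repels, edges attract, and movement is capped by a cooling "temperature". Dividing the attraction by the user distance is an assumption made for this sketch, and the user distances below are invented, not taken from the tool.

```python
import math
import random

def force_layout(nodes, distances, iterations=200, seed=0):
    """Simplified Fruchterman-Reingold-style layout.

    nodes: list of user names.
    distances: {(a, b): d} with d > 0; a smaller d pulls a and b together harder.
    """
    rng = random.Random(seed)
    pos = {n: [rng.random(), rng.random()] for n in nodes}
    k = 1.0 / math.sqrt(max(len(nodes), 1))  # ideal spacing between nodes
    temp = 0.1                               # cap on movement per step, cooled below

    def push(disp, a, b, sign, strength):
        """Accumulate a force on a and b along the line between them."""
        dx = pos[a][0] - pos[b][0]
        dy = pos[a][1] - pos[b][1]
        d = math.hypot(dx, dy) or 1e-9
        f = strength(d)
        for n, s in ((a, sign), (b, -sign)):
            disp[n][0] += s * dx / d * f
            disp[n][1] += s * dy / d * f

    for _ in range(iterations):
        disp = {n: [0.0, 0.0] for n in nodes}
        for i, a in enumerate(nodes):            # every pair repels
            for b in nodes[i + 1:]:
                push(disp, a, b, +1, lambda d: k * k / d)
        for (a, b), dist in distances.items():   # edges attract
            push(disp, a, b, -1, lambda d, dist=dist: d * d / (k * dist))
        for n in nodes:                          # move, capped by temperature
            dx, dy = disp[n]
            d = math.hypot(dx, dy) or 1e-9
            step = min(d, temp)
            pos[n][0] += dx / d * step
            pos[n][1] += dy / d * step
        temp *= 0.98
    return pos

# Hypothetical distances between a few users named in this thread.
users = ["tptacek", "cperciva", "patio11", "edw519"]
distances = {("tptacek", "cperciva"): 0.2, ("patio11", "edw519"): 0.3, ("tptacek", "patio11"): 2.0}
layout = force_layout(users, distances)
for name, (x, y) in layout.items():
    print(f"{name}: ({x:.2f}, {y:.2f})")
```

With thousands of users the all-pairs repulsion loop gets slow, so a real rendering would probably lean on a graph library instead.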


excellent choice setting the defaults to give very flattering results... there's a lesson in that.


I could probably turn this into a blog post (and will later), but there are a lot of microoptimizations of default settings for B2C apps that make huge differences in conversion:

- Defaults should be almost sufficient to use the app. (i.e. if it has some workflow, you should be able to pretty much hit "Next next next" and get it to work. Bonus points for making the number of nexts as low as possible.)

- Pick something that results in visually impressive output rather than a blank page. See Balsamiq for inspiration here -- they start you with a mockup in progress that demonstrates most of the highlights of the software.

- Ever seen Firefly? I really like how they use the word "shiny". Ideally, your defaults should show the shiny in your app. In Hollywood they have a saying: make sure your budget makes it onto the screen. In B2C apps, make sure the stuff you did all the work on makes it into the user experience most of your users will see.

- Assume your user is a novice at both your software and the problem domain until you have evidence otherwise. A lot of people ask the user "Hey, are you a novice?" That is one way to do things, but it makes your core workflow one stage longer and every stage costs you conversion. I prefer "Assume they are and give them a discreet 'skip ahead' button" or "Assume they are and watch them for evidence that they are not".

- If your app is supposed to make the user feel like they just killed an effing lion, then your default settings better have a lion bound and gagged sitting under a forty-ton weight suspended by a weak string which passes through an open pair of scissors next to a sign saying "Snip this."

- (Do this if nothing else.) Track actual usage of the app and modify your defaults based on actual usage. Bonus points if you can do it dynamically, if that makes sense for your app. For example, if you pick A as the default and 25% of your users go out of their way to change it to B, then that probably should have been B. (You can split test and see how many people would have changed it to A if it had defaulted to B.)
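
The last point can be sketched in a few lines. This is only an illustration: the function name, the shape of the input, and the 25% cutoff (borrowed from the example above) are all assumptions.

```python
from collections import Counter

def suggest_default(current_default, final_choices, threshold=0.25):
    """Suggest a new default if enough users override the current one.

    final_choices: the setting each user actually ended up with.
    threshold: hypothetical share of overrides that triggers a suggestion.
    """
    if not final_choices:
        return current_default
    counts = Counter(final_choices)
    overrides = len(final_choices) - counts[current_default]
    if overrides == 0 or overrides / len(final_choices) < threshold:
        return current_default
    # Most popular non-default option: a candidate for a split test,
    # not something to switch to blindly.
    alternative, _ = max(
        ((opt, n) for opt, n in counts.items() if opt != current_default),
        key=lambda pair: pair[1],
    )
    return alternative

print(suggest_default("A", ["A"] * 75 + ["B"] * 25))  # B
print(suggest_default("A", ["A"] * 90 + ["B"] * 10))  # A
```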


While all of these are true, I find that they usually derive from a simpler rule: Have a laser focus on the users you're trying to reach, and everything you do should make their lives easier in some way.

When I find people breaking these sorts of rules, it's usually because they're thinking of themselves or some non-customer stakeholder.

edited to add a corollary: until you've done the sort of testing patio11 advocates, you don't know who that customer is.


I think what you are saying is good advice, but don't think it is necessarily co-extensive with what I'm suggesting.

For example, say you're targeting teachers. Keeping a laser focus on what teachers want is important. However, I think you need to put extra focus on making their first five minutes absolutely amazing. (And their first 30 seconds. And their first 5 seconds.) My reason for this is simple: almost all apps are going to leak an amazing number of their customers between first and second use. I don't have my report in front of me at the moment, but I think something like 40% of BCC users never complete their first bingo card and never log in again. Essentially none of these people buy the software. On the other hand, roughly 2.4% of trialers convert, or roughly 4% of users who succeed in their first interaction with the app. Increasing my bottom line by 5% requires converting 5% more of that second group -- which, let me tell you, is hard freaking work -- or, in the alternative, improving the first run experience of three out of every 40 users who fail to complete Task #1.
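
The arithmetic in the paragraph above, worked through as a sketch. The 1,000-trialer cohort is arbitrary, and the calculation assumes rescued users go on to convert at the same roughly 4% rate as everyone else who completes the first task.

```python
trialers = 1000
fail_rate = 0.40           # never complete a first card (figure from the comment)
trial_conversion = 0.024   # share of all trialers who buy (figure from the comment)

succeeders = trialers * (1 - fail_rate)   # about 600
failers = trialers * fail_rate            # about 400
sales = trialers * trial_conversion       # about 24
success_conversion = sales / succeeders   # about 0.04, the "roughly 4%"

extra_sales = 0.05 * sales                          # +5% on the bottom line
rescued_needed = extra_sales / success_conversion   # about 30 more succeeders
share_of_failers = rescued_needed / failers         # about 0.075, i.e. 3 in 40

print(f"{success_conversion:.1%} of succeeders convert")
print(f"rescue {share_of_failers:.1%} of failing users for +5% sales")
```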

It is really hard to optimize your entire application, user experience, value proposition, etc to get +5% conversion. However, polishing your first five minutes until it freaking shines is not nearly as difficult. Do you present people with a blank page currently? Spend an hour and put something on it. I will put money on that helping. Does it take a critical mass of friends/input/lions slain to get fun? Do the work for them. Fake it if necessary.

And, yeah, instrument everything. Can I plug Mixpanel here? Every time I think "You know I should really build more instrumentation into my app..." I remember "Oh, wait, it takes a twentieth of the time to just throw it on Mixpanel -- no visualization code or complicated controls to refine the data range required, praise be."


What does instrument mean?


Interestingly, there seems to be a local minimum (or maximum) in your algorithm that I found when searching for myself:

Start here: http://www.swimwithoutgettingwet.com/hnusers/?user=profquail...

The next two 'clicks' to the left (towards co-commenting) don't have 'spolsky' in my list: http://www.swimwithoutgettingwet.com/hnusers/?user=profquail...

But one more click to the left, and he reappears in the list: http://www.swimwithoutgettingwet.com/hnusers/?user=profquail...


This is interesting, thanks for pointing this out. The top result is actually getting filtered out from the displayed results (don't ask). spolsky is going from being the second result, to the first result, to the second result in your examples.


You keep mentioning filtering… are you sure we can't ask? I'm curious.


Quite flattering - no matter what I set the sliders to, the names I recognize are people I respect.

I think that only really says something about HN, not about my comments :)


I had the same result. Perhaps this is not measuring similarity after all. :)


Aww, I was hoping to get amichail.


I think I was filtered out.


You're correct. I feel terrible. There is no question that you're the poster child for controversial-interesting. I will fix this.


LOL! There was an actual amichail filter?


I just realized I have paid no attention at all to anyone's name. I know my brother's name when I see it, but that's it. Now I'm wondering if I am liked. Probably not >.<


Ditto. I never pay attention to the name either. Usually it takes 30 levels of nesting before I realize that I'm conversing with the same person.

The only name I recognize is pg, and since I save my username and password, sometimes I forget my own username too.


Your username is chrischen, FYI…


I actually typed "chris" as my username in the form the first time, wondered why no results showed up, and then realized my username is actually chrischen.


Ah rms right there in my list... how did I go so wrong... :)


Does moving the karma slider to the right look for people with low karma, or does it ignore karma? I want the option to ignore karma.


Excellent question. Middle ignores karma in the quantitative ranking. Far to the right gets lower karma users. Users who consistently get almost all 1s and/or negative points are separately filtered out since they generally don't make for interesting recommendations. Users who sometimes have negative comment scores but also make lots of 2+ comments are included (these users are rare but are arguably the most interesting).


Based on what other people here are reporting, I wonder if there's a bias towards matches with a large number of comments (e.g., pg, patio11, edw519, etc.). Perhaps there's some normalization needed?


There's a bias towards people on the leaderboard, though one of the knobs adjusts it.


I was happy to see you in my list, silentbicyle :-)


:)


Based on the default settings, I'm up there with PG and Patio11. I like your algorithm ;)


Everybody here has somewhat similar interests to pg. Otherwise we wouldn't be here. ;)


I checked a few people, and it seems like patio11 is in all of their results. I wonder why?


A big part of it is that the default settings reward comment karma fairly heavily. Try moving the sliders around. For example, with the defaults, patio11 is the top result for your username, but at the karma slider midpoint he's out of the top 12.


I had the same reaction. A little flattering, but I think it has to do with brevity more than anything.

edit: just realized that by default it's only returning the rockstars of the site (e.g. people high up on the leaderboard).


I'm up there with patio11, but not pg. (I get nostradeamous)

Not sure what that means though.


Interestingly, nostrademons and pg are the top two for me, so I wonder what would produce one but not the other. I'd love to see if comments could be separated into [user1]-like comments and [user2]-like comments.


Yeah, one potential solution is lists of users.

Another route might be clustering ('technical', 'political', etc.)


Nostra who?


Me, presumably.


Obscure pun, it's meant to be flattering.

http://news.ycombinator.com/item?id=181868

:-)


Yeah, sorry about the spelling.


A few weeks ago, I tried searching through old threads to find where PG had an archive of old comments and server stats. Does anybody have link(s)? Thanks in advance. (This post is meta enough that it's probably as good a time as any.)

I'm guessing either such a corpus was used here, or it's based on a cache of recent comments.


I got quite a few people I respect (mixmax, edw519, mattmaroon) and one I consider a personal friend, dennykmiu.

Neat little utility, thanks.


hey thanks!


Am I missing something or does this not work for the majority of us who are casual commentators on here?


It probably needs more data points.


Yeah, there are really two issues here.

Most of the techniques for this sort of thing don't work that well for sparse datasets. So given a choice between showing bad results for users with relatively few comments, and not showing any results ... especially when users can search not just for themselves but also for folks they know and enjoy ... Also, scraping the full history of HN is not cool.


A while ago, pg posted an archive of HN comments, specifically so that nobody would have to scrape them. (It was coupled with comments on how the arc server was holding up, IIRC.) I haven't had any luck searching for it, though - there are too many discussions about the ethics of scraping comments, archive file formats, and the like.


I remember that too. It was, as you say, a while ago, meaning it's long outdated. And I do recall it being posted by pg, but all I can find is this non-pg release with all links broken: http://news.ycombinator.com/item?id=173045. I looked through all the links to tar and zip archives, but could've missed something.


Yeah, they're 404. Thanks for looking!


See boredguy8's comment near the top for a list of five datasets:

http://news.ycombinator.com/item?id=271066

I'd check the Internet Archive for the files & Google for the filenames.


That was the post I was looking for. They're 404 now, but thanks for looking.


Interesting. After the obligatory vanity search I tried searching for people I know have different (more technical, less startup/strategy/marketing) taste than me. And it seems to work quite well. I have nothing in common with those guys :-)

For instance the top pick for tptacek is cperciva, which seems natural. Doesn't work the other way around though, so there's still some work to be done...


Thanks for the feedback, you bring up a very interesting point: reciprocity. If I am closer to you than anyone else, does it follow that you are closer to me than anybody else? I'm not sure the right answer to that is yes ...


Definitely not in the general case of points in n-dimensional Cartesian space. For instance, in the 1-d case, consider points at 0, 10, and 12. 10 is the closest point to 0, but not vice-versa.

Put another way, it seems as though the algorithm considers tptacek more distinct (from all other HN users) than cperciva.
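
The 1-d example above is easy to check directly; a minimal sketch:

```python
def nearest(points, p):
    """Return the point in `points` closest to p, excluding p itself."""
    return min((q for q in points if q != p), key=lambda q: abs(q - p))

points = [0, 10, 12]
for p in points:
    print(p, "->", nearest(points, p))
# 0 -> 10
# 10 -> 12
# 12 -> 10
```

Only 10 and 12 pick each other; 0 picks 10 without being picked back, which is the same shape as the tptacek and cperciva case.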


I was wondering the same thing. I don't show up in the lists of any of the people that show up in my list.

I take it that means I am the cheese.


Which is in tune with tptacek being no. 3 on the leaderboard and cperciva being no. 34.


Hmm, small bug(?) When I typed my name in all lowercase it didn't come up. That might be by design though. Cool tool ;-)


That's a good point. Let me see what I can do. The catch is that HN is case sensitive as well.

works: http://news.ycombinator.com/user?id=SapphireSun

doesn't work: http://news.ycombinator.com/user?id=sapphiresun
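
One possible way to accept any capitalization while still linking to the case-sensitive HN profile, sketched under the assumption that the tool has a list of known usernames (the function names here are hypothetical):

```python
def build_index(usernames):
    """Map lowercased names to their canonical, case-sensitive form."""
    index = {}
    for name in usernames:
        # If two accounts differed only by case, this keeps the first one seen.
        index.setdefault(name.lower(), name)
    return index

def resolve(index, typed):
    """Return the canonical username for whatever the visitor typed, or None."""
    return index.get(typed.strip().lower())

index = build_index(["SapphireSun", "RiderOfGiraffes", "patio11"])
print(resolve(index, "sapphiresun"))  # SapphireSun
print(resolve(index, "nobody"))       # None
```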


Ha, yes, that confused me, too.


I like my company. I don't know if they feel the same way about me.

One newer participant whose posts make me think "I wish I had posted that" doesn't show up on my list of associated participants. But I show up on his. Maybe that is because of the karma setting in the default operation of the search. Interesting.


If you don't mind telling me the user you're thinking of, I can take a look as well.


Thanks for the positive response.

If three things were to get added to this, what should they be?


Could you provide a reverse search -- whose lists do I show up on?

How about displaying the values used to order the result set, so you could compare weightings across users?
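
A reverse search could be as simple as inverting the per-user result lists. A minimal sketch, with invented lists that only loosely echo names from this thread:

```python
from collections import defaultdict

def reverse_lists(similar):
    """Invert {user: [similar users]} into {user: [users whose lists they appear on]}."""
    appears_on = defaultdict(list)
    for user, matches in similar.items():
        for match in matches:
            appears_on[match].append(user)
    return dict(appears_on)

# Hypothetical top-N lists, not real output from the tool.
similar = {
    "tptacek": ["cperciva", "patio11"],
    "cperciva": ["patio11"],
    "patio11": ["tptacek"],
}
print(reverse_lists(similar)["patio11"])  # ['tptacek', 'cperciva']
```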


- explanation about what it does and how

- username has period afterward, for self-linking back to site

- various ratios next to each user name in a table, maybe linking to searchyc ie http://searchyc.com/user/riffer


Least similar users


This is fun to consider. Are you thinking of people who have completely different interests? Or folks who are interested in similar things but disagree?


I'm really not sure. I was naively thinking whatever the opposite of the existing algorithm is.


These are the 12 'farthest' from you, rms:

osteele erikstarck zedshaw atarashi matthavener clemesha alrex021 cliff NateLawson jefffoster mrcharles fix3r

This was done with the karma effectively off (same as the karma slider in the middle). What do you think? Anybody there you think shouldn't be?


Thanks. Most of the comments of those users are about programming which I don't often discuss here. Erikstarck doesn't really comment in the programming discussions but his comment style is definitely different from mine.


1) If I understand correctly the first slider ranks greater similarity in commenting on the same threads the farther left you go and similar word choice the farther right you go. Why not make this two separate sliders?

2) For each of the matches could you show some of the data used to compute the matches... e.g., for semantic matches, show the top X (maybe five?) common words or phrases you matched on. For threads, show the parent or OP of the five most recent threads... something like that.

3) Really neat. I like this. It's quick too. How did you do it? Did you replicate the entire HN database? Third suggestion: post the source or just post an explanation of how it works.


... and 4: apply the same thing to twitter (!)


Yeah, that's one of the things to consider. All of the logic for this was originally developed for another purpose, and it got applied to HN for fun. The human connection element makes it very cool, and so do the possibilities for applying it to so many different applications.


I think there is still a subtle bug in there somewhere.

When I set the 'semantics' slider all the way to the right ('word choice') and the leaderboard slider all the way to the left, then check, I get this:

  - patio11
    - mahmud
    - nostrademons
    - tptacek

  - mahmud
    - edw519
    - swelljoe
    - davidw

Shouldn't the relationships be symmetrical, so 'edw519' would get 'mahmud' as the first match and 'mahmud' would get 'edw519' ?

edit: also, your 'match' is case sensitive, so 'riderofgiraffes' won't work but 'RiderOfGiraffes' does.


It filters on karma as well.

All of those guys show up in my default results, but I can't get any of them to pick me up. Clearly I don't post enough.


If that were the case then I would show up, and I don't either.

I've received some email from David (the guy that built it), he's going to fix this and the lowercase issue as soon as things quiet down a bit.


Awesome! I knew SwellJoe and I seemed to end up in the same threads, and IIRC, agree on things. Even though I'm a relative neophyte. I love it!


Very cool.

My matches using the default settings: pg, swombat, mahmud, wheels, SwellJoe, edw519, mattmaroon, gojomo, davidw, mixmax, unalone, tptacek

Did you get permission to scrape the data? (I tried once without asking with mediocre results: http://www.mattmazur.com/2008/08/the-wrong-way-to-get-notice...)


As someone in the comments on your page also notes, you could use the Google cache. But it would still be nicer to ask first.


Really fun app! Great for boosting one's ego.

Slight note about the page formatting: My screen resolution is 800x480, and the text by the sliders wraps in a very confusing manner.

It looks like this to me

  ===============||===================
  co-commenting......SEMANTICS.....word
  choice

A little confusing at first, until I realized that it was wrapping. It's the same for the other slider.

Great app though!


Very interesting. Depending on how I adjusted it I was judged similar to pg, unalone, and a couple other people on the leaderboard. I guess that means my comments are on the right track....

Are there any other details about the algorithm or how it works? I'm curious about what exactly the different weightings mean.


Same story here. Is pg added to all users by default?


Cool. Now we can play six degrees of hn.


How is "word choice" similarity calculated? If you have a high similarity with someone, does that mean your range of words is the same (perhaps because you write on the same topics) or that your word frequency is the same (because you have similar patterns of speech)?

Are quoted sections filtered out? URLs?
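
To make the two readings of "word choice" concrete, here is a sketch that scores them separately after dropping quoted lines and URLs. None of this is confirmed as what the tool does; the `>` quoting convention, the regexes, and both measures are assumptions for illustration.

```python
import re
from collections import Counter

URL_RE = re.compile(r"https?://\S+")

def tokens(comment):
    """Lowercased words, skipping quoted lines (starting with '>') and URLs."""
    kept = [ln for ln in comment.splitlines() if not ln.lstrip().startswith(">")]
    text = URL_RE.sub(" ", "\n".join(kept))
    return re.findall(r"[a-z']+", text.lower())

def vocab_overlap(a, b):
    """Same range of words: Jaccard overlap of the two vocabularies."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def freq_distance(a, b):
    """Same word frequencies: total variation distance (0.0 means identical)."""
    if not a or not b:
        return 1.0
    ca, cb = Counter(a), Counter(b)
    return 0.5 * sum(abs(ca[w] / len(a) - cb[w] / len(b)) for w in set(ca) | set(cb))

a = tokens("> quoted text here\nlisp lisp lisp python http://example.com")
b = tokens("lisp python python python")
print(vocab_overlap(a, b))  # 1.0: identical vocabulary
print(freq_distance(a, b))  # 0.5: quite different frequencies
```

Two users can share a vocabulary completely and still differ a lot in how often they use each word, which is why the distinction in the question matters.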


My girlfriend and I have almost the exact same lists... smokey_the_bear and andrewljohnson.


So, some great users are on your similarity list... but are you on their similarity list?


For exactly zero of the twelve users listed, no.


I approve of all the people I've been grouped with, except that bizarre swombat fellow.


How does colins_pride compare with riffer? Aren't they the same person?!?


This is a cool question. Two things: the first is that I stopped using the colins_pride account and subsequently started using the riffer account, so the co-commenting commonality is not particularly high. The terminology score is substantially closer. The other thing is that because the dataset is tilted towards the recent, the colins_pride user is only lightly represented.


Interesting both of my co-founders (tolmasky and boucher) showed up on most of my lists. I guess that means it works, since we tend to talk about similar things.


Apparently tumult is most similar to me no matter what I select, so clearly I can just stop posting entirely. (Thanks, tumult - more time in the day!)


Websense doesn't like you.

Security risk blocked for your protection Reason: This Websense category is filtered: Potentially Damaging Content. URL: http://www.swimwithoutgettingwet.com/hnusers/


Interesting, similar to:

    edw519

    btilly

    patio11

    pg

    tptacek


Hrm, I seem to have gotten tptacek as my top result, if the results are so ordered.

I'd say that's a good thing. In general, the whole list is people I'd expect to be at least somewhat similar to.



It didn't return any results when I tried from the iPhone


I don't show. Guess I'm not "one of us" yet :-/


Did you capitalize your name? It seems to be case-sensitive... http://www.swimwithoutgettingwet.com/hnusers/?user=Sukotto&#...


I got a great list. Neat.


got swombat. lame. :)


pg was number one for me :D



