
So.. I think it has already been happening ( people attempting to poison some sources for a variety of reasons ). I was doing a fun mini project on HN aliases ( attempting to guess a user's age based on nothing but their alias ) and I came across a number of profiles with bios clearly intended to mess with bots one way or another. Some have fun instructions. Some have contradictory information. Some are the length of a short bedtime story. I am not judging. I just find it interesting. Has vibes of a certain book about a rainbow.





Tell me about that side project. How does that work? What does it say about me? I find that very interesting.

The idea itself is kinda simple, but kinda hard, because it relies on how the language we use gives us away.

For example, the references we use ( Simpsons, Star Trek, you name it ) and the language we use ( gee whiz, yeet, gyatt ) to build an online persona tend to matter to our self-image, so one can determine, to some extent, the likely generation from them.

The reference itself may not automatically mean much, but if it is present in an alias, it likely had an impact on the person when they were younger ( how many of the new generation jump on an old show? so Mr. Robot would have an exposure range of 2015 to 2019 ). If that hypothesis is true, then one can attempt to guess the individual's age given that work, because 1) we know what year it is now and 2) we know when the work was made, which allows for some minor inference.
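To make the inference concrete, here is a rough sketch of the arithmetic. The function name and the `impression_ages` range ( the ages at which a work is assumed to leave a mark ) are my own placeholders for illustration, not something measured:

```python
from datetime import date


def estimate_age_range(exposure_start, exposure_end, current_year=None,
                       impression_ages=(12, 25)):
    """Guess a (min_age, max_age) range today for a fan of a given work.

    impression_ages is an ASSUMED range of ages at which a show or book
    tends to stick with someone; it is a guess, not a measured value.
    """
    if current_year is None:
        current_year = date.today().year
    youngest_then, oldest_then = impression_ages
    # Youngest plausible fan: at the low end of the range in the last exposure year.
    min_age = youngest_then + (current_year - exposure_end)
    # Oldest plausible fan: at the high end of the range in the first exposure year.
    max_age = oldest_then + (current_year - exposure_start)
    return min_age, max_age


# Mr. Robot with the 2015 to 2019 exposure range, evaluated for 2025.
print(estimate_age_range(2015, 2019, current_year=2025))  # (18, 35)
```

The range is wide on purpose: a single reference only narrows things down a little, so it would need to be combined with other signals.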

Naturally, some aliases are more elaborate than others. Some are written backwards and/or reference a popular show or popular sci-fi author. Some are anagrams ( and - I discovered today - require additional datasets to tag properly so that is another thing I will need to dig up from somewhere ). And to complicate things further, some aliases use references that are ambiguous and/or belong in more than one category ( Tesla being one of them ).
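A minimal sketch of how the backwards and anagram cases could be checked against a reference list. The `references` list here is a hypothetical stand-in for the extra datasets mentioned above, and `match_reference` is just an illustrative name:

```python
def match_reference(alias, references):
    """Return (reference, how) for the first reference found in the alias.

    references is a HYPOTHETICAL list of lowercase, letters-only names
    (shows, authors, and so on). how is "forward", "reversed" or "anagram".
    Returns None when nothing matches.
    """
    normalized = alias.lower()
    letters = sorted(c for c in normalized if c.isalpha())
    for ref in references:
        if ref in normalized:
            return ref, "forward"
        if ref in normalized[::-1]:
            return ref, "reversed"
        # Anagram check only covers the case where the whole alias
        # (ignoring digits and punctuation) rearranges into the reference.
        if letters == sorted(ref):
            return ref, "anagram"
    return None


refs = ["asimov", "tesla"]
print(match_reference("vomisa42", refs))  # ('asimov', 'reversed')
print(match_reference("Mavois", refs))    # ('asimov', 'anagram')
print(match_reference("Loughla", refs))   # None
```

This does nothing for the ambiguity problem: a hit on "tesla" still would not say whether the inventor, the car or the band was meant.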

The original approach was to just throw everything into an LLM and see what it came up with, but the results were somewhat uneven, so I decided to start from scratch and do normal analysis ( language, references, how digits are used and so on - it is still amazing how well that last one seems to work ).

Sadly, it is still a work in progress ( I was hoping for a quick project, but I am kinda getting into it ) and I probably won't touch it until next weekend since the coming week promises to be challenging.

Unfortunately, this means your particular alias ended up as:
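For a flavor of what the "normal analysis" pass might look like, here is a simplified sketch, not the actual pipeline. Apart from "Mixed Case", the category labels are made up for illustration, and treating a four-digit run as a possible birth year is an assumption:

```python
import re


def alias_features(alias, current_year=2025, earliest_year=1940):
    """Extract a few simple, hand-rolled features from an alias."""
    # Case category. Only "Mixed Case" is a real label from my output;
    # the other labels are hypothetical.
    if not any(c.isalpha() for c in alias):
        category = "No Letters"
    elif alias.islower():
        category = "Lower Case"
    elif alias.isupper():
        category = "Upper Case"
    else:
        category = "Mixed Case"

    # Digit usage: ASSUME a four-digit run in a plausible range is a birth year.
    digit_runs = re.findall(r"\d+", alias)
    year_candidates = [
        int(run) for run in digit_runs
        if len(run) == 4 and earliest_year <= int(run) <= current_year
    ]

    return {
        "alias": alias,
        "category": category,
        "length": len(alias),
        "digit_runs": digit_runs,
        "birth_year_candidates": year_candidates,
    }


print(alias_features("Loughla"))
print(alias_features("neo1987"))
```

A year candidate converts to an age guess by subtracting it from the current year, which is probably why the digits signal is so strong compared to everything else.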

  Alias    category    is_random  length  is_anagram  generic_signal
  Loughla  Mixed Case  0          7       FALSE       FALSE

( the remaining fields were empty, basically I couldn't put a finger on you :D ). If you can provide me with an approximate age, it would help with my testing though :D

edit: This being HN, the vast majority of references are technology related.


That is very cool…and your alias is hard for me to decipher

I have a separate - not fully implemented - section for the more semi-random aliases, which revolves around our tendency to use default settings and commonly used tools for generating them. Thus far the only thing I was able to show with it is that such aliases are not uncommon, but they give no clear proxy for age.. so it seems like a dead end.


