The MIT license isn't infectious! You might have just told the author to give up their rights.
They're well within their rights to release their software as MIT or whatever, but they should make that decision on their own or as reflection upon the proper arguments from the community.
The second sentence of the README on GitHub is "This app is a wrapper around The Sherlock Project" (last edited 5h before posting), and the second sentence in the UI is "This project is a wrapper around the Sherlock Project."
I found myself scrolling through GitHub's "trending" repos, looking for some coding inspiration. Within the next hour, I stumbled across something called The Sherlock Project. Interesting. It had over 35k stars, so it must be pretty popular.
I quickly cloned the repo and started toying around with it. It didn't take me long to realize the power of this tool. All I had to do was insert a username, and voila! I was looking at every social media website that was associated with the username. Not only that, but it also gave direct links to the accounts.
I immediately wanted to turn this into a web app so that everyone could use it. My first challenge was that this was a CLI tool, so I got to work. The Sherlock project makes about 400 requests to various sites to check whether your username exists. This was going to be tough... I noticed they were using FuturesSession from requests-futures to run those requests concurrently.
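For anyone curious what that kind of fan-out looks like, here is a minimal sketch that uses only the standard library's ThreadPoolExecutor instead of requests-futures. The site table, `check_all`, and `fake_fetch` are hypothetical names made up for illustration, and treating HTTP 200 as "profile exists" is an assumption, not Sherlock's actual detection logic.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict

# Hypothetical site table: name -> profile URL template (not Sherlock's real data).
SITES: Dict[str, str] = {
    "ExampleA": "https://a.example/{}",
    "ExampleB": "https://b.example/u/{}",
    "ExampleC": "https://c.example/@{}",
}


def check_all(username: str,
              fetch_status: Callable[[str], int],
              sites: Dict[str, str] = SITES,
              max_workers: int = 20) -> Dict[str, bool]:
    """Probe every site concurrently; HTTP 200 is assumed to mean 'exists'."""
    results: Dict[str, bool] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each pending future back to the site it belongs to.
        futures = {
            pool.submit(fetch_status, template.format(username)): name
            for name, template in sites.items()
        }
        for future in as_completed(futures):
            name = futures[future]
            try:
                results[name] = future.result() == 200
            except Exception:
                # In this sketch a failed request simply counts as "not found".
                results[name] = False
    return results


def fake_fetch(url: str) -> int:
    """Stand-in for a real HTTP call so the sketch runs offline."""
    return 200 if "a.example" in url else 404
```

In a real app `fetch_status` would wrap an HTTP client call with a timeout; passing it in as a parameter keeps the concurrency logic testable without touching the network.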
I decided to use a multithreaded WebSocket to continuously report data to the frontend. After a LOT of trial and error I finally got something working. The issue now, though, was that it wouldn't run in production due to a multiprocessing error: "daemonic processes are not allowed to have children".
Eventually I learned that you can't use the standard multiprocessing library for this kind of thing; you have to use billiard. Bam! It worked. I quickly hacked together a simple frontend, configured the WebSocket, and results were pouring in.
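For context, that error comes from the standard library itself: a process marked as a daemon is not allowed to start children of its own. The sketch below only reproduces the stdlib behaviour and does not show billiard (a fork of multiprocessing maintained by the Celery project). The function names are made up for illustration, and it assumes a POSIX system where the "fork" start method is available.

```python
import multiprocessing as mp


def _noop() -> None:
    """Target for the grandchild process; it never actually runs here."""


def _daemon_body(ctx, queue) -> None:
    """Runs inside a daemonic process and tries to start a child of its own."""
    try:
        child = ctx.Process(target=_noop)
        child.start()  # multiprocessing refuses this inside a daemon
        child.join()
        queue.put("child started")
    except AssertionError as exc:
        queue.put(str(exc))


def reproduce() -> str:
    """Return what happened when a daemonic process tried to spawn a child."""
    ctx = mp.get_context("fork")  # assumption: POSIX system with fork available
    queue = ctx.Queue()
    daemon = ctx.Process(target=_daemon_body, args=(ctx, queue), daemon=True)
    daemon.start()
    message = queue.get(timeout=10)
    daemon.join()
    return message


if __name__ == "__main__":
    print(reproduce())
```

If the worker that handles a request is itself a daemonic process, any code inside it that starts more processes runs into this check, which would explain why it only showed up in production.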
Turns out, the WebSocket is considered a "long-running request" since it makes 400 external requests. Maybe I could use Celery to offload this process to a worker and queue it up. I started working on it and realized this was a little beyond my skill level.
I then decided to take a look at the logs where I hosted the code, and what do I find? CPU, memory, and bandwidth all reaching a staggering 100% usage. I was using the free tier of Render, which only allowed for one instance of my app... duh. I did some rework of my codebase and it started running a little faster.
Needless to say, I learned to take it slow, build tests for my code, and be patient with results.
What do you guys think? Any hard lessons learned in coding? What were your takeaways?
You need a privacy policy (or at least a one-liner statement) that gives potential users some assurance you aren't harvesting their username / IP / etc or the results for some other purpose or piping it to advertisers.
Yep. The wrapper around the tool is neat and looks well done based on what I could see, but I hesitated based on just that consideration.
edit: I thought I should make my feedback less generic. In this case, by neat I mean: no fluff, no useless stuff on the landing page, straight to the point. I appreciate that.
If they were going to harvest those usernames by posting it on hackernews, wouldn't it be easier to just scrape hackernews for usernames in the first place?
I echo this request. It was my expectation actually!
I searched for my username and was shocked it was used on literally every website you checked. Then I tried a less common variant and was similarly shocked, until I realized that you were showing me fewer websites the second time around. Only then did I realize you only showed me sites where it was already claimed…
what I find annoying is that (other than HN) most places won't let you claim a username if the person who holds it signed up but never posted (or even logged in) for years, or if the account was deleted.
when I signed up to HN, I had a different username. I reached out to admins and they looked at the other account that had been created but they let me have it because the person hadn't logged in since. Oh! I think GitHub did as well. (shit, I wonder if I have mixed up GitHub and HN... I'm pretty sure they're the two that did actually let me have my handle... :x)
Because some people sign up for a site early and then never use it. I can't get my desired handle on a particular social network because some dude registered 15 years ago, posted one item, and then never used it again.
I mean, obviously I'll live but it's a silly situation.
I really like how Discord handles it.
Username + 4 digit number = unique handle, but if nobody else has the same username in a server then you can just use the username to refer to them
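A rough sketch of how that lookup could work, purely to illustrate the idea. `Member` and `resolve` are hypothetical names and this is not Discord's actual implementation or API.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Member:
    username: str
    discriminator: str  # four digits, e.g. "0421"

    @property
    def handle(self) -> str:
        """Globally unique handle: username plus 4-digit number."""
        return f"{self.username}#{self.discriminator}"


def resolve(reference: str, members: List[Member]) -> Member:
    """Resolve 'name' or 'name#1234' to exactly one member of a server."""
    if "#" in reference:
        matches = [m for m in members if m.handle == reference]
    else:
        # A bare username only works if nobody else in the server shares it.
        matches = [m for m in members if m.username == reference]
    if len(matches) != 1:
        raise LookupError(f"{reference!r} matched {len(matches)} members")
    return matches[0]
```

With two members named "sam" in the same server, `resolve("sam", ...)` raises, while `resolve("sam#0002", ...)` still picks out one of them.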
it's not the same thing at all but also kind of the same thing, although it's actually, at the same time, kinda also not. alright, i'll get real. the most important thing regarding the linked project is the fact that it's got graphs. with lines. in various colors. lines that generally rise upwards, towards the right-hand side of your screen(s). lines that, more often than not, have slopes which vary in intensity and length. lines that are part of, if i may take this chance to kindly reiterate, graphs.
everybody loves graphs, right? i know i do. almost as much as i love search results linking to 37 minute youtube how-to videos for reminders/instructions on how to fix a 37 second problem with absolutely zero transcript in the video description's text area.
Some people are liking the UI here, but it wasn't clear to me at first that a [+] meant the username was used on that site. In my mind, it could just as easily mean that it's available on that site (that could be a positive result if you're looking for places to sign up with a name). This should be made more clear.
It would be interesting if you could toggle on the not-found list as part of the results. If you get a big positive list, but you'd like to find the sites where the username is not in use yet, there's no quick way to get that info. (Yes you could scrape both lists and use some simple command line scripts to get those results, but it's such a simple thing to add to this tool)
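The toggle would amount to little more than a set difference between the sites that were checked and the sites that reported a hit. A minimal sketch, with `split_results` and the site names made up for illustration:

```python
from typing import Iterable, List, Tuple


def split_results(all_sites: Iterable[str],
                  found: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Split the checked sites into (claimed, not found) lists, each sorted."""
    found_set = set(found)
    sites = list(all_sites)
    claimed = sorted(s for s in sites if s in found_set)
    not_found = sorted(s for s in sites if s not in found_set)
    return claimed, not_found
```

Note that "not found" only means the tool saw no evidence of the name, which is not quite the same as the name being free to register.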
good reminder for why you shouldn't use the same username everywhere unless you intend to have an online presence
Personally I have a specific scheme I follow with my more private usernames. Basically it's the same username, but depending on the website it has a predictable alteration to it, so I get different usernames for different sites but don't have to remember them all
Then I have a different username (well, a couple) I use for sites where I don't mind having a public presence that can be tied back together
Yeah I tried "dvmnasrtjkhqofjsenvign" and got false positives on Dribbble, Enjin, HackerNews, Instagram, Quizlet, Smule, and livelib.
Also got the same false positives with a couple aliases of mine that seem to be actually unique so far. For a second I was a bit annoyed that someone had already stolen them on HN.
I tried this with my usual username (which is unusual; not many Griscoms out there), and found some false positives.
The following sites gave me errors with my username, and if I changed the username to something unlikely I got the same error. I take this to mean that there's no evidence that username exists on that system. (And, I'm dubious that I'd have signed up at these sites.)
* quizlet.com
* www.enjin.com
* apps.runescape.com
* smule.com
* livelib.ru
And, fiverr.com's URL for my username just bounces to the homepage, as did any username. (Again, I wouldn't have signed up there.)
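The "try something unlikely and compare" check described above could be automated as a control probe: ask each site about a random gibberish name first, and distrust any site that claims to find it. This is only a sketch of that idea; `claims_found` is a hypothetical callback standing in for whatever per-site check the tool really does.

```python
import secrets
import string
from typing import Callable, Iterable, List, Set


def random_username(length: int = 24) -> str:
    """Generate a name that is very unlikely to be registered anywhere."""
    return "".join(secrets.choice(string.ascii_lowercase) for _ in range(length))


def unreliable_sites(sites: Iterable[str],
                     claims_found: Callable[[str, str], bool],
                     probes: int = 2) -> Set[str]:
    """Sites that report a hit for random gibberish are likely false-positive sources."""
    flagged: Set[str] = set()
    for site in sites:
        if any(claims_found(site, random_username()) for _ in range(probes)):
            flagged.add(site)
    return flagged


def filter_hits(username: str,
                sites: Iterable[str],
                claims_found: Callable[[str, str], bool]) -> List[str]:
    """Return hits for username, skipping sites that failed the control probe."""
    site_list = list(sites)
    bad = unreliable_sites(site_list, claims_found)
    return [s for s in site_list if s not in bad and claims_found(s, username)]
```

A site that redirects every profile URL to its homepage or returns the same error page for any name would fail the probe and be dropped from the results.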
I don't want to rain on anyone's parade, but I'll just add my unsolicited opinion...
Functionality like this, and more acutely any kind of cross-service, cross-account stylometric correlation or de-anonymizing service, gives me great concern about its potential for abuse and about the groups it likely endangers (and the groups it likely empowers).
Well, not sure I am a bad actor but reading the comment gave me the idea of typing my ex's nickname and I now discover she has an account on a quite unusual platform of which I am a member, too.
For sure, I would never have the time to build such a tool. So, yes, some bad actors may have the "ability" as you mention, but I don't think that is the best justification to make querying this tool that easy.
I am using an unpronounceable combination of 4 characters in some games. I have always been able to get it wherever I wanted it. Was surprised to see it found on 60 sites, none of them me.
My username here is found on only 50, the other 49 not me.
It might be a useful tool to pre-check names before creating accounts for someone who wants a consistent name everywhere.
Why use React for something this light? People want to use frontend frameworks for everything now, I miss 1998. Just kidding, great work! Way more accessible to the average person than the Sherlock CLI.
Hmmm... I searched a name and it found 9 results. But none of the profiles actually existed when I followed the links.
Mind you, I've never found these kinds of things to be very effective. Bellingcat has a selection of similar tools and the results are always pretty unreliable:
.. just ran the iOS MAIGRET tool report [massively granular-thus more false positives + a shit tonne of server errors] against your tool. Enlightening.
https://old.reddit.com/r/Python/comments/z1ts1n/i_built_an_a...
S̵h̵e̵r̵l̵o̵c̵k̵ ̵u̵s̵e̵s̵ ̵M̵I̵T̵ ̵l̵i̵c̵e̵n̵s̵e̵ ̵a̵n̵d̵ ̵y̵o̵u̵ ̵s̵t̵i̵l̵l̵ ̵h̵a̵v̵e̵n̵'̵t̵ ̵a̵d̵d̵e̵d̵ ̵M̵I̵T̵ ̵l̵i̵c̵e̵n̵s̵e̵ ̵t̵o̵ ̵y̵o̵u̵r̵ ̵r̵e̵p̵o̵.̵
https://github.com/sherlock-project/sherlock
https://github.com/bnkc/handlefinder
With that said, I like the look of the UI you put on top, great work!
Edit: As others have pointed out, the author doesn't need to make their creation MIT as well. I misunderstood the license agreement. They just have to include the notice of the dependencies somewhere. TIL.