Grooveshark didn't use any of that. We were very careful about avoiding dependencies where possible and keeping our backend code clean and performant. We supported about 45M MAU at our biggest, with only a handful of physical servers. I'm not aware of any blog posts we made detailing any of this, though. And if you're not familiar with the saga, Grooveshark went under for legal, not technical, reasons. The backend API was powered by nginx, PHP, MySQL, and memcache, with a realtime messaging server built in Go. We used Redis and MongoDB for some niche things and had serious issues with both, which is understandable because they were both immature at the time, but MongoDB's data loss problems were bad enough that I would still not use it today.
That said, I'm using Docker for my current side project. Even if it never runs at scale, I just don't want to have to muck around with system administration, not to mention how nice it is to have dev and prod be identical.
> That said, I'm using Docker for my current side project. Even if it never runs at scale, I just don't want to have to muck around with system administration, not to mention how nice it is to have dev and prod be identical.
This is why I use Docker, at work and for my own stuff. No longer having to give a shit whether the hosting server is LTS or latest-release is wonderful. I barely even have to care which distro it is. Much faster and easier than doing something similar with scripted-configuration VMs, plus the hit to performance is much lower.
My feelings as well! The way it put your queue front and center and gave you so much control over how things were added worked really well for me. Spotify leaves a lot to be desired in its UI.
I still remember opening the site one day and reading the weird apology letter. I think that was the first time I saw something I really cared about just disappear off of the internet.
Man I miss Grooveshark still today. Spotify is okay but still a step down. Needing billion-dollar licensing schemes to even get started makes this such a hard market to actually get into and provide a competitively superior experience.
What a great service. I'd be curious if you could go into detail on how the radio feature worked back then, because I found myself receiving worse suggestions when I used similar features in Spotify/Google Play Music.
Oh man, I should write a blog post about that, as I built that feature myself. It was meant to be a stopgap until we could get some real machine learning in there, but nothing else we tried did as well.
First, for efficiency all recommendations were artist to artist, not song to song. That works well for a lot of genres but is pretty bad for others.
We started with a free DB of artist similarities. I don't remember where we got that from, maybe MusicBrainz? We built a shitty internal interface for adding and removing links between artists and adjusting the weights of those links, and then made it available to all employees to mess with. As you might imagine, just about everyone there was passionate about music, so it didn't take long to crowdsource a huge catalog of quality recommendations. For really obscure stuff we would fall back to the open DB.
So the actual algorithm would look at your seeds (artists you put in the queue before turning on radio, or artists with songs that you liked while radio was on), pull the top n linked artists for each of your seed artists, and do some weighted shuffling. It would also make sure to space out artists so you don't hear the same one too often, etc.
Then for genre radio we just secretly selected a bunch of artists we felt were representative of the genre and used those as the seeds.
Oh yeah and if you disliked a song we'd prevent that artist from playing for the rest of your session.
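For anyone who wants to play with the idea, here is a minimal Python sketch of the steps described above. It is not the actual Grooveshark code (the backend was PHP, and the details are long gone); the `LINKS` table, the function names, and the `top_n` and `gap` parameters are all made up for illustration, and summing weights across seeds is just one plausible way to do the "weighted shuffling".

```python
import random
from collections import deque

# Hypothetical link table: seed artist -> {similar artist: link weight}.
LINKS = {
    "A": {"B": 0.9, "C": 0.5, "D": 0.2},
    "E": {"C": 0.8, "F": 0.4},
}

def candidate_weights(seeds, links, top_n=2, blocked=()):
    """Pool the top_n strongest links of every seed into one weight table."""
    pool = {}
    for seed in seeds:
        ranked = sorted(links.get(seed, {}).items(),
                        key=lambda kv: kv[1], reverse=True)
        for artist, weight in ranked[:top_n]:
            # Artists disliked earlier in the session never enter the pool.
            if artist not in blocked:
                pool[artist] = pool.get(artist, 0.0) + weight
    return pool

def next_artist(pool, recent, rng):
    """Weighted random pick that avoids recently played artists when it can."""
    choices = {a: w for a, w in pool.items() if a not in recent} or pool
    if not choices:
        return None
    artists = list(choices)
    return rng.choices(artists, weights=[choices[a] for a in artists], k=1)[0]

def radio(seeds, links, length, top_n=2, gap=1, blocked=(), rng=None):
    """Return a list of artists to play, spaced at least `gap` picks apart."""
    rng = rng or random.Random()
    pool = candidate_weights(seeds, links, top_n, blocked)
    recent = deque(maxlen=gap)  # remembers only the last `gap` artists
    plays = []
    for _ in range(length):
        artist = next_artist(pool, recent, rng)
        if artist is None:
            break
        plays.append(artist)
        recent.append(artist)
    return plays
```

Genre radio in this sketch would just be a call to `radio()` with a hand-picked seed list, and a dislike would add the artist to `blocked` for the rest of the session.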
We would also look at anomalies like popular artists with not many recommendations, or artists that, when used as seeds, led to shorter listening sessions (implying that the recommendations needed to be cleaned up).
Most attempts to replace this with something smarter ran into two problems: 1. Popular stuff is popular, so it looks like a good recommendation for anything, and 2. ML is hard and takes a lot of time, which we never had enough of.
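A rough Python sketch of that kind of check, purely illustrative: the field names (`plays`, `link_count`, `avg_session_minutes`) and every threshold are assumptions, not what was actually used.

```python
from statistics import median

def flag_for_review(artists, min_plays=1000, min_links=3, short_ratio=0.5):
    """Map artist name -> reasons its recommendation links deserve a manual look.

    `artists` maps a name to a dict with hypothetical keys:
    plays, link_count, avg_session_minutes (sessions seeded by that artist).
    """
    if not artists:
        return {}
    typical = median(a["avg_session_minutes"] for a in artists.values())
    flagged = {}
    for name, a in artists.items():
        reasons = []
        # Popular artist with very few outgoing recommendation links.
        if a["plays"] >= min_plays and a["link_count"] < min_links:
            reasons.append("popular but few links")
        # Sessions seeded by this artist end much sooner than is typical.
        if a["avg_session_minutes"] < short_ratio * typical:
            reasons.append("short sessions as seed")
        if reasons:
            flagged[name] = reasons
    return flagged
```

The output is only a to-do list for humans; the cleanup itself still happened by hand in the internal link editor.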
I want to salute you for the Grooveshark recommendation engine. To this day, that's THE feature that I used a LOT on Grooveshark (hours & hours), and that I'm frustrated about in Spotify.
You did an amazing job on this one.
I'm not aware of an MB similar artist database. I'm guessing you used music map[0], it's the only free database I know of for similar artists that doesn't require scraping.
Ah, I think you're right that it wasn't MB, because I remember having to match artists by name rather than MBID. Music map doesn't sound familiar, but my boss negotiated the access. I think I was just ingesting the data from a CSV dump, so it could have been anywhere.