Using Spotify to measure the popularity of older music (poly-graph.co)
88 points by bdr on Aug 24, 2015 | 51 comments



What strikes me are the missing artists. For instance, no Beatles. I'm pretty sure I hear the Beatles on TV, radio, etc. more than all other 60s music combined. Ditto AC/DC: 1980's "Back in Black" is the best selling rock album ever, and I hear AC/DC really, really often in the background. I'm afraid Spotify isn't _that_ representative of all available (pop) music.


Keep in mind: Some artists did not consent to streaming, like the Beatles. Aren't they Apple-exclusive?


Yea, Beatles, T Swift, AC/DC are all not on Spotify. I preface that in the article.


I'm wondering how much this is skewed by playlists -- i.e. a popular song's stats are amplified by its inclusion on popular playlists and Spotify radio. It'd be fun to see if these stats would change if you included only songs that people actively sought out.


Yea, the plan was to remove counts via radio and playlists, but it would have doubled the amount of work from a data call/cleaning perspective. I'd bet that the results are still directionally accurate :)


how did you "call" the data?


Chet Faker released a cover of No Diggity a few years ago, which is responsible for all of my plays of the original on Spotify.

A popular cover can probably drive a lot of popularity for an older song on Spotify.


Agree! In fact, I'd argue for a v2 of this project, where I look at whether songs with high longevity (e.g., No Diggity) were driven by samples/covers/Glee/etc.


Yea, it'd even be interesting to compare a song's play count over time to it being featured in popular movie soundtracks, etc.


Was thinking the same thing, but also for a current cover to catch on there is likely something about the original that still resonates.

One nitpick: "Iris" by Goo Goo Dolls was ineligible for the Billboard Hot 100 for a long time due to their rules that songs had to be sold as a "single" to be counted on that chart. It was #1 in terms of radio airplay for a huge chunk of 1998, so it's not really accurate to say it didn't chart highly at the time.


Ah. Didn't realize that. Billboard data, generally, is terribly inaccurate in terms of a measure of cultural popularity...back in the day, they used to call radio stations to ask what their top 10 were. But it's the best we've got in terms of data.


> Chet Faker released a cover of No Diggity a few years ago

This doesn't add a single thing to the discussion, but it's a great cover too.


If you liked this, you may be interested in Dorothy Gambrell's analysis from the other direction. She took the top songs from Billboard over the past 100 years, and tried to find out which ones have remained popular.

http://www.verysmallarray.com/?p=1752


Yup. I emailed Dorothy at the beginning of the project to get the Whitburn database.


Author here, if there are any questions.


Thanks for this: http://poly-graph.co/vocabulary.html I've gotten hours of conversation out of it.

Also so happy to see that "You Got Me" by The Roots got 6 million streams in 2014. That is the definition of a future timeless track.


Has there been any kind of analysis about potential biases? Off the top of my head I can think of the fact that Spotify was not available in every country from the start. Also, not every person in the world streams their music. Does this data include record sales, for example? Does it normalise the data somehow, taking into account the percentage of the population that streams music versus other means of consuming music?

Or is it just plain and simple data gathered from Spotify?

I'm not saying that there is anything wrong/bad about the results. But without knowing the details on how the data is collected, it's hard to read anything from the results.


This is awesome. You kinda touch on this, but the main thing that jumps out at me is that there seems to be a very strong bias towards songs from the late 90s and early 00s. Do you think this is because Spotify’s main cohort is people in their late 20s to early 30s, who came of age, musically, during that time?


Honestly, I expect recency to be a large factor in Spotify plays. That is, we'd expect the average hit from the 70s to have fewer plays than a hit from the 80s, and so on.

Regardless of the age of the listener, I expect newly released songs to have higher playcounts (in fact, I plotted this curve, but it was too high-brow for the Internet and the audience for which I was writing).

That said, if I managed to cut the data by age-bucket, I do think that the results would shift toward the music with which you grew up.


I'm surprised that Nirvana's "Smells Like Teen Spirit" wasn't a bigger hit. However, Nirvana was insanely big at the time, so they don't really fit the hypothesis of underground sleeper hits.

I'm sure Nirvana has many #1 hits, but why did "Smells Like Teen Spirit" become the poster song for Nirvana?

The same can be said about Oasis, who at the time were insanely big and held several spots on top 10 lists for months. But maybe they were bigger in Europe than the U.S. And maybe European fans are driving the Spotify listens?


It's too bad that Pearl Jam isn't represented. I listened to some the other day and was taken with how good it was.

But, very interesting data nonetheless! I'm loving it!

Is it possible to filter the data by listener age? I wonder, because a lot of these songs are in playlists of mine from the 80s / 90s (I grew up in the 80s / 90s). Maybe Spotify's user base is older than suspected?

Also, it would be interesting to plot Billboard rankings vs Spotify rankings. Possible?


Yea, the plan was to cut the data by age buckets that Spotify has (<18, 19-25, etc.). This, unfortunately, would have quintupled the size of the data pull, which was already close to a million requests.

The last point, Billboard vs. Spotify, is in the second to last chart. Check it out :)


Pearl Jam is old music? Man, I feel old.


Why "timeless"? When I use that word to talk about music, it means that the music has evaded its contemporary trappings. Its opposite is "dated".

Why lasting popularity as a measure of timelessness?

How do you account for longer trends? Some of Bach's children were more popular than he was for quite a while.


I actually think that "longevity" is a better term. But I only realized that after I wrote the article. Here's another version of the article that was never really released, which focused on longevity over timelessness: http://poly-graph.co/timeless/nodiggity.html

We only have two data points in this work: today and release date. So longer trends like the one you pointed out might be lost in time.


I am more interested in what you use to make these graphs and charts. The data presentation in all the articles on poly-graph is always excellent. The interactive charts and graphs are perfect for letting the reader understand the data. Did you make all the graphs and charts?


I made everything. Thanks!

I used D3 to create the charts, as well as some additional libraries (jQuery, Waypoints).


I loved the way some of the charts reflected how far I had read in the content; very subtle and effective.


Great job.

A few slight bugs in the "Present-day Popularity of Five Decades of Music" chart: "Dream On" appears twice in the 70s with the same listen count (73 & 76). Also, Blink-182, 1999 is showing in the 00s, and "All I Want For Christmas Is You" (Mariah Carey, 2000) is showing in the 90s.
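
For what it's worth, decade mix-ups like the ones above are easy to check for. A minimal sketch in Python, assuming release years are plain integers; `decade_label` is a hypothetical helper for illustration, not the author's actual code, and the years are simply the ones shown on the chart:

```python
def decade_label(year: int) -> str:
    """Return a two-digit decade label, e.g. '90s' for 1999 and '00s' for 2000."""
    decade_start = (year // 10) * 10      # 1999 -> 1990, 2000 -> 2000
    return f"{decade_start % 100:02d}s"   # 1990 -> '90s', 2000 -> '00s'


# (artist, year) pairs as reported in the comment above.
reported = [("Blink-182", 1999), ("Mariah Carey", 2000)]

for artist, year in reported:
    print(artist, year, decade_label(year))
```

Bucketing on the year alone would put the 1999 entry in the 90s and the 2000 entry in the 00s, so the chart is presumably keying its decades off something else.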


No question, but this is one of the coolest things I've seen on the internet yet.


When are you coming back to do another tech talk at Pivotal Labs?


This is impressive!

Did you manually retrieve the play count for each track or is there an automated way of doing it?


Also interested in how the data was obtained.


Very slick analysis!

No Diggity is a great song, but the song it samples might be even better: Grandma's Hands by Bill Withers.


Crazy how much of the 90s rap sampled from the 70s and people don't realize it.


Were you thinking about creating spotify playlists from those top lists?


I should!


First of all I do dislike blog posts which lack a comment section to ask questions, criticize or praise.

Second of all I do dislike texts on data which lack information on where the data comes from.

I can think of ways to mine present-day play counts from Spotify (while not working there), but I wonder where he got the daily counts he used in the last chart. Any ideas?

Furthermore I doubt that Spotify is necessarily a good indicator on how songs are being perceived in the long run. Especially b/c there are local platform-specific attractor dynamics at play.


Sorry that there's no comments section.

The data is pretty clear in terms of source...Spotify in 2014...Billboard data via Whitburn.

The data was directly from one of Spotify's data partners.

Yea, Spotify isn't a perfect indicator. This is the best proxy for present-day popularity that I can think of. I could have created an index that abstracted several data sources, but that would have killed the readability of the article.


> Sorry that there's no comments section.

Just switch it on ... it's your site, isn't it?

> The data is pretty clear in terms of source...Spotify in 2014

That's not the "source"; that's just a value of the time dimension.

> one of Spotify's data partners

Well, you could have given that information in the text. If you talk about data, you gotta talk about where you got the data from.

Nonetheless the statement is still pretty obscure. Who is that "partner" - is it a secret?

Why don't you just dump the data on GitHub?

> I could have created an index that abstracted several data sources, but that would have killed the readability of the article.

I'm not sure if that is the true reason why you chose not to do it - but if so, then it is necessary to be transparent with assumptions, abstractions and simplifications, right?


Yea, there are lots of other charts, sources, notes that I could have included in the piece. The challenge is that this is not an academic study – it's Internet catnip. All the things that make studies too dense and boring to read (i.e., assumptions, abstractions and simplifications) are purposefully excluded.

I know that this undermines the credibility of the article, but I optimized for readability and storytelling, not for building a foolproof argument for timelessness. There are a million rabbit-holes that I could have gone down to make a much more solid case, but I decided to present the data and let the reader draw conclusions (kinda like I did with the hip hop/vocab piece: http://poly-graph.com/vocabulary.html).

I also realize that one could argue that this is a terrible way to approach a writing/data-analysis project. Assumptions and simplifications are important to highlight. But I weighed the options and decided to focus on accessibility.

Happy to discuss the pros/cons of this further :)


* in US pop culture


Why the downvotes? It's actually true and really relevant to the article. Music was way more localized back then, and a lot of mainstream US artists never actually reached a worldwide audience (especially in the rap section of the article).


It's all of Spotify. So while it's international data, it certainly skews toward western music.


While the author does refer to US music charts as a reference to past popularity, I don't see any indication that the Spotify data was US only.


While I don't doubt your mental narrative nor dismiss the value of evidence, if you think about the parent's assertion ("US and pop-culture bias") longer, it's obviously correct.


I agree with the parent poster that the article is US pop-culture centric. I just wanted to point out that the data isn't, necessarily, strictly US specific. I also felt that the parent comment was a bit snarky given that this website has a large US audience, and US music and pop-culture are popular around the world.


You should say what you mean, then.

In which case, I should voice my opposition to the suggestion that we non-USians (eg. I'm an Australian in China) should communicate (even in our own language) with hat tipped to US popular culture because (inertia of Colosseum-fawning masses).

Here's a contrary view: I believe that intelligent people tend to respect and encourage diversity because it's both more interesting for them ("are we nought but latter-day curios for the coming AI overlord?") and because many fields of science (chiefly biology) show us strength in heterogeneity. The parent's comment was, I believe, offered in this spirit.


Isn't it ironic that Ironic is on there largely because people go listen to it because people tell them that none of the examples are actually ironic?


No.

The lyrics are a little fluff, but Ironic has a great chorus and catchy hooks. It's a classic pop song.


Isn't it ironic that people listen to a hit song about irony to feel superior about their own understanding of irony, which only increases its popularity and entrenches its expression of situational irony?



