> It's awesome that the dataset is offered with a CC-0 license: https://voice.mozilla.org/en/data. Does anyone know if it includes the answers from the survey?
I'm downloading it now, I'll have an answer in a half hour. Does anyone know if there is a torrent for it?
filename,text,up_votes,down_votes,age,gender,accent,duration
cv-valid-test/sample-001224.mp3,but i felt miserable watching him wither away like a shriveled dandelion,1,0,thirties,male,england,
Not sure how some of these are being populated, but yeah; there are several additional folders, including invalid mp3, a splintered train set (not sure how it was selected), and a test set folder.
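For anyone who wants to poke at the metadata, here's a minimal sketch of reading rows shaped like the header and sample line above. It assumes the file is plain comma-separated text with that exact header; the `load_clips` helper and the inline `SAMPLE` string are just for illustration and aren't part of the dataset or its tooling.

```python
import csv
import io

# Header and one row copied from the sample above; a real run would
# open the CSV shipped in the download instead of this inline string.
SAMPLE = (
    "filename,text,up_votes,down_votes,age,gender,accent,duration\n"
    "cv-valid-test/sample-001224.mp3,but i felt miserable watching him "
    "wither away like a shriveled dandelion,1,0,thirties,male,england,\n"
)


def load_clips(fileobj):
    """Parse metadata rows into dicts (hypothetical helper)."""
    clips = []
    for row in csv.DictReader(fileobj):
        # Vote counts are integers; treat an empty cell as zero.
        row["up_votes"] = int(row["up_votes"] or 0)
        row["down_votes"] = int(row["down_votes"] or 0)
        # Optional fields may be blank; map empty strings to None.
        for key in ("age", "gender", "accent", "duration"):
            row[key] = row[key] or None
        clips.append(row)
    return clips


clips = load_clips(io.StringIO(SAMPLE))
for clip in clips:
    print(clip["filename"], clip["age"], clip["gender"], clip["accent"])
```

In this sample the age, gender, and accent columns are filled in, while duration is empty, so it comes back as `None`.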
Here's the README.txt. Looks cool! Have happy hacky fun! :)
Thanks! Couldn't find the source of the Readme in the zipfile. Can you talk about what the update process for this file is? How often is it updated? Is there a way to just download the new files? Is there a tarball script for this in the repo somewhere?
I see that you have instructions for S3. Are the files actually backed in S3? Is it possible to download them with S3 (possibly using requester pays)?
We have no plans to allow users to download the "raw" data from S3 (i.e., before we perform the train/dev/test split). But we want to eventually build some tools to automate this. See here for some background: