genuine question: how is this different than the official creator marketplace from TikTok?
https://creatormarketplace.tiktok.com/
I'm in the influencer space. Why would I pay you when I can get the same thing for free from TikTok?
We’re looking for a Backend Engineer with experience in PHP (particularly in Laravel). We have an office in Mumbai, and we are primarily looking to expand this team with an option to work remotely.
Touchstorm is a YouTube agency that is building out a YouTube platform for creators and brands. We are on a mission to organize YouTube data. You can apply for this job here: https://www.touchstorm.com/sr-web-engineer/
I would like to do the same with my country (Malaysia). Even Video Amigo doesn't have the stats for mine. How would one do this? That's the interesting bit.
No, I am not an artist. Just for fun, to get an idea: in this situation (music, whether an unsigned or a signed artist), how do YouTube views coincide with local mainstream (radio) plays?
Is there a video of a real, live example of how Spark helped to solve a specific problem? I've tried quite a few times to wrap my head around what Spark helps you solve.
In theory, Spark lets you seamlessly write parallel computations without sacrificing expressivity. You perform collections-oriented operations (e.g. flatMap, groupBy) and the computation gets magically distributed across a cluster (alongside all necessary data movement and failure recovery).
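In code, that looks roughly like this. This is only a sketch: sc is the SparkContext you get from the Spark shell (built explicitly here so the snippet stands alone), and the input path is made up.

    import org.apache.spark.{SparkConf, SparkContext}

    // Normally sc comes from the Spark shell; built here so the sketch is self-contained.
    val sc = new SparkContext(new SparkConf().setAppName("word-groups").setMaster("local[*]"))

    val lines = sc.textFile("hdfs:///logs/input.txt")   // hypothetical input path
    val wordCounts = lines
      .flatMap(_.split("\\s+"))      // split every line into words
      .groupBy(identity)             // shuffle identical words onto the same partition
      .mapValues(_.size)             // count each group
    wordCounts.take(10).foreach(println)

You write it as if it were a local Scala collection; Spark decides how to partition the data and move it around the cluster.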
In practice, Spark seems to perform reasonably well on smaller in-memory datasets and on some larger benchmarks under the control of Databricks. My experience has been pretty rough for legitimately large datasets (can't fit in RAM across a cluster) -- mysterious failures abound (often related to serialization, fat in-memory representations, and the JVM heap).
The project has been slowly moving toward an improved architecture for working with larger datasets (see Tungsten and DataFrames), so hopefully this new release will actually deliver on the promise of Spark's simple API.
* distributed machine learning tasks using their built-in algorithms (although note that some of them, e.g. LDA, just fall over with not-even-that-big datasets)
* as a general fabric for doing parallel processing, like crunching terabytes of JSON logs into Parquet files, or running arbitrary transformations over the Common Crawl
As a developer, it's really convenient to spin up ~200 cores on AWS spot instances for ~$2/hr and get fast feedback as I iterate on an idea.
It originally billed itself as a replacement for Hadoop and MapReduce as an in-memory data processing pipeline. It is typical in MR programs to create many sequential MR jobs and save the output between successive jobs to HDFS; Spark keeps that intermediate data in memory instead, so it handles these use cases well. Since its early days, it has continued to build on its capabilities.
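For example, a pipeline that would be two chained MR jobs (aggregate, then join) with an HDFS write in between becomes a single in-memory Spark job. A sketch only; the paths and column layouts are invented:

    // sc is the SparkContext from the Spark shell (or built as in the sketch above).
    // In classic MapReduce, the per-page counts would be written to HDFS and read
    // back by a second job that does the join; here the intermediate stays in memory.
    val views = sc.textFile("hdfs:///logs/views.tsv")      // hypothetical (pageId \t userId) lines
      .map(_.split("\t"))
      .map(cols => (cols(0), 1L))
      .reduceByKey(_ + _)                                  // "job 1" in MR terms: views per page
    val titles = sc.textFile("hdfs:///meta/pages.tsv")     // hypothetical (pageId \t title) lines
      .map(_.split("\t"))
      .map(cols => (cols(0), cols(1)))
    val report = views.join(titles)                        // "job 2" in MR terms
    report.saveAsTextFile("hdfs:///out/page-views/")       // only the final result hits HDFS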
So, real-world use cases? Any MR use case should be doable with Spark. There are plenty of companies using Spark to create analytics from streams, and some are using it for its ML capabilities (sentiment analysis, recommendation engines, linear models, etc.).
I apologize if my comment isn't as specific as you're looking for, but I know of people who use it for exactly the scenarios I've outlined above. We are probably going to use it as well, but I don't have a use case to share just yet (at least nothing concrete at the moment). Hopefully this gives you some idea of where Spark fits.
I think your question is oriented towards X being a business problem.
Netflix has users (say 100M) who have been liking some movies (say 100k of them). The question is: for every user, find movies they would like but have not seen yet.
The dataset in question is large, and you have to answer this question with data regarding every user-movie pair (that would be 1e13 pairs). A problem of this size needs to be distributed across a cluster.
Spark lets you express computations across this cluster, letting you explore the problem. Spark also provides a quite rich machine learning toolset [1], including ALS-WR [2], which was developed specifically for a competition organized by Netflix and got great results [3].
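To make that concrete, a minimal sketch of what the ALS flow looks like with Spark's MLlib. The file path, rating format, user id, and parameter values are all made up for illustration:

    import org.apache.spark.mllib.recommendation.{ALS, Rating}

    // sc is the SparkContext from the Spark shell.
    // Assume "userId,movieId,rating" lines; the path and layout are hypothetical.
    val ratings = sc.textFile("hdfs:///netflix/ratings.csv").map { line =>
      val Array(user, movie, rating) = line.split(",")
      Rating(user.toInt, movie.toInt, rating.toDouble)
    }

    // Factorize the user-movie matrix with ALS: rank 10, 10 iterations, lambda 0.01
    // (arbitrary values; you would tune these).
    val model = ALS.train(ratings, 10, 10, 0.01)

    // Top 10 movie recommendations for a (hypothetical) user id 42.
    model.recommendProducts(42, 10).foreach(println)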
We use Spark essentially as a distributed programming framework for data processing: anything you can do on a small dataset on a single server, you can do on a huge dataset and 20 servers or 2,000 servers with minimal extra development.
I had hundreds of gigabytes of JSON logs with many variations in the schema and a lot of noise that had to be cleaned. There were also some joins and filtering that had to be done between each datapoint and an external dataset.
The data does not fit in memory, so without something like Spark you would need to write special-purpose code to parse the data, clean it, and do the join without making your app crash.
Spark makes this straightforward (especially with its DataFrame API): you just point to the folder where your files are (or an AWS/HDFS/... URI) and write a couple of lines to define the chain of operations you want to do and save the result in a file or just display it. Spark will then run these operations in parallel by splitting the data, processing it and then joining it back (simplifying).
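Roughly, the code looks like this. A sketch only; the bucket, column names, and join key are invented:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("log-cleaning").getOrCreate()
    import spark.implicits._

    // Point Spark at the folder of JSON logs; it infers a schema across the varying files.
    val logs = spark.read.json("s3a://my-bucket/raw-logs/")
    // The external dataset to join against (hypothetical location and format).
    val users = spark.read.parquet("s3a://my-bucket/users/")

    val cleaned = logs
      .filter($"event".isNotNull)            // drop the noisy records
      .join(users, Seq("userId"))            // enrich each datapoint with the external dataset
      .select("userId", "event", "timestamp")

    cleaned.write.parquet("s3a://my-bucket/clean-logs/")   // Spark parallelizes all of the above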
I don't know about videos but I used spark in my last job to solve problems of "we want to run this linear algebra calculation on x00000 user profiles and have it not take forever". For me the big selling point is it lets you write code that can be read as ordinary Scala but which runs on a cluster. As much as anything else it's practical to get the statistician to review the code and say "yes, that is implementing the calculation I asked you to implement" in a way that wouldn't be practical with more "manual" approaches to running calculations in parallel on a cluster.
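As a toy illustration of what I mean by "reads as ordinary Scala" (the profile type, weights, and calculation here are invented, not what we actually ran):

    // sc is the SparkContext from the Spark shell.
    // Hypothetical profile type and scoring function; the point is that the pipeline
    // reads like plain Scala collections code, so a statistician can review it.
    case class Profile(userId: Long, features: Array[Double])

    def score(p: Profile, weights: Array[Double]): Double =
      p.features.zip(weights).map { case (x, w) => x * w }.sum   // simple dot product

    val weights = Array(0.3, 0.5, 0.2)
    val profiles = sc.objectFile[Profile]("hdfs:///profiles/")   // hypothetical serialized input
    val scores = profiles.map(p => (p.userId, score(p, weights)))
    scores.saveAsTextFile("hdfs:///scores/")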
This is great, good job! One application of this interface is for companies to use it as a front-end style guide for their own site/web app. Just like people use Bootstrap as the starting point for their front-end and then customize fonts, sizes, colors, etc., this could be a starting point for front-end/template style guides, which is why I was hoping I could simply fork this on GitHub, customize it to our front-end rules, and make it available internally so that all front-end coders use the exact same conventions. Thanks for making this available. Any plans to open-source the project?