oonny's comments

Nebraska Furniture Mart. It could be because I had a really rough time with Rooms-to-Go.


My First Million

The review below is pretty accurate. It ruined all other business/startup podcasts.

https://twitter.com/ishverduzco/status/1557794282009817088?s...


Genuine question: how is this different from the official creator marketplace from TikTok? https://creatormarketplace.tiktok.com/ I'm in the influencer space. Why would I pay you when I can get the same thing for free from TikTok?


I'm sketchy on the details, but in the posting the author mentions that the creator marketplace is opt-in, so there are (comparatively) few people on there.


Yeah, I address that in the doc.

Basically, many of my customers have the TikTok CM, but it hasn't got enough people on it. Obviously that could change.


Touchstorm | Full-Time | Remote within India

We’re looking for a Backend Engineer with experience in PHP (particularly in Laravel). We have an office in Mumbai, and we are primarily looking to expand this team with an option to work remotely.

Touchstorm is a YouTube agency that is building out a YouTube platform for creators and brands. We are on a mission to organize YouTube data. You can apply for this job here: https://www.touchstorm.com/sr-web-engineer/


Paul Graham


VideoAmigo already does this for all countries (sourcing from YouTube).

Top videos: https://www.videoamigo.com/music-charts/top-music-videos

Top channels: https://www.videoamigo.com/music-charts/top-music-channels

Top unsigned artists: https://www.videoamigo.com/music-charts/top-music-unsigned

You can sort each by country, genre, and language.


I would like to do the same for my country (Malaysia). Even VideoAmigo doesn't have the stats for mine. How would one do this? That's the interesting bit.


Yours as in ... you as an artist? These are the top channels in Malaysia: https://www.videoamigo.com/music-charts/top-music-channels?c...


No, I am not an artist. Just for fun, to get an idea of how, in this situation (music, be it unsigned or signed artists), YouTube views coincide with local mainstream (radio) plays.


The country, genre, and language are those reported for the artist or channel, not for the streams.


The only reason I upgraded my phone to the 7s: I have a 3-year-old. A smartphone + Google Photos combination is a pretty good one.


You have a 7s?


Ah, I meant the 7 Plus.


I got the 7 Plus, and I can say that I am stunned by the photos I can capture of my son and daughter.


And they wonder why I use Yahoo for my marketing spam newsletters.


Is there a video of a real-life example of how Spark helped solve a specific problem? I've tried quite a few times to wrap my head around what Spark helps you solve.


In theory, Spark lets you seamlessly write parallel computations without sacrificing expressivity. You perform collections-oriented operations (e.g. flatMap, groupBy) and the computation gets magically distributed across a cluster (alongside all necessary data movement and failure recovery).
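For a rough illustration, here is a minimal word-count sketch in spark-shell style (where sc is the shell-provided SparkContext; the paths are made up). It reads like ordinary Scala collection code, but every step runs across the cluster:

    // Count words across a distributed set of text files.
    val counts = sc.textFile("hdfs:///logs/*.txt")   // distributed read
      .flatMap(_.split("\\s+"))                      // one record per word
      .map(word => (word, 1L))
      .reduceByKey(_ + _)                            // distributed aggregation

    counts.saveAsTextFile("hdfs:///out/wordcounts")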

In practice, Spark seems to perform reasonably well on smaller in-memory datasets and on some larger benchmarks under the control of Databricks. My experience has been pretty rough for legitimately large datasets (can't fit in RAM across a cluster) -- mysterious failures abound (often related to serialization, fat in-memory representations, and the JVM heap).

The project has been slowly moving toward an improved architecture for working with larger datasets (see Tungsten and DataFrames), so hopefully this new release will actually deliver on the promise of Spark's simple API.


Thanks for the reply, but I was looking for a use case, e.g. "with Spark I was able to do X." I don't even know where Spark would be applied.


We use it for two things:

* distributed machine learning tasks using their built-in algorithms (although note that some of them, e.g. LDA, just fall over with not-even-that-big datasets)

* as a general fabric for doing parallel processing, like crunching terabytes of JSON logs into Parquet files, or doing random transformations of the Common Crawl

As a developer, it's really convenient to spin up ~200 cores on AWS spot instances for ~$2/hr and get fast feedback as I iterate on an idea.


It originally billed itself as a replacement for Hadoop MapReduce, as an in-memory data processing pipeline. It is typical in MR programs to create many sequential MR jobs and save the output between successive jobs to HDFS; Spark covers these use cases while keeping intermediate data in memory. Since its early days, it has built out its capabilities.
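To make the contrast concrete, here is a hedged sketch (spark-shell style; the input path and field layout are invented) of two dependent computations that classic MR would separate with writes to HDFS, but that Spark chains in memory:

    // "Job 1" and "job 2" share the intermediate RDD instead of HDFS files.
    val events = sc.textFile("hdfs:///events/2015/*")
      .map(_.split('\t'))
      .filter(_.length == 3)                     // drop malformed lines

    val perUser = events
      .map(f => (f(0), f(2).toDouble))
      .reduceByKey(_ + _)                        // "job 1": total per user
      .cache()                                   // kept in memory, not HDFS

    val topUsers = perUser.sortBy(_._2, ascending = false).take(100)
    val average  = perUser.values.mean()         // reuses the cached RDD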

So, real-world use cases? Any MR use case should be doable in Spark. There are plenty of companies using Spark to create analytics from streams, and some are using it for its ML capabilities (sentiment analysis, recommendation engines, linear models, etc.).

I apologize if my comment isn't as specific as you're looking for, but I know of people who use it for exactly the scenarios I've outlined above. We are probably going to use it as well, but I don't have a use case to share just yet (at least nothing concrete at the moment). Hopefully this gives you some idea of where Spark fits.


I think your question is oriented towards X being a business problem.

Netflix has users (say 100M) who have been liking some movies (say 100k). The question is: for every user, find movies they would like but have not seen yet.

The dataset in question is large, and you have to answer this question with data regarding every user-movie pair (that would be 1e13 pairs). A problem of this size needs to be distributed across a cluster.

Spark lets you express computations across this cluster, letting you explore the problem. Spark also provides you with a quite rich machine learning toolset [1], among which is ALS-WR [2], which was developed specifically for a competition organised by Netflix and got great results [3].

[1] http://spark.apache.org/docs/latest/mllib-guide.html

[2] http://spark.apache.org/docs/latest/mllib-collaborative-filt...

[3] http://www.grappa.univ-lille3.fr/~mary/cours/stats/centrale/...
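As a sketch of what that looks like with MLlib's RDD-based ALS (spark-shell style; the ratings file and its user,movie,rating line format are hypothetical):

    import org.apache.spark.mllib.recommendation.{ALS, Rating}

    val ratings = sc.textFile("hdfs:///ratings.csv").map { line =>
      val Array(user, movie, rating) = line.split(',')
      Rating(user.toInt, movie.toInt, rating.toDouble)
    }

    // rank = number of latent factors; lambda = regularization strength
    val model = ALS.train(ratings, rank = 10, iterations = 10, lambda = 0.01)

    // Top 5 predicted-but-unseen movies for (hypothetical) user 42
    val recommendations = model.recommendProducts(42, 5)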


We use Spark essentially as a distributed programming framework for data processing: anything you can do on a small dataset on a single server, you can do on a huge dataset across 20 or 2000 servers with minimal extra development.


We primarily use it to aggregate a large-ish (10 TB/day) amount of data for insertion into an analytics database.

The code is very straightforward and it is fast.
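Roughly along these lines (a hedged sketch; the schema, the paths, and the shell-provided SQLContext named sqlContext are all assumptions, with the analytics-database load happening downstream of the output):

    import org.apache.spark.sql.functions._

    // Aggregate a day of raw events by hour and dimension before loading
    // the (much smaller) rollup into the analytics database.
    val events = sqlContext.read.json("s3://bucket/events/2015-10-01/")

    val rollup = events
      .groupBy("hour", "country", "page")
      .agg(count(lit(1)).as("views"), sum("duration_ms").as("total_ms"))

    rollup.write.parquet("s3://bucket/rollups/2015-10-01")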


Here's a (simple) problem I solved with Spark:

I had hundreds of gigabytes of JSON logs with many variations in the schema and a lot of noise that had to be cleaned. There were also some joins and filtering that had to be done between each datapoint and an external dataset.

The data does not fit in memory, so you would need to write some special-purpose code to parse this data, clean it, and do the join, all without making your app crash.

Spark makes this straightforward (especially with its DataFrame API): you just point to the folder where your files are (or an AWS/HDFS/... URI) and write a couple of lines to define the chain of operations you want to do and save the result in a file or just display it. Spark will then run these operations in parallel by splitting the data, processing it and then joining it back (simplifying).
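A rough sketch of that kind of pipeline (spark-shell style; the paths and field names are invented, and sqlContext is the shell-provided SQLContext):

    import org.apache.spark.sql.functions.col

    val logs = sqlContext.read.json("s3://bucket/raw-logs/")        // messy JSON
    val geo  = sqlContext.read.parquet("s3://bucket/reference/geo")

    val cleaned = logs
      .filter(col("user_id").isNotNull)                   // drop the noise
      .withColumn("ts", col("timestamp").cast("timestamp"))
      .join(geo, "ip_prefix")                             // external dataset
      .select("user_id", "ts", "country", "event")

    cleaned.write.parquet("s3://bucket/clean-logs")       // columnar output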


I don't know about videos, but I used Spark in my last job to solve problems of "we want to run this linear algebra calculation on x00000 user profiles and have it not take forever". For me the big selling point is that it lets you write code that reads as ordinary Scala but runs on a cluster. As much as anything else, it's practical to get the statistician to review the code and say "yes, that is implementing the calculation I asked you to implement" in a way that wouldn't be practical with more "manual" approaches to running calculations in parallel on a cluster.


This is great, good job! One application of this interface is for companies to use it as a front-end style guide for their own site/web app. Just like people use Bootstrap as the starting point for their front-end and then customize fonts, sizes, colors, etc., this could be a starting point for front-end/template user guides. That's why I was hoping I could simply fork this on GitHub, customize it to our front-end rules, and make it available internally so that all front-end coders use the exact same conventions. Thanks for making this available. Any plans to open-source the project?


I too came back here looking for the Git repo; it would be very helpful to be able to fork this for my custom Bootstrap themes.

