crazybit's comments

MongoDB is horrible, I get it.

What do I use in this situation:

1) I need to store 100,000,000+ json files in a database

2) query the data in these json files

3) json files come from thousands upon thousands of different sources, each with their own drastically different "schema"

4) constantly adding more json files from constantly new sources

5) no time to figure out the schema prior to adding into the database

6) don't care if a json file is lost once in a while

7) only 1 table, no relational tables needed

8) easy replication and sharding across servers sought after

9) don't actually require json, so long as data can be easily mapped from json to database format and back

10) can self host, no cloud only lock-in

Recommendations?


Elasticsearch? http://smnh.me/indexing-and-searching-arbitrary-json-data-us...

Depends on what your queries look like, I guess.
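
For what it's worth, one common way to make documents with wildly different "schemas" queryable is to flatten each one into path/value pairs and index those. Below is a minimal pure-Python sketch of that idea. It is not how Elasticsearch works internally and it is not taken from the linked article; `flatten`, `build_index`, and the sample documents are all hypothetical names and data made up for illustration.

```python
from typing import Any, Dict, Iterator, Set, Tuple


def flatten(doc: Any, prefix: str = "") -> Iterator[Tuple[str, Any]]:
    """Yield (dotted_path, scalar_value) pairs for an arbitrary JSON value."""
    if isinstance(doc, dict):
        for key, value in doc.items():
            yield from flatten(value, f"{prefix}.{key}" if prefix else key)
    elif isinstance(doc, list):
        # List positions are dropped, so every element shares its parent's path.
        for item in doc:
            yield from flatten(item, prefix)
    else:
        yield prefix, doc


def build_index(docs: Dict[str, Any]) -> Dict[Tuple[str, Any], Set[str]]:
    """Map each (path, value) pair to the ids of the documents containing it."""
    index: Dict[Tuple[str, Any], Set[str]] = {}
    for doc_id, doc in docs.items():
        for path, value in flatten(doc):
            index.setdefault((path, value), set()).add(doc_id)
    return index


# Hypothetical documents from three sources with unrelated shapes.
docs = {
    "a": {"user": {"name": "ann"}, "tags": ["x", "y"]},
    "b": {"user": {"name": "bob"}, "tags": ["y"]},
    "c": {"title": "hello", "user": "ann"},
}
index = build_index(docs)

print(sorted(index[("tags", "y")]))         # ['a', 'b']
print(sorted(index[("user.name", "ann")]))  # ['a']
```

No schema is declared up front: a new source simply contributes new paths. The trade-off is that exact-match lookups are cheap while range or full-text queries need more machinery, which is where "depends on what your queries look like" comes in.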


Just adding that I have used elasticsearch for a use case under the above constraints several times in the past and it worked well.

Ironically, one of those times was because mongo was such a pain to work with that I dumped the data from it into ES to get the better API, usability, and Kibana.


I don't think it's that simple (being horrible). MongoDB can be great for some specific situations, perhaps yours. It's just that it's not for many others, and you'd need to be an expert to find this out from the docs.


PostgreSQL with 1 table with JSON fields?
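
A rough sketch of what the single-table approach could look like. To keep it runnable without a server, Python's built-in `sqlite3` stands in for PostgreSQL here, and `json_get` is a hypothetical helper registered as a SQL function to play the role of PostgreSQL's `->>` operator on a `json`/`jsonb` column. The table name, column names, and sample rows are assumptions made up for illustration.

```python
import json
import sqlite3


def json_get(body: str, key: str):
    """Hypothetical helper: return one top-level key of a JSON object, or None."""
    value = json.loads(body).get(key)
    # A SQLite user function can only return scalars, so re-encode nested values.
    if value is None or isinstance(value, (str, int, float)):
        return value
    return json.dumps(value)


conn = sqlite3.connect(":memory:")
conn.create_function("json_get", 2, json_get)

# One table, one free-form document column, no per-source schema.
conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, source TEXT, body TEXT)")

samples = [
    ("feed-a", {"name": "ann", "age": 31}),
    ("feed-b", {"title": "hello", "name": "bob"}),
    ("feed-c", {"sku": 42}),
]
conn.executemany(
    "INSERT INTO docs (source, body) VALUES (?, ?)",
    [(source, json.dumps(doc)) for source, doc in samples],
)

# Query on a key that only some documents have.
rows = conn.execute(
    "SELECT source, json_get(body, 'name') FROM docs "
    "WHERE json_get(body, 'name') IS NOT NULL ORDER BY id"
).fetchall()
print(rows)  # [('feed-a', 'ann'), ('feed-b', 'bob')]
```

In PostgreSQL itself the filter would be written as something like `body ->> 'name' IS NOT NULL` against a `jsonb` column, and an index on that column would do the job that the per-row Python call does here.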


Sounds like the return of Apple from years ago: in house everything, walled garden everything, including graphic stack, custom CPU, etc.

It almost ruined them (desperately holding on to their drastically inferior CPU). What could go wrong this time? Especially when they no longer have Steve running the show.


Has anyone actually made a solid case that Apple's current generation of silicon is "drastically inferior"?

Didn't Windows (on desktop) ignore OpenGL in favor of DirectX for years to great success, because it was deeply integrated into the platform?

Yes, Apple's ecosystem is the same walled garden it's always been. Any time it wasn't was an anomaly. The best fanboy response I can give is that they seem MUCH more willing to work with partners this time around.


Ancient Greeks knew about and drank milk. It's literally mentioned in Homer's The Odyssey, where Odysseus takes milk on one of his quests. Cheese is also mentioned, multiple times.


Supply and demand.

Game developers: very sexy, easy to show off, has a (very) large supply of talent. Pay not bad, not good, work schedule very demanding.

COBOL developer: extremely unsexy, almost don't want to admit it publicly. Tiny supply, relatively much larger demand. Pay very good. Work schedule very predictable, not demanding. Almost a vacation.

Supply and demand. Ignore at your own peril.


I don’t think it’s really about the availability of talent but rather about the business end.

Video games are projects and comparable to movies. You need different talent to perform different tasks at different times. Once that talent delivers, you probably won’t need them again.

Unless you’re an established studio that’s capable of rotating staff from project to project, there is really no feasible way of keeping staff around who are no longer necessary.

COBOL programmers, on the other hand, aren’t working on projects where they become unnecessary, because the systems they work on are forever.

Basically, you can say that nobody is working on Railroad Tycoon today, but there are COBOL programmers who are still maintaining mainframe software that was built before Railroad Tycoon was even released.

I’m not sure how you’d run video gaming differently though.


Which areas do you think have large demand and low supply at the moment?


Maintenance. Everybody wants to work on greenfield, but there is a ton of maintenance work out there.


It's easy to slide into a maintenance role. I wouldn't say the pay is especially better. There aren't "maintenance engineers" making substantially more than "development engineers".


Ruby on Rails development.


It makes sense for Amazon to not be liable for damages caused by the manufacturer (should eBay be liable for everything sold on its platform?).

But doesn't it also make sense for Amazon to be liable for allowing 3rd parties to sell counterfeit or not-as-advertised items?

Also, how far does this go? Where is the line crossed?


Why not praise DL and strive for practical applications, both present and future, instead of having the goal and evaluating success based on some nebulous "artificial general intelligence"?


AGI can be seen as a system having practical success in all tasks humans can perform. Whether or not you care about AGI doesn't change much, as long as the generality of narrow AIs keeps increasing.


More great Jupyter Notebooks in the AI field:

1) 16 notebooks from the book "Python Machine Learning" by Raschka & Mirjalili https://github.com/rasbt/python-machine-learning-book-2nd-ed...

2) Linear Regression, Logistic Regression, Random Forests, and k-Means Clustering notebooks by Nitin Borwankar https://github.com/nborwankar/LearnDataScience

3) scikit-learn tutorial notebooks by Jake VanderPlas https://github.com/jakevdp/sklearn_tutorial

4) Lots of deep learning notebooks from the book "Deep Learning with Python" by François Chollet https://github.com/fchollet/deep-learning-with-python-notebo...

Bonus) Jupyter notebook on AWS tutorial (when your local computer just won't handle your notebook requirements): http://efavdb.com/deep-learning-with-jupyter-on-aws/

Please share your Jupyter notebook recommendations.


Bonus, Bonus) Google Colab offers free hosted Jupyter notebooks with shared access to a K80 and Google Drive-like sync. https://colab.research.google.com/


Kyso also offers free hosted jupyter notebooks, as well as access to free jupyterlab environments in the cloud. We render your notebooks as rich blog-posts. Feel free to check it out!: https://kyso.io


> Bonus) Jupyter notebook on AWS tutorial (when your local computer just won't handle your notebook requirements): http://efavdb.com/deep-learning-with-jupyter-on-aws/

AWS also recently released a product called AWS SageMaker which is a hosted notebook environment [0]. I haven't been able to find any good tutorials on it unfortunately. They claim you can go from data, to notebook, to model, to production all within the AWS ecosystem integrating with tools like AWS ETL/Glue/S3/etc.

All magic.

[0] - https://aws.amazon.com/sagemaker/


Notebooks for Udacity Deep Learning Nanodegree lab. Covers a wide range of subjects, including CNN, RNN, etc. https://github.com/udacity/deep-learning


https://github.com/PacktPublishing/Advanced-Artificial-Intel...

NLP, genetic algorithms, and reinforcement learning.


Artistic style transfer as a way to learn about convolutional neural networks: https://github.com/hnarayanan/artistic-style-transfer

