Speaking of syncing strategies, I use Django with PostgreSQL as the main datasto...

rachbelaid · on Dec 8, 2014

It's also a Django project, and I like what you have done. It's a nice lib and it's intuitive. How do you handle an ES document that is a composition of multiple Django models?

It's just a personal opinion, but I try to avoid making my application responsible for data integrity, so I went the way of an ES River plugin to pull the data. I used the JDBC one, https://github.com/jprante/elasticsearch-river-jdbc ... I ran into some problems, but in the end it works quite well and I don't need logic in the application to keep the data synced.

Two other reasons I used a River were to avoid slowing the app down by saving data to ES, and to be able to run the application during development without ES installed, substituting the search with a stub.

jaddison · on Dec 8, 2014

Good question - django-simple-elasticsearch is definitely focused on generating a document from a single instance of a specific model. Of course, you can add supplementary data from associated models via M2M or FK relations as you see fit (nested objects, etc.).
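As a rough sketch of what "one document per instance, plus data from related models" could look like: plain dataclasses stand in for Django models here, and `Author`, `Tag`, `Article` and `article_to_document` are hypothetical names for illustration, not part of django-simple-elasticsearch.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Author:  # stand-in for a model reached via a ForeignKey
    id: int
    name: str


@dataclass
class Tag:  # stand-in for a model reached via a ManyToManyField
    id: int
    label: str


@dataclass
class Article:  # the single instance the document is generated from
    id: int
    title: str
    author: Author
    tags: List[Tag] = field(default_factory=list)


def article_to_document(article: Article) -> dict:
    """Build one search document from one Article, embedding related data."""
    return {
        "id": article.id,
        "title": article.title,
        # FK data embedded as a single object
        "author": {"id": article.author.id, "name": article.author.name},
        # M2M data embedded as a list of objects
        "tags": [{"id": t.id, "label": t.label} for t in article.tags],
    }
```

The document is still keyed on a single `Article`; the related rows only ride along as embedded data, so a change to an `Author` or `Tag` would need the affected articles reindexed.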

I've thought of adding support for pushing bulk index request data to Redis (for example) so that an Elasticsearch river could pull from it; this would decouple the app somewhat - but not completely, as you've noted. It would likely help with throughput, however, and still give you the ability to pre-process the data as needed within your app's/project's context.
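A minimal sketch of that idea, assuming the app serializes documents into the bulk API's newline-delimited action/source layout and appends the payload to a queue. `InMemoryQueue` is a hypothetical stand-in for a Redis list, and `to_bulk_lines`, `enqueue_bulk` and the `es:bulk` key are made-up names; the `_type` field reflects the Elasticsearch versions of that time.

```python
import json
from typing import Dict, Iterable, List


class InMemoryQueue:
    """Hypothetical stand-in for a Redis list; mimics an rpush-style append."""

    def __init__(self) -> None:
        self.items: Dict[str, List[str]] = {}

    def rpush(self, key: str, value: str) -> None:
        self.items.setdefault(key, []).append(value)


def to_bulk_lines(index: str, doc_type: str, docs: Iterable[dict]) -> List[str]:
    """Serialize documents as action line + source line pairs."""
    lines = []
    for doc in docs:
        action = {"index": {"_index": index, "_type": doc_type, "_id": doc["id"]}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(doc))
    return lines


def enqueue_bulk(queue, key: str, index: str, doc_type: str, docs: Iterable[dict]) -> None:
    """Pre-process in the app, then hand the payload to the queue instead of calling ES."""
    payload = "\n".join(to_bulk_lines(index, doc_type, docs)) + "\n"
    queue.rpush(key, payload)


queue = InMemoryQueue()
enqueue_bulk(queue, "es:bulk", "articles", "article", [{"id": 1, "title": "Syncing ES"}])
```

Whatever consumes the queue can forward each payload to the bulk endpoint unchanged, so the request path never waits on Elasticsearch.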

The Elasticsearch JDBC river isn't as flexible for processing data if I'm not mistaken, as it doesn't have context for the data? Please correct me if I'm wrong, but it's somewhat limited?

rachbelaid · on Dec 9, 2014

I think you nailed it: Elasticsearch JDBC is not flexible, and you need to be good at SQL if you have a complex model with many relations. Because a JOIN creates a Cartesian product, you can end up with duplicates, and you have to find a way to avoid them because duplicates affect the ranking. In my case I ended up using UNION queries.
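To illustrate the duplication, here is a small sketch using SQLite instead of PostgreSQL and a made-up schema (`article`, `tag`, `comment`); it is not the actual river query, only a demonstration of why joining several one-to-many relations repeats values while a UNION does not.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE article (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE tag     (article_id INTEGER, name TEXT);
    CREATE TABLE comment (article_id INTEGER, body TEXT);
    INSERT INTO article VALUES (1, 'Syncing ES');
    INSERT INTO tag     VALUES (1, 'django'), (1, 'search');
    INSERT INTO comment VALUES (1, 'nice'), (1, 'thanks'), (1, 'works');
""")

# Joining two independent one-to-many relations multiplies the rows:
# 2 tags x 3 comments = 6 rows, so every tag and comment value repeats.
joined = conn.execute("""
    SELECT a.id, t.name, c.body
    FROM article a
    JOIN tag t     ON t.article_id = a.id
    JOIN comment c ON c.article_id = a.id
""").fetchall()

# UNION ALL keeps one row per related record: 2 + 3 = 5 rows, no repeats.
unioned = conn.execute("""
    SELECT a.id, 'tag' AS kind, t.name AS value
    FROM article a JOIN tag t ON t.article_id = a.id
    UNION ALL
    SELECT a.id, 'comment' AS kind, c.body AS value
    FROM article a JOIN comment c ON c.article_id = a.id
""").fetchall()

print(len(joined), len(unioned))  # 6 5
```

In the joined result each tag appears three times and each comment twice; if those repeated values end up in the indexed document, term frequencies are inflated, which is how the duplicates leak into ranking.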

Also, you can have multiple queries, but not to update an existing document (e.g. adding extra fields). The SQL query for me ended up being ~100 lines, but the model is quite complex, with support for multiple languages.