I'm certain I'm missing something *very* obvious about Elasticsearch and other N...

jacobr1 · on Nov 30, 2018

I use `refresh=True` on my insert/update/delete, which forces the writes to complete. Then all the reads work as you would expect.
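For readers who have not used the flag, here is a minimal sketch of what it looks like at the REST level. It assumes a node at a hypothetical `http://localhost:9200`, a hypothetical index named `articles`, and the typeless `_doc` endpoint of recent Elasticsearch versions. The helper names `index_url` and `index_document` are illustrative and not part of any client library.

```python
import json
import urllib.parse
import urllib.request

# Values the `refresh` query parameter accepts on index/update/delete requests.
REFRESH_VALUES = {"true", "wait_for", "false"}


def index_url(base_url, index, doc_id, refresh="true"):
    """Build the URL for indexing one document with a given refresh policy."""
    if refresh not in REFRESH_VALUES:
        raise ValueError(f"unsupported refresh value: {refresh!r}")
    index_part = urllib.parse.quote(index, safe="")
    id_part = urllib.parse.quote(str(doc_id), safe="")
    query = urllib.parse.urlencode({"refresh": refresh})
    return f"{base_url.rstrip('/')}/{index_part}/_doc/{id_part}?{query}"


def index_document(base_url, index, doc_id, document, refresh="true"):
    """PUT the document with the chosen refresh policy; return the parsed reply."""
    request = urllib.request.Request(
        index_url(base_url, index, doc_id, refresh),
        data=json.dumps(document).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

In a test, something like `index_document("http://localhost:9200", "articles", 1, {"title": "hello"})` would be followed directly by the search being asserted on.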

kn7 · on Nov 30, 2018

That still doesn't guarantee that a consecutive read is gonna get the last state.

haggy · on Nov 30, 2018

Direct from ES documentation on the `refresh` flag: "Refresh the relevant primary and replica shards (not the whole index) immediately after the operation occurs, so that the updated document appears in search results immediately."

Do you have other information about the refresh flag? Their documentation clearly states that a forced refresh is applied to the primary and replica shards, meaning the document will be available for query directly after the call that requests the refresh.

JohnBooty · on Dec 3, 2018

My experience was that this was not reliable. It was nearly always true, but not always, and tests would sporadically fail some small percentage of the time.

However, this was back in ~2015 and Elasticsearch 1.3 or something like that, which is of course a now-ancient version. Perhaps things are different now.

edit: Perhaps we were using the refresh command and not the refresh flag. It was a few years ago and I don't have access to the code any more, and my memory may be failing here. If the refresh flag works as advertised (enforces an index update and guarantees a consistent view of the data for the next query, which the command did not seem to) then that of course solves my initial problem W.R.T. writing tests.

bobbyi_settv · on Nov 30, 2018

Yes it does. I'm not sure what you think the refresh operation is if not that.

I've run countless integration tests with ES and never seen something fail due to refresh not working as advertised. If you have, what version of ES was it? Can you give some sample code that sporadically exhibits the problem?

kn7 · on Nov 30, 2018

We generally use index refresh in ITs (running ES in Docker) and it fails occasionally, which I believe is the case described here: "The (near) real-time capabilities depend on the index engine used." https://www.elastic.co/guide/en/elasticsearch/reference/curr...

manigandham · on Nov 30, 2018

That seems to be the issue then. The refresh flag should be passed in your insert/update/delete operations.

The refresh command can also be called (which is what you're doing) but this is a different operation and just triggers the index build with no guarantees that it finishes or is consistent with any particular data mutation.

Did you read the previously posted documentation for the refresh flag?
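To make the distinction concrete, here is a small sketch of the two request shapes being discussed. The index name and document id are hypothetical, the function names are illustrative, and the paths assume the typeless `_doc` endpoint. It only shows which HTTP call is which; it makes no claim about what either one guarantees.

```python
def write_with_refresh_flag(index, doc_id):
    """Refresh flag: the parameter travels with one specific write request."""
    return ("PUT", f"/{index}/_doc/{doc_id}?refresh=true")


def standalone_refresh_command(index):
    """Refresh command: a separate call, not attached to any particular write."""
    return ("POST", f"/{index}/_refresh")


# The two test setups described in the thread, as sequences of requests.
with_flag = [write_with_refresh_flag("articles", 1)]
with_command = [("PUT", "/articles/_doc/1"), standalone_refresh_command("articles")]
```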

athenot · on Nov 30, 2018

ES comes in handy with very large sets of data. If it fits on one node (or a handful of nodes), you're probably better served by a relational database.

When you try to scale something to large sizes AND want high availability, it's pretty much a given you'll be dealing with eventual consistency.

We use ES to ingest billions of records per day. For us, being able to immediately query a row that was just added is less important than being able to deal with the volume with relatively predictable performance.

manigandham · on Nov 30, 2018

It's not a given. Plenty of distributed databases support strong consistency.

ES is not meant to be an OLTP database. It's a search index with a much better wrapper around Lucene, but the distributed part has always been weak. The last several years of updates have primarily been around fixing the home-grown replication and storage.

manigandham · on Nov 30, 2018

Elasticsearch has a "refresh" query flag to force immediate consistency: https://www.elastic.co/guide/en/elasticsearch/reference/curr...

kn7 · on Nov 30, 2018

That still doesn't guarantee that a consecutive read is gonna get the last state. Welcome to the wonderful world of "eventual consistency".

atombender · on Nov 30, 2018

The "refresh=wait_for" [1] index setting does guarantee that a subsequent read will get the data. It causes all shards to refresh:

    Refresh the relevant primary and replica shards
    (not the whole index) immediately after the operation
    occurs, so that the updated document appears in search
    results immediately

There's also the "wait_for_active_shards=<n>" setting, which merely asks to wait until n shards have written the changes.

[1] https://www.elastic.co/guide/en/elasticsearch/reference/curr...

manigandham · on Nov 30, 2018

How so? That's the entire point of the feature. What else is there to update?

IggleSniggle · on Nov 30, 2018

It guarantees that all replications will report state in sync with each other on search, not that the last reported state is the actual current state of the index.

manigandham · on Nov 30, 2018

As the numerous comments here and the documentation state, the refresh flag on your insert/update will ensure that the data changed in that request is consistent for subsequent queries.

Where did you get the behavior you described? Are you sure you're not confusing this for the separate refresh command itself? That is not attached to any particular insert/update.

kn7 · on Nov 30, 2018

That is indeed the case and we were also bitten by that. The most effective workaround we managed to find is to flush periodically (say every 300ms) for a certain timeout period before reading from ES again for checks. Though even then, ITs still fail from time to time.
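A rough sketch of that kind of workaround, written as a generic polling helper. The name `wait_until` and its parameters are hypothetical, and `check` stands for whatever the test does on each attempt (for example, trigger a flush or refresh and then run the search). The 300ms default interval comes from the comment above; the 5 second timeout is an assumption.

```python
import time


def wait_until(check, timeout=5.0, interval=0.3):
    """Call check() repeatedly until it returns a truthy value or the
    timeout elapses. Returns the last value check() produced."""
    deadline = time.monotonic() + timeout
    while True:
        result = check()
        # Stop on success, or give up once the deadline has passed.
        if result or time.monotonic() >= deadline:
            return result
        time.sleep(interval)
```

A test would then assert on the return value, e.g. `assert wait_until(document_is_searchable)`, where `document_is_searchable` is a hypothetical function. As the comment notes, this reduces flakiness rather than removing it: a slow run can still exceed the timeout.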

scarface74 · on Nov 30, 2018

Well, with C# and Linq with Mongo (you did say “other NoSQL” data stores) you can mock out the provider and test your Linq queries by substituting the IMongoQueryable<T> with a List<T>. ElasticLinq is a thing but I’ve heard mixed things about it.