Just from reading the documentation, the full text search features on Postgres already look pretty powerful. And it is encouraging that they are actively being worked on. I'm wondering how this compares to a dedicated search engine like Solr or Elasticsearch.
Are there huge differences in performance, features or search quality? At which scale does using Postgres for full text search still make sense?
Having used all 3, Postgres search is my go-to for most use cases, simply because I don't have to deal with managing deltas to an outside system and keeping things in sync. The search features are powerful and fast, and PG's ability to combine multiple indexes in search results makes it trivially easy to include a bit of full text search in a query right next to geographic distance filters or other conditions. You can also combine multiple types of searches on the fly if you're feeling whimsical.
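For example, a rough sketch with made-up table and column names, assuming PostGIS alongside a GIN-indexed tsvector column:

    -- Full text match and a geographic distance filter in one query;
    -- the planner can combine the GIN index on tsv with the GiST index on geom.
    SELECT name
    FROM   places
    WHERE  tsv @@ to_tsquery('english', 'coffee & roaster')
    AND    ST_DWithin(geom, ST_MakePoint(-122.42, 37.77)::geography, 2000)
    ORDER  BY ts_rank(tsv, to_tsquery('english', 'coffee & roaster')) DESC
    LIMIT  20;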
IMO, the only time to reach for an outside system is when the data isn't being written to PG first (like log ingestion with elastic search) or when search is such a central part of your app that it mandates a separate dedicated system.
Are there any good options to support logic (and/or) and facets/fields with Postgres? We started using ES basically just for the "free" query language. (Obviously we would want something that is safe from sql injection.)
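For the and/or part, tsquery already has boolean operators, and simple facet counts are just a GROUP BY. A hedged sketch with invented table/column names (the boolean expression is hardcoded here for illustration; pass user-supplied terms as bind parameters to to_tsquery/plainto_tsquery rather than interpolating them, which keeps it safe from SQL injection):

    -- & (AND), | (OR) and ! (NOT) inside the tsquery...
    SELECT category, count(*) AS matches        -- ...and a simple facet count
    FROM   products
    WHERE  tsv @@ to_tsquery('english', 'shoe & (running | trail) & !kids')
    GROUP  BY category
    ORDER  BY matches DESC;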
What about when you have different PG database instances that have data you want to join on? Would you still use PG as an aggregated read-only copy of the databases, or would you use, for example, ES?
It's situational. When you need a search across multiple data sources, standing up a dedicated search engine makes a lot more sense. Then again...PG Foreign Data wrappers would make that scenario pretty simple without the need for an aggregate.
I can't speak to performance in that situation though.
I used it at 9.4 for a document management system with thousands, not millions, of PDFs that got indexed on upload, and it worked extremely well at that scale--fast, and with all the basic text search features well-covered (tokenization, stemming, etc.). A big win for me was that doing it well in Postgres meant the site could stay a simple Django site rather than adding another service.
You're right, PostgreSQL needs the plain text to highlight it with ts_headline. It's similar to Elasticsearch keeping the original document in the _source attribute. Thanks!
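For reference, a minimal ts_headline call over a stored plain-text column (table and column names invented):

    -- ts_headline works on the original text, not the tsvector,
    -- so the plain text has to be kept around somewhere.
    SELECT ts_headline('english', body,
                       to_tsquery('english', 'search & engine'),
                       'StartSel=<b>, StopSel=</b>, MaxWords=35') AS snippet
    FROM   documents
    WHERE  tsv @@ to_tsquery('english', 'search & engine');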
Curious to know since you mentioned that it was fast for thousands of PDFs... any rough timing information on some of your queries for that kind of dataset?
I'm really reaching here to recall, but the short version is that actual searches never took more than a second. All I really cared about was how noticeable a delay to expect, and it was never more than that.
On a bulk import of 1,000+, it took a couple minutes to ingest them. This was all on a $20/month VPS.
I have found Postgres to be good enough for search.
As in... it works well enough, and the advantage of not having to add other tech makes it a no-brainer. I've had zero support issues or customer complaints, and most of my applications use full text search heavily.
The big advantage over other approaches: because it's SQL and it lives in the same database where I also store users and permissions, I can permission-limit my full text searches.
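Something along these lines, with hypothetical tables:

    -- The permission check is just another join in the same query as the search.
    SELECT d.id, d.title
    FROM   documents d
    JOIN   document_permissions p ON p.document_id = d.id
    WHERE  p.user_id = $1
    AND    d.tsv @@ plainto_tsquery('english', $2);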
We have been using Postgres Full Text Search for about 3 years now in production. The app is an analytics dashboard over a set of structured and unstructured data. We have about 20M documents, with hierarchies, dimensions, but also free text elements. It works extremely well, and having the possibility to GROUP BY as one would do in SQL is a godsend for tabular or graph-based data. Performance is really good, in particular due to the parallel aggregations.
We recently tested loading our data into an Elasticsearch index for one particular use case (a weighted sum over the 20M rows based on an FTS criterion) where we felt Postgres was underperforming. On the same hardware, using all available RAM and CPUs, ES took 6s and PG took 0.7s.
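The query was roughly of this shape (names changed/invented), and parallel aggregation in 9.6 helps a lot with it:

    -- Aggregate a weight over every row whose tsvector matches, per dimension.
    SELECT dimension, sum(weight) AS total
    FROM   facts
    WHERE  tsv @@ to_tsquery('english', 'term')
    GROUP  BY dimension;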
So far, on the 30+ queries of our dashboard tool, we have yet to find a use case that Postgres didn't handle better than Lucene based solutions.
Mind sharing a table structure from your db? I'm using ES for a project and would prefer to keep things simple (already use postgres in another part of the system).
We use Xapian to search over millions of documents. We are thinking of switching to PostgreSQL's built-in FTS to simplify our system. We ran an internal benchmark which showed that PostgreSQL can be competitive with Xapian, except when you need to rank results (in that case the performance is bad).
You'll be interested in ongoing work in this area, then. Oleg & Teodor are working on a new index type (RUM indexes, no less) which will speed up ranking operations considerably.
"ZomboDB is a Postgres extension that enables efficient full-text searching via the use of indexes backed by Elasticsearch. In order to achieve this, ZomboDB implements Postgres' Access Method API.
In practical terms, a ZomboDB index appears to Postgres as no different than a standard btree index. As such, standard SQL commands are fully supported, including SELECT, BEGIN, COMMIT, ABORT, INSERT, UPDATE, DELETE, COPY, and VACUUM."
I had used PostgreSQL for a decade, including full-text search, but just within apps that were already storing their data in Postgres.
The time came to replace our website search (tens of thousands of pages), and we decided to try rolling our own. Someone suggested ElasticSearch, and as I read through it, it seemed to do less than PostgreSQL. I still had the hard problems of (1) spidering the site and (2) converting all the file formats (.doc, .xls, .pdf, etc.).
I ended up just putting wget on a daily cron job to spider the site. Then I ran the saved files through a hodgepodge of scripts to extract the plain text and put it into PostgreSQL.
Once it's there, it's far easier to do the rest. Postgres has its own functions to search for matches, rank the matches, give you snippets, and even highlight the search words in the snippets. It's amazing.
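A sketch of that whole pipeline in one statement, with made-up table and column names:

    -- Match, rank, and highlight; assumes pages(url, title, body, tsv tsvector)
    -- with a GIN index on tsv.
    SELECT url, title,
           ts_rank(tsv, q)                 AS rank,
           ts_headline('english', body, q) AS snippet
    FROM   pages, to_tsquery('english', 'annual & report') AS q
    WHERE  tsv @@ q
    ORDER  BY rank DESC
    LIMIT  10;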
Searches run in a split second. Well, at first, when I was testing, they often took a few seconds. But the weird thing is that after go-live it ran faster. My best guess is that so many users caused Postgres to cache more and more of itself into RAM. The whole server is still using less than 1 GB though, and it's running Apache and Postgres for the website and all its apps.
42 MB for the table, which has columns for the address, title, plain-text body, and computed text vectors for 43,000 pages (web pages and office documents of average length). Then another 100 MB for the GIN index on the text-vector column.
Yes there are huge differences between quality and performance. Apache Lucene (powers Solr and ES) is still the best by far. However if Postgres search works well enough for your use case then great. As others have said, it is one less dependency to manage.
In my experience performance is great if you're just doing text search, but if you combine that with other operators in the same SELECT it can be much slower than Elasticsearch since in many of those cases Postgres needs to fall back to a full table scan.
If you create indexes for all the columns used as filters, which is essentially what Elasticsearch does, then PostgreSQL is able to combine them (generating a bitmap for each used index and ANDing them) and you should get decent performance, shouldn't you?
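For example (hypothetical table), this lets the planner bitmap-AND the separate indexes rather than fall back to a full table scan:

    CREATE INDEX ON items USING gin (tsv);
    CREATE INDEX ON items (category_id);
    CREATE INDEX ON items (price);

    EXPLAIN
    SELECT *
    FROM   items
    WHERE  tsv @@ to_tsquery('english', 'widget')
    AND    category_id = 42
    AND    price < 100;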
While it's OK for our purposes, I would wish for a bit more customisability of the text parser, and it definitely needs better support for compound words to be perfect.
> At present PostgreSQL provides just one built-in parser, which has been found to be useful for a wide range of applications
and it really means it - changing the behaviour of this component is not possible unless you write a completely different parser in C, which, while possible, is not a fun experience.
We're using the full text feature over product data, and we have to work around the parser sometimes too eagerly detecting email addresses and URLs, which interferes with properly detecting brand names that contain some of these special characters.
The other problem is the compound support. A lot of our data is in German which like other languages likes to concatenate nouns.
For example, you'd absolutely want to find the term "Weisswürste" for the query "wurst" (note the compounding and the umlaut added for the plural of "Wurst").
Traditionally, you do this using a dictionary and while Postgres has support for ispell and hunspell dictionaries, only hunspell has acceptable compound support, which in turn isn't supported by Postgres.
So we've ended up using a hacked ispell dictionary where we have to mark all known compounds which is annoying and error-prone.
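For anyone curious, wiring up such a dictionary looks roughly like this (the dictionary name and files are placeholders for our hacked one; the .dict/.affix/.stop files live under $SHAREDIR/tsearch_data):

    CREATE TEXT SEARCH DICTIONARY german_ispell (
        TEMPLATE  = ispell,
        DictFile  = german_custom,
        AffFile   = german_custom,
        StopWords = german
    );

    CREATE TEXT SEARCH CONFIGURATION german_compound (COPY = german);

    ALTER TEXT SEARCH CONFIGURATION german_compound
        ALTER MAPPING FOR asciiword, word
        WITH german_ispell, german_stem;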
Also, once you have to use a dictionary, you end up with a further issue: loading the dictionary takes time and, due to the way Postgres currently works, it has to happen per connection. In our case, with the 20MB hacked German ispell dictionary, this takes ~0.5s, which is way too long.
The solution for this is to use a connection pooler in front of Postgres. This works fine but, of course, adds more overhead.
The other solution is http://pgxn.org/dist/shared_ispell/, but I've had multiple postmaster crashes due to corrupted shared memory (thank you, Postgres, for crashing instead of corrupting data) related to that extension, so I would not recommend this for production use.
Lucene, and by extension Elasticsearch, has much better built-in text analysis features, so we could probably fix the parser and compound issues there. But that would of course mean even more additional infrastructure, and probably some performance issues too: unfortunately we absolutely cannot just return all the FTS matches, but have to check them for other reasons why they must not be shown, which of course hits the database again, and I'm wary of somehow putting all that logic into ES as well.
This is why we currently live with the Postgres tsearch limitations. But sooner or later we'll probably have to bite the bullet and go with a dedicated solution.
I'd like to know more about your case, because my own experience is that ordering by ts_rank causes a big slowdown.
PostgreSQL documentation says: "Ranking can be expensive since it requires consulting the tsvector of each matching document, which can be I/O bound and therefore slow. Unfortunately, it is almost impossible to avoid since practical queries often result in large numbers of matches."
Some PostgreSQL developers are working on improving this by using indexes only to compute the ranking, but the related patches are not done yet.
What is the size of your data set (number of rows and size on disk) and the average response time?
I do some searching with pgsql and ts_rank on a fairly small dataset: 10GB and 11 million rows (mostly chat data), and get response times for ranking over all of it of around 10-100ms on a cheap 5€ OVH VPS.
Sometimes queries end up even a lot faster, for example the same as above, but searching for "c plus plus", runs in this plan + runtime: https://explain.depesz.com/s/NPOc (11ms)
Last time I tried, it was on a machine with a spinning disk... It looks like I should try again with an SSD, which is a lot better with regard to random access.
Your search term is "Quassel". What happens if you search for a term that matches a lot of rows? This is the case where ts_rank is very expensive. I'd be curious to look at the explain of such a low-selectivity query.
> What happens if you search for a term that matches a lot of rows? This is the case where ts_rank is very expensive. I'd be curious to look at the explain of such a low-selectivity query.
That’s actually quite unproblematic, if you have the tsvector as its own column (not just as index).
It’s far more problematic to actually load that data from disk.
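For reference, the usual 9.6-era setup for keeping the tsvector in its own column (names invented; the built-in trigger keeps it in sync on writes):

    ALTER TABLE messages ADD COLUMN tsv tsvector;

    -- Backfill existing rows.
    UPDATE messages SET tsv = to_tsvector('english', body);

    -- Keep it up to date on INSERT/UPDATE.
    CREATE TRIGGER messages_tsv_update
        BEFORE INSERT OR UPDATE ON messages
        FOR EACH ROW
        EXECUTE PROCEDURE tsvector_update_trigger(tsv, 'pg_catalog.english', body);

    CREATE INDEX ON messages USING gin (tsv);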
> That’s actually quite unproblematic, if you have the tsvector as its own column (not just as index).
Yes, it works, but it is slow because the tsvector is usually big enough to be stored in a TOAST table, and this produces a lot of random access reads.
This is why there is a project to store additional information (term positions) in the GIN index, so that the index contains everything necessary for the ranking:
> usually big enough to be stored in a TOAST table
Ah, luckily, in my case, that can’t happen – each row’s message contains one IRC message, so at most 512 bytes. That also automatically ensures we’ll never run into TOAST issues.
Yeah, if your vectors are in TOAST, you really have a huge issue with ranking. There’s no simple way to get around that, except with customized solutions like Lucene/Solr/ES
Ordering can get expensive no matter what, just based on how many things you're actually sorting. Ideally, if you can find a way to limit the size of the data set before the ranking sort, you'll see a big improvement.
Currently, the only way to make ranking tolerable is to limit the size of the data set before the ranking sort. This is what I did in my tests.
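Roughly this shape (names invented); the inner LIMIT keeps the expensive ts_rank sort small, at the cost of only ranking a subset of the matches:

    SELECT id, title
    FROM  (SELECT id, title, tsv
           FROM   documents
           WHERE  tsv @@ to_tsquery('english', 'common & term')
           LIMIT  1000) AS candidates
    ORDER  BY ts_rank(tsv, to_tsquery('english', 'common & term')) DESC
    LIMIT  20;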
But it looks like it could be possible to massively improve ranking performance by storing all necessary information directly in the GIN index, as proposed here:
Usually an RDBMS like PostgreSQL is used in an environment that has a different usage pattern than search. SOLR can take advantage of specializing for that type of usage. However, Russia's largest search engine, Yandex, seems to like PG http://momjian.us/main/blogs/pgblog/2016.html#September_28_2...
I doubt they use it for actual search, though. Given that the company's core product for end users is a search engine, Yandex probably has some in-house system that the Mail team can leverage for their search needs.
For the small data use cases I've seen, Solr always returns results quickly. You're limited in how you can query it - it's modeled as one giant table, and the query syntax is much more idiosyncratic than SQL.
Postgres is really reliable, and I think a lot of the performance difference comes from robust transactions. For some use cases you can use both and replicate data or query one + the other in sequence.
It looks like Postgres Professional is working on improving FTS. Here is a relevant presentation from Oleg Bartunov about the new RUM index and its benefits for FTS:
Can pgsql fts do stemming or more complex lemmatisation for languages other than English? Or ranking of results based on Okapi BM25 or similar? I was looking into this about two years ago and those were the features in favor of Lucene (basis of ES and Solr).
Pgsql can do stemming, and everything it can do in English it can also do in several dozen other languages, including German, French, Spanish, and any language for which you install dictionaries. It's quite useful.
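The language is just the configuration name passed to the FTS functions (output omitted, since the exact lexemes depend on the installed stemmers/dictionaries):

    -- Configurations shipped with the server:
    SELECT cfgname FROM pg_ts_config;

    -- Pick the configuration per call, or set default_text_search_config:
    SELECT to_tsvector('german', 'Die Bücher wurden gelesen')
           @@ to_tsquery('german', 'Buch');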
Please can the Postgres team put some major focus on completing logical replication [1]. It's the missing piece to making upgrading across major versions painless and quick on large databases, so that we can take advantage of all these nice new features. We're on Heroku's hosted Postgres service, so we can't install the pglogical extension.
I don't think you would be able to take advantage of logical replication on Heroku Postgres regardless--they don't allow you to replicate to your own instances, only other Heroku-hosted instances.
This makes migrating off Heroku for Postgres a PITA that requires downtime.
True, it would be an extra bonus if Heroku started allowing replication to non-Heroku instances. But as long as they support logical replication to a Heroku follower instance, you can upgrade to new major versions with near-zero downtime: set up a logical follower running the latest Postgres version and promote it to master once it has caught up. Currently you can't have a follower that is a different version from the master, which means an upgrade requires either a full backup and restore to the new version (significant downtime if you have a large database) or the pg_upgrade utility, which is generally not recommended as it is not guaranteed to work.
Everyone speaks about InnoDB and how performant and reliable it is... and multiple firms (Uber/Pinterest/AWS) even use it as a KV store, bypassing MySQL entirely. I have never heard much about storage engines in Postgres; why could this be?
Wikipedia has a (stub) article on InnoDB, but nothing on Postgres' storage engines... just wondering why that is.
The PG storage engine is not particularly awesome. It is basically COW (with exceptions), and compaction (called vacuum) has been quite painful for a long time. Every release it is supposedly fixed, but people keep complaining. This is not to say PG sucks: its optimizer knows far more about the data than InnoDB's, and PG can perform far more types of execution plans.
We (Pinterest, I wrote most of the MySQL automation) make heavy use of MySQL replication which is vastly simpler to manage than PG. All queries still flow through SQL and unlike PG, we can force whatever execution plan we need. We do lots of PK lookups, and InnoDB is really good at that. In InnoDB all the data is stored in the PK while in PG it is just a pointer.
>>In InnoDB all the data is stored in the PK while in PG it is just a pointer.
This is just a consequence of the PK being a clustered index in InnoDB which has both pros and cons. One of the big cons is that all of the columns of the PK are implicitly added to every secondary index as the row identifier. That isn't a big problem if your PK is a single column int, but if it's multiple columns, that often results in unnecessary bloat in your secondary indexes. Ideally (as in, dare I say, MS SQL Server), you'd have the option of a clustered or non-clustered PK for your table so you could choose the optimal index structure for your workload on a per-table basis.
Yes, you can, but it doesn't change the fact that you still have a clustered index (an index-organized table), which is great for PK lookups but bad if you do a secondary index lookup (because you need to look up through 2 B-trees instead of 1). There is a real, and well-known, tradeoff here.
You missed the point of my post. You are going to have one of the two issues: either looking through two indexes, or having every secondary index include a large PK. At least with InnoDB you can make the choice. The strategy I suggested gets you the desired outcome of not including a large PK in all secondary indexes.
> The strategy I suggested gets you the desired outcome of not including a large PK in all secondary indexes.
For an application in which most queries need a secondary index lookup, using heap organized tables is more efficient because the database needs to traverse only one B-tree (for the secondary index) that gives the physical position of the row in the heap. When using index organized tables, the database needs to traverse 2 B-trees (the secondary index first, then the primary index). Making the primary key short by using an auto incrementing integer helps, but doesn't remove this overhead.
The other part of the tradeoff is that inserts and many other write operations are less expensive in heap tables. A Big Table in InnoDB, measured in "when do I start having to spend a lot of time troubleshooting this table's performance" is about 1% the size of a Big Table in Postgres. TokuDB was introduced for MySQL for a reason.
Heap vs. Index organization is a classic tradeoff of database design.
Now, if you're saying "it would be really nice if Postgres had the option of index-organized tables" I'd agree with you. I'd love to have that, as an option.
> All queries still flow through SQL and unlike PG, we can force whatever execution plan we need. We do lots of PK lookups, and InnoDB is really good at that.
Being able to force the execution plan is more useful in MySQL than PostgreSQL because MySQL's optimizer is not very good at planning queries.
If you do a lot of PK lookups, then you don't need to force the execution plan.
FDWs are a great way to access external data sources, but they lack proper support for visibility, transactions, and so on. Also, the FDW API follows the "tuple at a time" execution model, which prevents a lot of optimizations in the upper part of the stack (vectorized execution etc.).
There are several products using FDWs to change storage, but I'd call it a misuse of a feature designed for very different purpose.
IMHO it's hardly a way forward without significant changes/improvements (which may happen, I don't know).
Having "CREATE ACCESS METHOD [...] ON STORAGE|TABLE" to create a custom access method, or storage engine, and extending CREATE TABLE to be able to pass a storage method with the table definition could become a quite powerful combination. The main challenge is to come up with an interface solid enough to be able to handle problems related to MVCC, like VACUUM cleanup.
> I have never heard much about storage engines in Postgres, why could this be so?
Because PG isn't designed around pluggable storage engines, so it's not really as practical to take a storage engine out and use it separately, and it doesn't make much sense to talk about the storage engine separately from the whole system.
FWIW, these solutions rarely bypass MySQL entirely or at all. Although there are ways to access InnoDB without making SQL queries (Memcached API; Handler Socket), the MySQL server is still involved. It just skips the normal protocol, auth, SQL parsing, etc.
Even then, there aren't a lot of published cases of people using these alternative access methods at scale yet. AFAIK, all of the large kv use-cases you've mentioned still go through traditional SQL queries. Despite the overhead of SQL parsing, it provides more control and visibility. The ecosystem around alternative access methods isn't nearly as mature.
You may be confusing InnoDB with MyISAM (which is prone to corruption, especially upon crashes) or with running MySQL without a strict SQL mode (which causes bad things like silent truncation of overflowing values).
InnoDB is, and always has been, a very reliable and durable storage engine with solid performance characteristics.
This one is huge for my company. Almost every single query of ours could use an index-only scan, but the planner would never choose to perform one because of the weirdness around partial indexes. We're expecting a several-x speedup once we upgrade to 9.6. All they need to improve now is a way to keep the visibility map up to date without relying on vacuums.
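A minimal illustration of the 9.6 change, with a hypothetical table: a column that appears only in the partial index's predicate no longer prevents an index-only scan:

    CREATE INDEX orders_open_idx ON orders (customer_id)
        WHERE NOT deleted;

    -- On 9.6 this can use an index-only scan on orders_open_idx even though
    -- "deleted" is not one of the indexed columns.
    EXPLAIN
    SELECT count(*)
    FROM   orders
    WHERE  customer_id = 42
    AND    NOT deleted;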
I don't see ever going away from using vacuum to maintain the visibility map, but hopefully the changes in 9.6 will make it a non-issue on large tables.
> I don't see ever going away from using vacuum to maintain the visibility map
I don't think that's so unlikely to change. There are two major avenues: write it during HOT pruning (which is done on page accesses), and perform a "lower impact" vacuum on insert-only tables more regularly.
> but hopefully the changes in 9.6 will make it a non-issue on large tables.
You mean the freeze map? That doesn't really change the picture for regular vacuums, it changes how bad anti-wraparound vacuums are. The impact of the table vacuum itself is most of the time not that bad these days (due to the visibility map), what's annoying is usually the corresponding index scans. They have to scan the whole index, which is quite expensive.
I think there is already a patch for Postgres 10 that runs the vacuum on insert only tables. While not completely solving the problem, that will be helpful.
I tested parallel queries on PostgreSQL 9.6 on a few TBs of data, 5 billion rows, on an older dual Xeon E5620 server. I also striped 4 Intel S3500 800GB drives with ZFS and enabled LZ4 compression, which has a compression ratio of 4x.
For a sequential full table scan I could process about 2000MB/s of data (only 125MB/s was read from each SSD); I was limited by CPU power.
Anyway, the same query took about 25 minutes on PostgreSQL 9.5 and now it was down to 2 minutes and 30 seconds. For comparison, SQL Server 2012 spent 7 minutes on the same dataset on the same hardware.
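For anyone trying something similar: parallelism in 9.6 is gated by a single setting, which has to be above zero for the planner to use parallel workers at all (table and column names invented):

    SET max_parallel_workers_per_gather = 8;

    EXPLAIN (ANALYZE)
    SELECT count(*), avg(amount)
    FROM   big_fact_table
    WHERE  created_at >= '2016-01-01';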
Would you be willing to re-run that with SQL Server 2016? A Dev license is free, and there's been a lot of relational engine optimization since 2012. I'd be curious to see what the latest release can do compared to Postgres' latest.
I realize I'm asking a stranger on the internet to do something for free for me. If you don't have time or inclination to do this, no worries, but it seems like you've got a nice setup to be able to play with this. I'm sure I'm not the only one curious to see such a comparison.
I'm a fan of both PostgreSQL and SQL Server, but I think these numbers are very workload-specific. I've gotten 1GB/s throughput on SQL Server 2012 on spinning disks and CPUs older than the E5620, so I've no doubt that same workload would exceed 2GB/s on your hardware. The apples-to-apples comparison here is between the two versions of PG where the performance improvement is clear. It's harder to do an apples-to-apples comparison between PG and SQL Server because the optimal schema and queries for a particular workload are likely to differ for each of them.
With SQL Server I don't get 2000MB/s on the same hardware, more like 600-800MB/s. This is most likely because of LZ4 compression and large block sizes (64k-128k) on ZFS, that results in a lot less IO. Because with SQL Server, IO was the bottleneck.
So yes, it is very workload specific. For random read/write they are probably more similar. But for reading a lot of data that can be read sequentially, PostgreSQL seems to win hands down, because it can get a lot of help from ZFS compression.
I would love to run the same test when SQL Server is available on Linux. ZFS also delivers slightly better throughput and slightly more IOPS on the FreeBSD platform, which I ran this benchmark on. And SQL Server probably demands a 4k block size, which is so small that LZ4 compression has little effect; I've already tried running SQL Server on ZFS via iSCSI.
Hmm, I've never run SQL Server on anything other than NTFS where 64KB was definitely the recommended block size. In either case, it's great to have choices. When the license fee is coming out of my pocket, I'm definitely not choosing SQL Server.
There's also a limited amount of memory-level parallelism available... with 4-DIMM sockets you might need an 8-socket machine to get a 32x improvement on large (memory-bound) sequential scans, which I'd guess you can get on top-end Power machines.
(You can probably get more memory level parallelism with random access, but your overall bandwidth will likely be lower... fully exploiting memory bandwidth is complicated and difficult to do for real applications).
It's very difficult to get a 32x speedup from 32 cores as there are always parts that are inherently serial, so it's more likely they tested it on a 64 core machine or something like that.
Congrats on great release. With availability of E7-8800 v4 based servers (up to 192 cores in a single box) PG can cover a huge number of use cases without complicated setups.
On that topic - what's the general feeling about RDS?
I'm running pg on ec2 with a hot standby slave. I need the postgis extension but am not doing anything particularly esoteric. Ideally I'd like to have the certainty of aws handling backups for me.
I was researching moving to RDS today and would love to hear thoughts on whether it's a good general solution or not. What happens about downtime during upgrades or swapping instance sizes?
> What happens about downtime during upgrades or swapping instance sizes?
This is one of my favorite features of RDS: You can set a maintenance window and have the option to not have changes take effect until that window. So if I want to upgrade Postgres or change the instance size, I set it up and the downtime happens when I'm fast asleep and nobody is using the site.
I also think (but not 100% sure) that if you have Multi-AZ enabled, changes are done by upgrading the slave, failing over, and then upgrading the ex-master, so downtime is limited to the failover period.
Ah ok. That's useful info about the multiAZ setup - I'll have a look into that.
In my case, we now have customers around the world so we don't get the "night time" luxury. Part of the work I'm now completing is to split the system into an accounts db and customer data db. I was thinking to dip my toe in the water by just moving the account db to RDS to see how it goes.
Downtime is limited more by your application, i.e. if there is a failure your connection/database pool just needs to reconnect.
BTW, we have only had psycopg2 pointing at it so far, and that worked without downtime.
However, I'd guess Java's HikariCP handles it just as well.
Only "failures" will have a "small" downtime; planned downtime is pretty much zero.
Yes, I've noticed this too. The AWS documentation and the console both say that the changes may take a long time to apply, but in fact the database is up for most of that time. I've done several big upgrades that had only a few seconds of downtime.
Postgres RDS is solid and it supports PostGIS. If you have multi-AZ enabled, then downtime is typically measured in seconds even when upgrading versions or changing instance sizes. It will update one instance, automatically switch to it, and update the other one. It automatically handles backups, syncing read slaves, etc too. It is awesome in my experience.
I've just read about this and there are a lot of people saying they had significant downtime during upgrades ([1] one such story, but there were quite a few on stackoverflow etc).
I'm going to run some tests myself to see how well it works on my existing data (only 30GB at the moment, but it was only 20GB a month ago and is growing fairly rapidly).
Compared to others, my experience with RDS is bad, although I last used it about a year ago, so perhaps things have improved.
One major issue is that you are restricted in what you can do with it; not all options are available, and you can only use the extensions that they provide. I'm a bit fuzzy about this, but changing the disk size made the service unavailable for ~30 minutes (proportional to the new disk size). You weren't able to configure replication yourself; replication only happens to the backup node, and you weren't even able to set up replication across regions.
The replication limits are kind of a bummer, because if you ever want to move your data (perhaps to a VM or outside of AWS) you need to have an outage. Also, if I remember correctly, there was no way to do a major version upgrade in place.
There was also another incident (it was caused by a bug, so hopefully it was fixed and won't happen to anyone else). We had a cluster set up with a backup. One day, out of nowhere, the service stopped working and was unavailable for 1.5 hours. That was quite a big issue because we used it for monitoring (Zabbix), so any outage makes us blind to issues. It turned out that, due to a bug, their backup routine made a mistake and started running the backup on the master server (normally it's supposed to run on the slave).
Probably in 3-4 months. AWS has historically had a 3 month gap time for postgres. Their policy (from what they have said on the forums at least) is they wait for at least x.x.1 release before they start working on it.
> Is it just selection bias from posted links on HN, or has the PostgreSQL team been doing many (feature) releases lately?
I think more like the former -- as I recall, the recent articles have mostly been about specific work going on for the 9.6 release, prereleases of 9.6, and now the actual release of 9.6.
There's still only one major PostgreSQL release per year. There were a few posts about cool stuff built on top of PostgreSQL, a few posts about progress of the 9.6 development (e.g. when the parallel query got committed) etc.
Which is a form of conflict resolution. It requires the application to be aware of the datastore.
I wonder if, since BDR is just a plugin now, a plugin that used strong consistency guarantees could be built using the same changes that were required for BDR.