I work on BigQuery. All of these are great points; I just wanted to point out that BigQuery can also federate into external data sources, e.g. files on Cloud Storage and Bigtable. The relevant feature is BigLake: https://cloud.google.com/bigquery/docs/biglake-intro
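For illustration only, here is a minimal sketch of what defining such a table over Parquet files might look like. It only builds the `CREATE EXTERNAL TABLE ... WITH CONNECTION` DDL string and does not call BigQuery. The project, dataset, connection, and bucket names are hypothetical, and the helper `biglake_table_ddl` is made up for this example.

```python
def biglake_table_ddl(table, connection, uris, file_format="PARQUET"):
    """Build a CREATE EXTERNAL TABLE statement for a BigLake table.

    All identifiers passed in are assumed to be trusted; this sketch does
    no escaping and should not be fed user-controlled input.
    """
    # Render the URI list as a SQL array literal of quoted strings.
    uri_list = ", ".join(f"'{u}'" for u in uris)
    return (
        f"CREATE EXTERNAL TABLE `{table}`\n"
        f"WITH CONNECTION `{connection}`\n"
        f"OPTIONS (format = '{file_format}', uris = [{uri_list}])"
    )


# Hypothetical names, for illustration only.
ddl = biglake_table_ddl(
    table="my-project.my_dataset.events",
    connection="my-project.us.my-connection",
    uris=["gs://my-bucket/events/*.parquet"],
)
print(ddl)
```

The `WITH CONNECTION` clause is what distinguishes this from a plain external table: queries reach the files through the connection's service account rather than the end user's credentials.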
Are there any performance benefits of BigLake over external tables stored in Parquet governed by Hive? Or is the main benefit the governance flexibility?
Currently the main benefit of BigLake over the existing external tables is governance: you get row- and column-level security over Cloud Storage data. The governance is enforced uniformly across BigQuery and the BigQuery Storage API. The Storage API can be used by any engine, and we have pre-built open source connectors available for Spark, Presto/Trino, Dataflow and TensorFlow.
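As a rough sketch of the row-level part, the snippet below builds a `CREATE ROW ACCESS POLICY` statement of the kind you could attach to such a table. It only produces the DDL string. The policy name, table, group, and filter column are hypothetical, and `row_access_policy_ddl` is a helper invented for this example.

```python
def row_access_policy_ddl(policy, table, grantees, filter_expr):
    """Build a CREATE ROW ACCESS POLICY statement.

    Inputs are assumed to be trusted; this sketch does no escaping.
    """
    # Grantees are IAM principals such as 'group:name@example.com'.
    grantee_list = ", ".join(f"'{g}'" for g in grantees)
    return (
        f"CREATE ROW ACCESS POLICY {policy}\n"
        f"ON `{table}`\n"
        f"GRANT TO ({grantee_list})\n"
        f"FILTER USING ({filter_expr})"
    )


# Hypothetical names, for illustration only.
policy_ddl = row_access_policy_ddl(
    policy="us_only",
    table="my-project.my_dataset.events",
    grantees=["group:us-analysts@example.com"],
    filter_expr="region = 'US'",
)
print(policy_ddl)
```

With a policy like this in place, members of the granted group would see only rows matching the filter, whichever engine they read the table from.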
We're constantly working on improving BigQuery performance over open file formats on cloud storage. Some of these features will be specific to BigLake. Please stay tuned.