I was working in data analytics and data science a decade ago, and we stored everything, not aggregates, and pushed it all through Hadoop. I've been "out of the game" since then. What has changed that makes people say "store everything" is a new phenomenon? (Genuine question, because I'm clearly missing something.)



It’s not a new phenomenon so much as it has emerged as an important shift from the status quo of 20 years ago.

What’s changed in the last 10 years are the access patterns. There’s increased demand for arbitrary query access over the raw data. The most impactful technology changes have been about pushing the access layer (queries, stream & batch processing, dashboards, BI tools, etc.) down as close to the raw data as possible and making that performant. What’s fallen out of that are better MPP OLAP databases (Snowflake), new columnar formats (Parquet), and SQL as the transform layer (dbt).
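To make the "query the raw data directly" point concrete, here's a minimal sketch using DuckDB-style SQL over a columnar dataset. The file path and column names are hypothetical; read_parquet() is DuckDB's syntax for scanning Parquet files in place, with no load step:

    -- Aggregate straight off the raw event files; nothing is pre-aggregated.
    -- 's3://bucket/events/*.parquet' and the event_time column are made up.
    SELECT
      date_trunc('day', event_time) AS day,
      count(*) AS events
    FROM read_parquet('s3://bucket/events/*.parquet')
    GROUP BY 1
    ORDER BY 1;

The engine only reads the columns the query touches, which is what makes ad hoc queries over the full raw history cheap enough to be the default access pattern.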


Ah that makes sense. Thanks.


Why is it that SQL "re-emerged" as the transformation layer? I thought it first shifted from SQL to query builders inside Talend, Matillion, etc. Why SQL again now?


Probably just the emergence of dbt? I've only been doing ETL for a couple of years personally, but I couldn't imagine using so much SQL in our pipelines without a framework like dbt.
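For context, a dbt model is just a SQL SELECT in its own file; dbt's templating wires models together and handles materialization. Here's a minimal sketch with hypothetical model names (daily_events, stg_events):

    -- models/daily_events.sql (hypothetical model name)
    -- {{ ref('stg_events') }} resolves to another model's table/view;
    -- dbt builds the dependency graph from these refs.
    {{ config(materialized='table') }}

    SELECT
      date_trunc('day', event_time) AS day,
      count(*) AS events
    FROM {{ ref('stg_events') }}
    GROUP BY 1

That's what makes large amounts of SQL manageable: dbt supplies the dependency ordering, testing, and materialization that the SQL itself doesn't have.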



