Whether a technology can replace Hadoop in an organization depends on many factors, but some technologies that solve at least partly similar problems are Apache Storm, Spark, Flink, Kafka Streams, and maybe BigQuery?
Or, as the original article says, some companies just use command-line tools and shell scripts.
It's been a couple of years since I last worked in data engineering, though, so my knowledge of this topic is a few years behind.
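For what it's worth, that command-line approach is basically just streaming. A rough Python equivalent of a "sort | uniq -c" style word count (my own sketch, not from the article) would be:

    import sys
    from collections import Counter

    # Stream stdin one line at a time, in the spirit of piping a file
    # through awk/sort/uniq: only the running counts live in memory,
    # never the whole input.
    counts = Counter()
    for line in sys.stdin:
        for word in line.split():
            counts[word] += 1

    # Print the ten most common words with their counts.
    for word, n in counts.most_common(10):
        print(n, word)

The point the article makes is that for data that fits on one machine, this kind of single-pass pipeline beats spinning up a cluster.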
I haven't seen Storm used anywhere sane for at least a few years now, and from a glance at job postings it doesn't look like that's changing. Spark, Kafka Streams, etc. are definitely part of modern data platforms, in my experience.
I think we're seeing a big shift, with Hadoop-like workloads being moved onto cloud providers: BigQuery, Amazon EMR, etc.
I'm curious what constitutes "big data" anymore. In an intermediate machine learning course, we train on nearly a petabyte of data using Google Colab and Jupyter notebooks, and nobody treats the data as needing any special handling because of its size... wouldn't 95% of a petabyte still be "big data"?
What course are you taking? ImageNet is only 150 GB, and Common Crawl is only 320 TB.
Big data is a moving target, but I'm comfortable defining it as data too large to fit in memory. Obviously you can always get a bigger node; my rule of thumb is that if you need generators, you are working with big data.
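To make that rule of thumb concrete, here's a minimal Python sketch of what I mean (the file name and column are made up): summing one column of a CSV that's too large to load at once, using a generator so memory use stays constant.

    import csv

    def read_rows(path):
        # Yield one row at a time; the full file never sits in memory.
        with open(path, newline="") as f:
            yield from csv.DictReader(f)

    def total_amount(path):
        # Lazy aggregation: memory use is constant regardless of file size.
        return sum(float(row["amount"]) for row in read_rows(path))

    if __name__ == "__main__":
        # "sales.csv" and the "amount" column are hypothetical.
        print(total_amount("sales.csv"))

If you can get away with a loop like that on one box, it's arguably not big data; once you need to partition the work across machines, it is.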