Yes, that's three generations earlier. (Step, panel, crossbar, ESS.) Pulse-dial machines have the property that you can hear the number being dialed by the number of steps advanced in each selector. The sounds are similar, but different.
The ESS could handle pulse-dial as well, but the pulses only went as far as the subscriber's own line-equipment frame. And most subscribers had gone touch-tone by the time I was there in the 90s, so hearing pulses in the LE was a rarity. Regardless of how the dialed digits came in, the call routing computations were all electronic, which allowed vastly more flexibility. With ESS, you could do traffic-aware routing, you could have non-hierarchical trunking, and you could introduce a whole set of CLASS features like call-waiting and auto-callback. (Plus the machine was much better at diagnosing itself and dropping a trouble-ticket, which considerably reduced maintenance overhead.)
But regardless, what's common to both the SxS in your video, and to the ESS, is that the call path is set up with relays. In the SxS those are stepping relays, aka selectors, and they make the distinctive counting rhythm at each step. In ESS by contrast, they're plain relays (I don't know a term for expressing how plain), which means they're mechanically simpler, smaller, and much more reliable. And each one only goes clack as it pulls in, or thunk as it releases, a single action rather than a counting rhythm.
With the previous systems, a given call might take 20 seconds to dial, and those same 20 seconds to set up the path, because the two operations are one and the same. The action of the individual selectors is in lock-step with the subscriber's finger turning the dial, and with the dial spring returning it while pulsing out the dial signals, which directly drive the relays miles away.
In a pulse-dial office, you hear a lot of clicking all the time, and that's typically several call setups happening all at the same time, overlapping. Each stretched out over 20 seconds or so, starting randomly when subscribers pick up the phone and begin to dial. The "intensity" of the clicking associated with any given call is low, because it takes so long for the action to unfold.
If it's a very slow night in a pulse-dial office, you can hear a single call working its way through the machine, first the hunting of a line-finder as they go off-hook, then digits coming into each successive group, physically establishing the call path across the floor. You can walk through the machine and follow it as it happens.
With ESS, the dialing happens first, the same 20 seconds if the subscriber is using a rotary phone, or more like 2 or 3 seconds on touch-tone. In either case, all the digits are absorbed and interpreted by the central control computer, which decides when enough digits have been dialed to take some action. (Since landlines didn't have a SEND key, the dial-plan had to be structured in such a way as to "know" when the dialing was complete, simply by what had been dialed.)
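To make the "knows when dialing is complete" idea concrete, here's a rough sketch in Python. The dial plan in it is a simplified, made-up one loosely modeled on North American conventions (3-digit X11 service codes, 7-digit local numbers, 1 + 10 digits for toll), and the names analyze and collect are mine, not anything from a real ESS. The point is only that no number in the plan is a prefix of another, so the digits alone tell you when to stop collecting.

```python
def analyze(digits: str):
    """Return a route label once enough digits are in, else None.

    Hypothetical, simplified dial plan (an assumption for illustration):
      1 + 10 digits      -> "toll"
      X11 (X in 2-9)     -> "service"  (e.g. 411, 911)
      7 digits, first 2-9 -> "local"
      anything else      -> "invalid"
    """
    if not digits:
        return None
    first = digits[0]
    if first == "1":
        return "toll" if len(digits) == 11 else None
    if first in "23456789":
        # Service codes are recognized at the third digit. This only works
        # because the plan never uses X11 as the start of a local number.
        if len(digits) >= 3 and digits[1:3] == "11":
            return "service"
        return "local" if len(digits) == 7 else None
    return "invalid"


def collect(stream):
    """Absorb digits one at a time and stop as soon as the plan is satisfied."""
    dialed = ""
    for digit in stream:
        dialed += digit
        route = analyze(dialed)
        if route is not None:
            return dialed, route
    return dialed, None  # subscriber stopped dialing partway through
```

Feeding it "911" stops after three digits, "5551234" after seven, and anything starting with 1 after eleven, no matter how many extra digits follow. It makes no difference whether the digits arrive over 20 seconds from a rotary dial or in 2 seconds from touch-tone.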
But once the computer decides to take action, it happens almost all at once -- a flurry of activity as relays throughout the call path are all commanded near-simultaneously. Most take two actions in quick succession: First, the section of the path being set up is connected to a test circuit of some sort, which checks that the relay contacts are clean, the path sounds good, no stray voltages are present, etc. Then in a blink, the test is released and the proven sections are connected to each other. The whole process takes well under a second.
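The test-then-connect order can be sketched the same way. This is a toy model, not how the ESS software was actually written: the section names, the set_up_path function and the passes_test callback are all invented for illustration, and the loop runs one section at a time where the real machine commands its relays near-simultaneously.

```python
def set_up_path(sections, passes_test):
    """Test each section of the path, then join the proven sections.

    sections:    ordered list of hypothetical path sections
    passes_test: callback standing in for the test circuit
    Returns the list of (section, next_section) connections, or None
    if any section fails its test.
    """
    proven = []
    for section in sections:
        # First action: the section is switched onto a test circuit.
        if not passes_test(section):
            # What the machine does on a failed test isn't covered above;
            # this sketch simply gives up.
            return None
        # Second action: the test is released and the section counts as proven.
        proven.append(section)
    # Proven sections are connected to each other, neighbor to neighbor.
    return list(zip(proven, proven[1:]))
```

For example, set_up_path(["line", "junctor", "trunk"], lambda s: True) gives [("line", "junctor"), ("junctor", "trunk")], and a single failed test gives None with nothing connected.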
So, in the ESS regime, the clicking associated with a single call happens in a very short burst of rapidfire clicks and clacks, and what you hear over time is numerous calls, unrelated to each other.
(The exception is if you're standing next to the line-equipment frame of a subscriber using pulse-dial. There you'll hear 20 seconds of digits slowly coming in, with no activity for this call happening outside that frame. THEN, once dialing is complete, there's the burst of rapidfire call-setup throughout the whole machine, just like any other call.)
The clicking in an ESS also has a positional component -- the call path still takes place throughout numerous pieces of the machine, and your stereo ears pick that up -- it just happens so fast, it's like someone tossed a whole tray of items into the air and they all return to earth at nearly the same instant, all around you. If it's a slow night, you can pick out all the clicks associated with one call because they happen so close in time. But you can't take a leisurely walk through the machine and follow that one call as it sets up. By the time your brain registers that anything is happening at all, it's over. A single aural phenomenon with a positional component, rather than a methodical sequence that plays out before the observer.
"4 1/2 minutes of working Strowger, step by step ,Western Electric telephone switch"
https://www.youtube.com/watch?v=M-pd3GjNMWc