Mainframe is still my favorite architecture for F100 use cases. You'll note how your grocery stores, debit cards, gas stations, toll roads, et al. continued to function ~fine during the AWS outage this week.
The biggest problem with the mainframe conversation is the TCO paradox that it creates. For the average CTO, the prospect of paying IBM millions of dollars up front is absolutely a non-starter. They just can't get beyond this aspect. It doesn't matter if you promise them a genie with unlimited wishes at the end - and this isn't too far off from what you actually get in some cases. The initial sticker shock is simply a bridge too far.
The savings only manifest after years of not doing the other things, and that kind of savings is usually invisible to business leaders. You have to make a really big leap of faith and have a lot of good mentors and leaders around you to execute on this kind of architecture. It is also essential to lead the technology people along the path of best practices. I've seen a large corporation justify moving away from the mainframe after allowing its employees to load applications onto it that are wildly unsuitable for that kind of compute platform - think things like Salesforce and GitHub Enterprise. "See, it runs like ass and IBM is billing us like crazy! We need to get off the mainframe."
I guess for IMS/CICS/TPF/... the IBM mainframe is a perfectly fine appliance compared to the alternatives. While not exactly transaction processors, SAP HANA, Oracle Exadata and co. all market themselves to the same customer groups; SAP even sells full banking systems to medium-sized banks.
Your point that TCO is lower than a well-executed alternative seems very dubious to me, though. Maybe lower than cloud, and certainly lower than whatever crap F100 consultants sold you, but running database unloads with basic ETL for a few dozen terabytes per month and generating an MSU bill in the millions is just ridiculous. The thing that probably lowers the TCO is that EVERY mainframe dev/ops person in existence is essentially a FinOps expert, formed by decades of cloud-style billing. Also, experience on a platform where transaction processing historically has KB-range size limits, data set qualifiers max out at 44 characters, files (which you allocate by cylinders) don't expand by default, and whatever else you miss from your '80s computing experience naturally leads to people writing relatively efficient software.
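To put numbers on why that bill feels absurd, here is the back-of-envelope using only the figures above (rough midpoints of "millions" and "a few dozen terabytes"; real MSU pricing is tiered and based on rolling four-hour averages, so treat this purely as an order-of-magnitude sketch):

    # Order-of-magnitude cost per unit of data moved, using only the figures
    # quoted above. Both inputs are rough midpoints, not real billing data.
    monthly_bill_usd = 1_000_000   # low end of "an MSU bill in the millions"
    tb_moved = 36                  # "a few dozen terabytes per month"

    cost_per_tb = monthly_bill_usd / tb_moved
    cost_per_gb = cost_per_tb / 1024
    print(f"~${cost_per_tb:,.0f} per TB unloaded (~${cost_per_gb:,.2f}/GB)")
    # Several orders of magnitude above what the same unload + basic ETL
    # costs on commodity hardware, which is the point being made above.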
In general, even large customers seem to agree with me on that (see Amadeus throwing out TPF years ago), with even banks mostly escaping the milking machine called IBM. What is and will be left is governments: captured by inertia and corruption at the top, and kept alive by underpaid lifelong experts at the bottom who have never seen anything else.
> during the AWS outage this week.
Also, the reliability promises around mainframes are "interesting" from what I've seen so far. The (IBM) mainframe today is a distributed system (many LPARs/VMs and software making use of it) which people are encouraged to run at maximum load. Now when one LPAR goes down (and might pull down your distributed storage subsystem) and you don't act fast to drop the load, you end up in a situation not at all unlike what AWS experienced this week: critical systems are limping along, while the remaining workload has random latency spikes which your customers (mostly Unix systems...) are definitely going to notice...
The non-IBM way of running VMs on a Linux box and calling it a mainframe just seems like a scam if sold for anything but decommissioning. So I guess those vendors are left with governments at this point.
> The (IBM) mainframe today is a distributed system (many LPARs/VMs and software making use of it)
Not really. While you can partition the machine, you can also have one very large partition and much smaller ones for isolated environments. It also has multiple redundant paths for pretty much everything, so you can just treat it as a machine where hardware never fails. It’s a lot more flexible than a rack of 2U servers or some blade chassis. It is designed to run at 100% capacity with failover spares built in. This is all transparent to the software. You don’t need to know a CPU core failed or some memory died - that’s all managed for you. You’ll only notice that a couple of transactions failed and were retried. You are right in that mainframe operations are very different from Linux servers, and that a good mainframe operator knows a lot about how to write performant software.
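To make the "transparent to the software" point concrete: from the application side, absorbing one of those rare failed transactions is usually nothing fancier than a bounded retry around the unit of work. A toy sketch in Python (the exception type, limits, and names are placeholders, not an actual CICS/IMS API):

    import time

    class TransientError(Exception):
        """Stand-in for whatever transient failure surfaces to the application."""

    def run_with_retry(txn, attempts=3, backoff_s=0.05):
        # A hardware failure the machine absorbed shows up, at worst, as one
        # failed attempt of the transaction; retry it and move on.
        for attempt in range(1, attempts + 1):
            try:
                return txn()
            except TransientError:
                if attempt == attempts:
                    raise
                time.sleep(backoff_s * attempt)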
And incidentally, all the documentation recommends not extending your LPARs beyond what is available on a single CPC "node" (see [0], page 2-23, for a nice (and honest...) block diagram). If you extend your LPAR across all CPCs, I doubt that many of the HA and hot-swap features continue to work (also, there are bugs...). E.g., you won't hot-swap memory when it's all utilized:
> Removing a CPC drawer often results in removing active memory. With the flexible memory option, removing the affected memory and reallocating its use elsewhere in the system is possible.
So while you can have a single-system image on a relatively large multi-node setup, I doubt many people are doing that (at the place I know, no LPAR has terabytes of memory...). Also, in the given price range you can easily get single-system images for Linux too: https://www.servethehome.com/inventec-96-dimm-cxl-expansion-...
If you don't need single-system images, VMware and Xen advertise literally the same features on a blade chassis, minus the redundant hardware per blade, which isn't really necessary when you can just migrate the whole VM...
Also, if you define the whole chassis as having 120% capacity, running it at 100% becomes trivial too. And this is exactly what IBM does, keeping spare CPUs and memory around in every correctly spec'ed setup: https://en.wikipedia.org/wiki/Redundant_array_of_independent...
You are right, though, that the hardware was and is pretty cool, and that kind of building for reliability has largely died out. Also, up until ARM/EPYC arrived, its maximum capacity was above average, but that advantage is gone too. Together with the market segment likely not buying for performance, I doubt many people today are running workloads which "require" a mainframe...
A real shame, but offloading reliability to software engineers makes the hardware cheaper, something IBM mainframes aren't known for.
> doubt many people today are running workloads which "require" a mainframe...
It seems to me mainframes are built to profoundly different requirements than the ordinary hyperscaler server, with the emphasis on connectivity and specialized I/O processors rather than raw CPU power. The CPUs are really fast, but it's the I/O capacity that really sets them apart from a top-of-the-line Dell or HPE.
If IBM really wanted to make the case for companies to host their Linux workloads on LinuxONE hardware, they'd make Linux on s390x significantly cheaper than x86 on their own cloud. I am sure they could, but they don't seem willing to do so.
# VAXen, My Children, Just Don't Belong In Some Places
The trouble started when the Chief User went to visit his computer and its VAXherd.
He came away visibly disturbed and immediately complained to the ELFI's Director of Data Processing that, "There are some _very strange_ people in there with the computers."
Now since this user person was the Comptroller of this Extremely Large Financial Institution, their VAX had been promptly hustled over to the IBM data center which the Comptroller said, "was a more suitable place." The people there wore shirts and ties and didn't wear head bands or cowboy hats.
Oh my. It's 2025 and I'm just reading this for the first time.
In 1998, we were getting some large consumer brands on the World Wide Web for the first time. One of our customers had a Director of Security who didn't trust us. When he came out to see our data center, our web services, he trusted us even less. The guys wore ties that day, but the long hair didn't help.
It was really too bad; the Security Director was not wrong about many aspects of the whole idea and he was able to get executives in our parent company to realize that security best practices would require some structural changes on our part; we couldn't just buy a net appliance to take care of it. Having that client on board with that Security Director's input could have been a productive experience. But he didn't like what he saw, and that particular project was canceled.
Given the rather percussive events in this tale of The Little VAX and the DataCenter, perhaps that was all for the best.
You don't have to pay upfront; in fact, IBM prefers you don't.
Leasing, or outright renting aka "cloud" that just happens to be IBM, is the preferred form, especially as they can sell you on usage-based pricing (good if your workload follows the common pattern of spiking at the end of the month).
It's profitable to IBM, and it counts as OpEx rather than CapEx for accounting - a bit like cloud. But if you want, they will ship it to you, or just set up a VPN or even a more dedicated connection (say, MPLS) to one of their datacenters. Or even sell it to you cloud-style, running on an LPAR/z/VM.
They also tend to send you a more filled-out mainframe (more CPUs, more memory) so you can be flexible with utilization, or "pay on demand" for more occasionally.
You can get a z/OS VM on the IBM cloud, but it costs at least $5/hour. Would be wise to have some automation to turn it on and off as you need. You will be able to SSH to it and, with some configuration, connect to it with x3270 or other emulator.
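A rough sense of why that on/off automation is worth setting up, at the quoted floor price (the working schedule below is just an example):

    # Monthly cost of a $5/hour z/OS VM, always-on vs. only during working hours.
    # The rate is the "at least $5/hour" figure above; the schedule is illustrative.
    RATE_USD_PER_HOUR = 5.0

    always_on = 24 * 30 * RATE_USD_PER_HOUR       # ~720 hours/month
    work_hours_only = 8 * 22 * RATE_USD_PER_HOUR  # 8 h/day, 22 working days

    print(f"always on   : ${always_on:,.0f}/month")
    print(f"8x5 schedule: ${work_hours_only:,.0f}/month")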
IBM also offers free courses and a lot of materials.
A lot of the ideas in modern z/OS and z/VM are present in MVS 3.8 and VM/370, which run on the Hercules emulator. You can run modern z/OS on it, but it violates the license.
That said, z/OS is very alien to people coming from Windows and Unix. It’s a new world of acronyms, metaphors and ideas that evolved from a very different set of constraints. It’s like octopuses and mammals - both highly intelligent, but radically different at the same time.
> ... continued to function ~fine during the AWS outage this week.
Isn't any given mainframe setup one backhoe or flood away from its own outage? What does their redundancy and DR plan look like? It's not like they have AZs and regions; more like a warm replica data center, right?
I toured a facility that utilized Parallel Sysplex / GDPS CA. This offers true RTO = RPO = 0. You could take a fire axe to any piece of hardware in the building and it would have zero effect. For catastrophic events, the guarantees relax a little bit, but they're still very strong. Someone breaking an entire fiber vault or setting half the datacenter on fire would still not compromise operations in this facility. It's essentially two datacenters in one, much like how an AWS region works with multiple AZs. Each side of the facility is entirely independent. Somewhere inside a mountain in Colorado a 3rd set of machines is passively replicating everything as well.
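Whatever the exact GDPS flavor, the RPO = 0 guarantee comes down to one rule: nothing is acknowledged to the caller until both sides of the facility have it durably. A toy illustration of that principle (plain Python, nothing GDPS actually runs):

    # Why synchronous two-site commit gives RPO = 0: the caller never sees an
    # acknowledgement for data that exists at only one site, so losing either
    # site after the ack loses no acknowledged data.
    class Site:
        def __init__(self, name):
            self.name, self.log = name, []

        def write(self, record):
            self.log.append(record)   # stand-in for a durable write
            return True

    def committed(record, primary, secondary):
        # Acknowledge only after BOTH sites have recorded the update.
        return primary.write(record) and secondary.write(record)

    east, west = Site("east hall"), Site("west hall")
    assert committed("debit card txn #1", east, west)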
The most resilient mainframe solutions involve a purpose-built facility. The cool thing about the mainframe is that it isn't very big. You can do a lot of damage with what is effectively just 4 racks of hardware. You'll probably have another 10-20 racks' worth of HSMs, firewalls, VPN concentrators, UPSes, etc. Most of the infrastructure is there to support the mainframe. So the facility doesn't actually have to be very large. It just needs to be in a really good location and built like a bunker.
The datacenter my company rents racks in has an IBM mainframe in it, along with a rack for storage and a rack for backup. Very clean and also very expensive.
The biggest models are the same size as the biggest IBM Power machines, i.e. two four-post racks connected together.
You then need a liquid cooling connection and power (IIRC the racks can contain internal UPSes). You're still left needing storage, which can go from a single shelf to maaaany racks, plus related networking gear (Ethernet and Fibre Channel), and that's it - but a lot of that would be common to any serious enterprise DC.
As far as I can tell, they don’t need external water connections. While everything serious is liquid-cooled, they have on-rack radiators. The z17 machines ship with the cooling circuit sealed, so it doesn’t need to be filled before use. Power consumption is also relatively low: keeping it down is one demand their clients do make, so the z17 doesn’t expand much on top of the z16 there. They are very power-efficient machines.
My only complaint is that there’s no desktop or desk side version. Using z/OS as a daily driver would be an interesting challenge.
I am not up to date on the latest ones, but it used to be that, depending on the chosen density, you could get "air-cooled" units, meaning sealed liquid cooling with on-rack radiators, and denser units with a heat exchanger for an external cooling loop.
It fits the stereotype. It looks more the part than the GCOS 7 and GCOS 8 boxes, which are just Xeon servers running Linux with the mainframes in VMs. We can also look at HPE's top-of-the-line NonStop machines that way.
As I was being interviewed by an IBM branch manager in Chicago (my wife had started grad school at the University of Chicago), it was explained to me:
"Some people think that IBM is a technology company or a computer company. It's not. IBM is marketing company. IBM would be in the grocery business if they thought there was any money in it."
For a marketing company they invest way too much in developing fundamental technology. The number of (useful) patents and Nobel prizes they have is quite impressive.