Not that I don't appreciate the subject matter, I wave the flag for K8S all the time and I've shipped bank product presentation stuff on it... But...
The fundamental challenges of building a bank are almost entirely orthogonal to things like distributed system uptime and resiliency (unless, I suppose, you could lose consistency during the types of service loss Kubernetes makes easy to ameliorate). Evidence for this abounds: nearly every major bank out there is at least 10 years behind the tech we're discussing here. Sure, banks are getting savvy to modern techniques for their presentation and API layers (my employer, Capital One, for example).
But the actual challenge is ensuring that data consistency, liveness, and auditability are preserved. I'm really curious if you're using a novel technique to achieve this that Docker and self-managed micro-app swarms can deliver better.
Because what really limits most financial institutions from embracing a lot more modern tech is their core systems of record AND the acceptance of said systems by their governing agencies.
So when we talk about basic datacenter ops, that's all great. But I don't think they're the things that make K8S great for a bank. I think most people in a position to evaluate a financial institution roadmap would be unmoved by this deck.
Now, if you talk about other things we both know K8S is great for... Things like discovering the genesis of a database action by preserving it throughout the chain of responders, or being able to rapidly respond to site exploits with rolling restarts and a built-in mechanism for feature flagging, or having a really great way to offer data scientists the environments they need without risk of data theft, or being able to use traditional CI/CD methodologies but end up with a single deployable unit that is amenable to both automated and manual review and mechanical deployment in spite of the tooling used within.
Not that I think selling K8S is your job. But I thought I'd mention the perspective of someone doing infrastructure modernization and product work at a major bank.
And of course, as always: the opinions above are my own and not those of my employer or co-workers.
>what really limits most financial institutions from embracing a lot more modern tech is their core systems of record AND the acceptance of said systems by their governing agencies.
I just want to say that I think there is a huge amount of FUD about how you can and cannot build your technology as a regulated entity – and in particular as a bank. In reality, close to 100% of requirements from a regulator will tell you _what_ you must build, not _how_ you must build it. Even then, especially in terms of resilience and security, they are almost always a subset of our own requirements.
What can be more of a challenge is convincing an auditor that what you have done is acceptable, since it can be so different from what they may have seen before. Again, I don't think this is a reason to compromise. We see technology as a major competitive advantage, so it is worth the effort to find open-minded auditors, and spend time to explain and demonstrate how (and why) our software meets the requirements.
I don't think there's any way we could build a secure, resilient bank with the kind of product experience we want, AND do it on the budget of a startup if we approach technology through the same lens as existing banks.
> I just want to say that I think there is a huge amount of FUD about how you can and cannot build your technology as a regulated entity – and in particular as a bank. In reality, close to 100% of requirements from a regulator will tell you _what_ you must build, not _how_ you must build it. Even then, especially in terms of resilience and security, they are almost always a subset of our own requirements.
I'm not sure what your involvement here is, and mine is limited (thankfully) to a substantial distance. However, I think there is a rather big difference between different FIs' experiences, because it's a true game of politics.
When investigating if Level Money should be a bank (and learning that almost no one wants to be a bank, it's very hard to make money just being a direct-to-consumer deposit bank unless you're quite scaled), I basically had a surprisingly credulous audience because the CFPB was basically willing to doorbuster anything that they thought would spur the big banks to action. It was... very surprising to ultimately decide that it was impossible for financial reasons to actually succeed at being a bank.
But they will insist on things that every bank should have, like a credible way for analytics to run in an environment where the raw data is not subject to exfiltration by a compromised data scientist's machine without an audit trail a mile wide.
We're _very_ directly involved with regulation, and with regulators :-) It's a very important part of our business, so we put a lot of effort into ensuring that we please regulators _and_ that we run our business in the way we want.
Just this morning I read an article in Bloomberg news magazine about UK regulators being avant-garde and promoting competition in banking by allowing branchless/app-based banking. There was a reference to Monzo as well, as an example of a new-age bank.
> I just want to say that I think there is a huge amount of FUD about how you can and cannot build your technology as a regulated entity – and in particular as a bank. In reality, close to 100% of requirements from a regulator will tell you _what_ you must build, not _how_ you must build it. Even then, especially in terms of resilience and security, they are almost always a subset of our own requirements.
Having worked with banks and insurers solely in Java and connecting to legacy (mostly Cobol) before, I was surprised, in my current position, to see some companies doing their complete banking back end in PHP & MySQL. I knew the how is not part of the regulations, but I did expect the CTOs to pick the 'no one ever got fired for choosing' choice.
It seems common knowledge these days among the slightly but not too technically inclined that any new major project should use a LAMP stack as its base.
People see Facebook, Amazon, and many others running PHP & MySQL on Linux at scale and they know it works reliably. So while it may not have the support of Cisco or Oracle, it is pretty close on the "no one ever got fired for choosing X" scale, since you can point to every other major company using these building blocks reliably if your investors, CEO, board, or auditors ask why you chose PHP & MySQL.
In summary, PHP & MySQL have become the modern equivalent of a "safe" choice for your stack to be built on. It's not necessarily a bad choice either: you get access to a large community of skilled people who can write PHP & SQL, and while everyone likes to hate on PHP, it isn't about to up and disappear any time in the next decade either (unlike COBOL).
I'd say the reliability the bank is looking for is much higher than what's acceptable for web companies (i.e. web companies are fine with eventual consistency, which is obviously unacceptable in a banking system outside of trivial non-core features).
I'd also suggest that it's not just scale; the kind of reliability Facebook needs is fundamentally different than what a bank needs. Broadly speaking, Facebook needs the site to keep working as well as possible even if some subservice fails, and a bank needs a subservice not to fail. I'm summarizing here and I know it; clearly neither of them is actually on the absolute extreme end, as Facebook needs authentication to work and a bank may not care if the interest rate display widget on their customer banking app fails to load a couple of times. But I'd still suggest there's enough difference between the requirements to be a fundamentally different domain.
Even in "the cloud" things differ between services. A social media app has very different reliability requirements than a backup cloud.
Well, actually, there are many sub-services in a bank that can go down without major impact. The two major banks I use have weekly planned outages of features like old statement retrieval, person-to-person payments, ACH transfers, etc. Basically everything in the web interface could experience outages without any major crisis.
As long as ATM requests always work, nobody really seems to care.
"Broadly speaking, Facebook needs the site to keep working as well as possible even if some subservice fails, and a bank needs a subservice not to fail."
One of the reasons many of them stick with mainframes, AS/400's, and NonStop systems for backends. ;)
Why would eventual consistency be unacceptable in a banking system? In my experience people interact with social media on far shorter time scales than their banks.
When they post a new Instagram photo, they expect that their friends will see it basically instantaneously.
In comparison, when people use their debit card at CVS, they're not expecting anyone to log into their bank account seconds later and see the charge show up.
I would think correctness is more important than speed in a retail consumer bank.
Or do I misunderstand what you mean by eventual consistency?
If your data is only eventually consistent, then DB node A can still have your bank balance at $x for some time, while it is already $0 on node B. Then, if some operation (say, a withdrawal) checks the balance with node A, you have a problem.
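To make the stale-read window concrete, here's a minimal sketch (all names hypothetical) simulating a leader plus one asynchronously replicated follower. A withdrawal check that reads the follower can see the old balance even after the leader has already been drained:

```python
# Hypothetical two-node store with async replication: writes land on the
# leader immediately and reach the follower only after a replication delay.
class EventuallyConsistentStore:
    def __init__(self, replication_delay=1.0):
        self.leader = {}      # "node B" above: always current
        self.follower = {}    # "node A" above: may serve stale reads
        self.pending = []     # replication queue: (apply_at, key, value)
        self.delay = replication_delay

    def write(self, key, value, now):
        self.leader[key] = value
        self.pending.append((now + self.delay, key, value))

    def read_follower(self, key, now):
        # Apply any replication events that have "arrived" by `now`.
        still_pending = []
        for apply_at, k, v in self.pending:
            if apply_at <= now:
                self.follower[k] = v
            else:
                still_pending.append((apply_at, k, v))
        self.pending = still_pending
        return self.follower.get(key, 0)

store = EventuallyConsistentStore(replication_delay=1.0)
store.write("balance", 100, now=0.0)
store.read_follower("balance", now=2.0)   # replicated by now: 100
store.write("balance", 0, now=2.0)        # balance drained on the leader

# A withdrawal check against the follower still sees $100 even though the
# leader already says $0 -- the double-spend window the comment describes.
stale = store.read_follower("balance", now=2.5)
fresh = store.leader["balance"]
print(stale, fresh)   # 100 0
```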
Yes, this is true of eventually consistent systems. The question is a) what does "eventually" mean (replication takes seconds, minutes, or hours?), b) what time delta do you expect for most transaction requests, and c) what is the risk of being temporarily wrong?
Seems to me that a bank could answer these questions as well as any other business, and build a system that works within the answers.
You're actually somewhat right! ATMs are (sometimes) an example of eventual consistency. If an ATM is offline, it'll often allow you to make a withdrawal anyway and report back once it's on the network again. That could mean an overdraft for you. Caveat here is that these are often low-traffic ATMs on the periphery; ones in the city are usually making calls home to check balances.
However, the buck (no pun intended) has to stop somewhere. Overdraft limits have to be consistently applied. Even that is somewhat up in the air. Take this with a grain of salt as it's second-party information, but my wife works in fraud prevention at a smaller credit union. She says that transactions are collected throughout the day and overdrafts are only applied at the end of the day to allow for bills to drain your account beyond its capacity and then payroll to land without applying overdrafts unless you're in the red afterwards. In some sense, that's even "eventual consistency" on the scale of 24 hours.
The most important thing in banks is that at the end of the day, the balance sheet, well, balances. And they limit their liability by preventing too much overdraft and applying daily limits to ATM withdrawals. I pose that general eventual consistency fits that pretty well, as long as "eventual" isn't "hours" for the most part.
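The end-of-day settlement my wife described can be sketched in a few lines. This is a hypothetical simplification (fee amount and function names are made up), but it shows why posting order within the day doesn't matter when overdrafts are only assessed on the net closing balance:

```python
# Hypothetical end-of-day settlement: transactions accumulate all day and
# an overdraft fee is assessed only on the net closing balance, so a bill
# can drain the account before payroll lands without triggering a fee,
# as long as the day ends in the black.
OVERDRAFT_FEE = 35.00  # illustrative fee, not any real bank's

def settle_day(opening_balance, transactions):
    closing = opening_balance + sum(transactions)
    fee = OVERDRAFT_FEE if closing < 0 else 0.0
    return closing - fee, fee

# Rent posts before payroll, so the running balance dips below zero
# mid-day, but the *closing* balance is positive: no fee.
closing, fee = settle_day(100.00, [-1200.00, +1500.00])
print(closing, fee)    # 400.0 0.0

# If the day ends in the red, the fee is applied once at settlement.
closing2, fee2 = settle_day(100.00, [-1200.00, +500.00])
print(closing2, fee2)  # -635.0 35.0
```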
A little more on eventual consistency in general, as I understand it: eventually consistent systems come in many forms. In a leader/follower setup (think MySQL w/ async replication), usually "important" calls are made to the leader in a consistent fashion and changes are asynchronously replicated to the followers for general read fanout. There are a lot of different kinds of systems with different guarantees. In a Dynamo-style system, writes/reads are usually done to a quorum of replicas (e.g. 2/3 replicas), and only if the reads from the two replicas disagree are the values on all three replicas "repaired" via last-write-wins. Facebook has a model they call causal consistency[1] which models causal relationships (e.g. B depends on A, therefore B isn't visible until A is also replicated).
You can consider any system with a queue or log in it that doesn't provide some token to check for operation completion to be eventual. For example, imagine you fronted DB writes with Kafka. Lag between writing to Kafka and commit into the DB may only be 100ms, but that's "eventual". However, if you provided back a "FYI, your write is offset 1234 on partition 5", you could use that as a part of a read pipeline that checked that the DB writer was beyond offset 1234 on partition 5 before allowing the read to proceed. That'd be consistent.
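The offset-token idea above can be sketched in-process. This is a toy model, not real Kafka client code: the log, the DB writer, and all names here are hypothetical stand-ins, but the shape is the same as gating a read on "writer has committed past offset N":

```python
# Sketch of read-your-writes via an offset token: a write through a
# Kafka-style log returns its offset, and a read only proceeds once the
# downstream DB writer has committed past that offset.
class LoggedStore:
    def __init__(self):
        self.log = []         # the "Kafka" log: list of (key, value)
        self.committed = -1   # highest log offset applied to the DB
        self.db = {}

    def write(self, key, value):
        self.log.append((key, value))
        return len(self.log) - 1          # the offset token handed back

    def apply_next(self):
        # The async DB writer consuming the log, one record at a time.
        if self.committed + 1 < len(self.log):
            self.committed += 1
            key, value = self.log[self.committed]
            self.db[key] = value

    def read(self, key, min_offset):
        # Consistent read: refuse to answer until the writer has passed
        # our token (a real system would block or poll here).
        while self.committed < min_offset:
            self.apply_next()
        return self.db[key]

store = LoggedStore()
token = store.write("balance", 0)
value = store.read("balance", min_offset=token)
print(value)   # 0 -- guaranteed to reflect our write
```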
That part is surprisingly easy if you architect it right. The core abstraction most banks use is your "available balance" and the fact that they can reconcile on a longer time period than seconds.
PHP in banking? That's scary. Pretending to be Facebook and being able to handle PHP development looks like a bad judgment call from where I'm standing.
PHP seems like it is made for everything banks are not. I have the impression PHP is hard to do securely and invites bugs, but lets you get going quickly. This sounds like Lockheed deciding to use C++ for the F35.
I work for a huge bank and the system of record thing is spot-on. I don't think there's any law that says we must operate systems of record in certain ways or keep records of every little thing but remember that this is banking. Keeping records of every little thing is practically the religion of banking!
So for a long time now the big banks have been trying (and often failing) to keep track--centrally--of every little server or device that pops up on their networks. So there was a big push recently (few years ago) at the big banks to improve their systems of record. The goal being mostly related to better financial (asset) tracking. So they can figure out which internal teams were using the most/least resources as well as figure out who's not upgrading their stuff on a regular basis (technical debt builders).
So they spent all this money improving their systems of record and in walks Docker. It practically turns the entire concept of having a central place to track "systems" on its head!
To give you an example of the difficulties: We have loads of policies that say things like, "all systems/applications must be registered in <system of record>." Sounds simple enough: Just make sure that wherever a Docker container comes up we create a new record in the system of record and remove it when it comes down.
Except it's not that simple for many reasons the most obviously problematic of which is that the "system of record" works in batch. As in, you submit your request to add a new record and then maybe 8 hours later it'll show up.
Did I mention that there are also policies that say you can't put any system into production until it shows up in the system of record? =)
That's just scratching the surface though. Because the system of record at most financial institutions doesn't just allow you to delete records. Once you create one it is there forever. It merely gets marked as "retired" (or similar) and most banks require phases as well. For example, before a production system can be marked as retired it must first go through a mandatory, "waning" period (what it's called depends on the bank) that can often be weeks.
I can go on and on about all the zillions of ways in which systems of record (and the policies that go with them) are anathema to usage of Docker but I think everyone reading should "get the picture" at this point. If not, just imagine the havoc that would entail when you have thousands of Docker containers coming up and down every second. Or the entire concept of a container only being up for a few seconds to perform a single batch operation (banks love batch, remember!).
If you think the system of record requirements make adoption of Docker difficult, you should know that the security policies are worse! Imagine a policy that states that all systems must undergo an (excruciatingly slow) security scan before being put into production. That's just one of the headaches, sigh.
> what really limits most financial institutions from embracing a lot more modern tech is their core systems of record AND the acceptance of said systems by their governing agencies
I tend to think (from experience) that the reason banks don't embrace modern tech is inertia - they've been able to "milk the cow" of regulatory body sanctioned profits for so long that they're in a mindset of not wishing to introduce volatility into what has been a highly predictable revenue stream. They've been using garbage technology for so long and making money at the same time that there has been no institutional impetus to innovate and embrace new tech. This is now coming to bite them in the ass as regulatory bodies are allowing new players to market and therefore eroding some of the guaranteed profits banks enjoyed up until now.
I doubt most large capital markets institutions (on both the retail and IB sides) will be able to weather the current (and impending) storm of disruption and come out unscathed.
> I tend to think (from experience) that the reason banks don't embrace modern tech is inertia
A lot of inertia. A lot of new-hype-tech is unreliable undocumented shit not ready for production. A lot of new-tech-is-old-tech that's been done for many years but with a new name.
Whenever someone asks "why don't you use Docker for xxx?"
Reply with "When is the last time you had an issue with Docker? Tell me about it."
It's even better when you're on a site like Hacker News with stronger-than-average technical people. You get to read endorsements of the latest and greatest, with people asking why (insert failure of common action here) is happening... in the same thread. I'll just stick with a hardened, flexible configuration of classic architecture and components that work.
I'd say inertia is a large factor but also risk aversion. Bank systems have to be very available (or they attract large fines) and so there's a tendency to stick with things that are proven to be robust even when they have other problems.
The other major problem I've seen is that most banks don't think of themselves as technology companies, and treat IT as an overhead to be minimized, which is absolutely the wrong approach.
This tends to lead to things that save money in the short term (e.g. outsourcing deals) but could well cost money in the longer term, as they make replacing legacy systems harder.
This is nonsense. Not every system in banks needs to be highly available. That's just silly.
Like the print server on the 3rd floor needs an active standby! Haha.
No, just like any organization banks have "critical" systems and "everything else." Docker is mostly being "sold" as a means to replace and improve the non-critical stuff. Like that internal web app everyone uses to look up <whatever>. Or the system that generates daily reports on <whatever>.
Just like most organizations, banks have a few critical systems and everything else is less so (to varying degrees).
Yes, and the topic of discussion in this thread is... core banking systems... which do have to be highly available. If the topic had been bank print servers and I had made my comment, yours may have made more sense.
My comment didn't say "bank print servers need to be highly available" anywhere.. at all..
>what really limits most financial institutions from embracing a lot more modern tech is their core systems of record AND the acceptance of said systems by their governing agencies.
Let me put this even more bluntly: if your bank's technology plan isn't 99.5% about dealing with government regulation and maintaining core record integrity and auditability, you are not a serious player in the space.
Matt Levine's comment always sticks in my head: 'I say sometimes that the tech industry is about moving fast and breaking things, while "finance is an industry of moving fast, breaking things, being mired in years of litigation, paying 10-digit fines, and ruefully promising to move slower and break fewer things in the future."'
Man, it's like people think banks are special when it comes to IT. They're not!
You have lots of extra regulations for sure but most of them are about retention of financial records. If a system isn't processing/storing financial records or "privileged" information nobody gives a damn.
The "technology plan" is 99.5% about making or saving money. That remaining .5%? Yeah, that's compliance. Because that's all it costs. Unless you think central logging systems are going to take up some large percentage of a multi-billion dollar quarterly budget?
People love to complain about "the costs of regulation" but you know what? In finance it really doesn't amount to much in terms of "how much we spend." How much "it holds back the market" is a different debate entirely.
Aside: Without those regulations we'd just repeat all the same financial disasters throughout history.
You know, it's strange: when I worked for an insurance IT dept. we were informed that strict adherence to an ITIL-certified release process was essential to keep us "within compliance with FSA regulations - we need to do this in order to continue trading".
From experience, said process cost waaaay more than 0.5% of the budget. Time and cost overruns, massive overhead in personnel and a drain on mental resources which should have been spent on actual release quality rather than an audit trail meant to convey "Certified" quality. All in all, I'd say 50% of the costs of IT delivery were spent in plodding through the checkpoints with much of the other half being consumed by the interest on 20 years of technical debt accrued as a result of those very same resources being misdirected in such regulatory endeavours.
I recognise that I'm far too cynical to see regulation as anything other than a shield against liability. It's simply too obstructive to contribute to actual quality improvement. On the plus side, it does keep about 50% of IT personnel in a job.
So I guess you can count me in with the lovers :-)
Actually all systems are fair game to the auditors. If an auditor wants to see something they get to see it 99% of the time. End of story.
They really don't care about systems that don't process financial information! They don't care about your dev or qa environments. They don't care about your DNS servers or your switches or much else for that matter.
Regulators are 100% laser-focused on financial information and transactions. They want to see ledgers and logs and they want to see evidence that your systems prevent tampering. That's it.
There's no financial regulators that actually audit IT stuff. We probably should have them but we don't. The closest is the FFIEC but they only publish non-binding guidelines.
If you think the PCI-DSS matters to banks you're mistaken. Every year we audit ourselves and put the results in a filing cabinet somewhere. We have no obligation to show it to anyone and no one would hold us accountable for failing to be PCI compliant anyway.
My department's annual budget was in the 100MM+ region -- and this doesn't count surge resourcing used to deal with capricious requests from the feds. I have been asked about my dev and qa environments (and how they are firewalled from production systems) repeatedly. And yes, I have been asked about network architecture too. Penalties for non-compliance came in the form of significant financial penalties. It only got worse once Dodd Frank hit. Once securities of any kind are involved, shit gets real fast.
As a Monzo customer I have to say that the product Monzo offers and the features that traditional banks want me to use are also somewhat orthogonal :-)
So long as they can satisfy the regulatory requirements, I'd rather they were using whatever else it took to build a great product.