I led a team on a large data project at an enormous bank: hundreds of devs on the project across 3 continents. My team took care of the integration and automation of the SDLC process. We moved from several generations of ETL applications (9 applications) on Netezza/Teradata/mainframes/Hive MapReduce all to Spark. The project was a huge cost savings and a great success, with massive risk reduction from getting these systems all under one roof. We found a lot of issues with the original data. We automated lineage generation, data quality, data integrity, etc.

We developed a framework that made everything batteries-included. Transformations were done as a linear set of SQL steps, or a DAG of SQL steps if required. You could do more complicated things in reusable plugins if needed. We also had a rock-solid old-school scheduler application, and thousands of these jobs. We had an automated data comparison tool that cataloged the old data and ran the original code vs the new code on the same set of data.

I don't think it's impossible to pull off, but it was a hard project for sure. Grew my career a ton.
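To make the "linear set of SQL steps" part concrete, here's a minimal sketch of what one of those job definitions could look like. This assumes PySpark, and the names (run_job, the step list, the tables) are made up for illustration, not the actual framework:

    # A job as an ordered list of SQL steps; each step's result is registered
    # as a temp view so later steps can build on it. All names are hypothetical.
    from pyspark.sql import SparkSession

    STEPS = [
        ("stg_accounts",
         "SELECT * FROM raw.accounts WHERE load_date = '{run_date}'"),
        ("acct_balances",
         "SELECT account_id, SUM(amount) AS balance FROM stg_accounts GROUP BY account_id"),
    ]

    def run_job(run_date: str) -> None:
        spark = SparkSession.builder.appName("acct_balances_job").getOrCreate()
        for view_name, sql in STEPS:
            df = spark.sql(sql.format(run_date=run_date))
            df.createOrReplaceTempView(view_name)  # later steps reference this name
        # the final step's view becomes the published table
        spark.table("acct_balances").write.mode("overwrite").saveAsTable("marts.acct_balances")

The DAG variant would be the same idea with explicit dependencies between steps instead of an implicit linear order.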
I know startups that hired data engineers, deployed warehouses, DBT, and a BI tool, and churned out hundreds of reports; in one case their DBT project had hundreds of files. No one in that company knew why any of it was used.
All said and done, the business users wanted three reports.
More often than not, data teams are more self-serving than anything else.
I think the difference is that technical and business leadership at a bank understand that data is the lifeblood. Bad data will get you on the front page of the WSJ and a phone call from a regulator in Luxembourg.
For a lot of smaller Internet companies, data is just a fluffer. The real business is in image and which VC bbq you get invited to.
Can you define fluffer as you use it here, and maybe mention where you picked it up? I haven't heard it used much outside of a specific and very notorious Sankey diagram.
Not the person you’re replying to, but I would expect that a near universal answer to this across all kinds of projects (not just software) is effective collaboration and communication between stakeholders and teams.
Despite no shortage of technical talent, large projects can still often fail, because building a technically impressive thing doesn't matter if it doesn't do what the business needs.
So it's about making sure you're building the "right" thing that delivers on the business's actual needs, and the only way to find out what those are is through constant and ongoing good communication between technical and business people.
The downside is that a lot of the work the business is doing amounts to running around with wheelbarrows, and they actively sabotage it when someone wants to build a conveyor belt.
The flip side of this is that the stakeholder has to actually care enough to invest in collaboration and have enough bandwidth to be able to follow through.
The kind of communication that lets cross-functional projects be effective is time consuming, and competent people tend to be overworked, no matter what part of the business they’re in.
Specifically in the financial sector, and especially for banks and government tax departments, they're on a clock.
As time moves on, there are fewer COBOL engineers. Hell, sometimes their systems have been written in a bespoke language. There is less and less understanding of why something is set up the way it is, due to a lack of documentation. Updates and changes to the code sometimes have to wait 2-3 years because the system isn't flexible enough (literally, not as in "this change will take 2-3 dev years"). Even code that old contains bugs, but due to the age of the code they're inscrutable.
However, whichever new system gets tooled up has to be 99.999% flawless, or it could cause serious damage to the bank and even its regional market.
When there is that kind of pressure, dev teams are no longer considered a cost sink, money flows, and anything is possible.
A large project where the end goal is replicating (and possibly correcting) existing data outputs is much more likely to succeed than one that is integrating new data sources or building new data models. For the latter type of project, it's very common to find that the team is disconnected from the business users and original motivation for the project, with poorly defined success criteria.
There were clear, large cost savings and risk improvements with the project, and it was actually easy on the requirements front. They put all new non-critical features on hold for 2 years, and there was no question about the requirements: the new system's data must match the old system's data except for any bug fixes or agreed-upon changes.
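That "match the old system" requirement is also straightforward to automate as a check. A rough sketch of that kind of parity comparison, again assuming PySpark, with placeholder table names:

    # Minimal parity check: the new output must match the old output row for row,
    # modulo agreed-upon fixes. Table names are placeholders.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("parity_check").getOrCreate()
    old_df = spark.table("legacy.acct_balances")
    new_df = spark.table("marts.acct_balances")

    only_in_old = old_df.exceptAll(new_df)  # rows the new system lost or changed
    only_in_new = new_df.exceptAll(old_df)  # rows the new system added or changed

    if only_in_old.count() > 0 or only_in_new.count() > 0:
        raise AssertionError("old and new outputs diverge; investigate before sign-off")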