
Apologies in advance, this went longer than I anticipated.

I'm in the midst of re-architecting a legacy system whose terribleness is legendary even in Hell.

Given that this system has a fairly small set of nouns as well as a limited set of verbs that can apply, I opted to try to abstract each of these into its own service. This allows a rolling replace/upgrade cycle that lets the old cruft continue running while limiting the scope of efforts to something less than the Augean stables.

One characteristic of the beshitted legacy system is all manner of action-at-a-distance and an embarrassing lack of code reuse, so even minor changes in business logic involve a nightmare of grepping and hoping you found all the areas that need to be changed to either support that logic or implement it. To that end I opted to have a message broker for create/update/delete operations that was responsible for distributing such events to other services as business logic dictated. Internally we nicknamed it "Sorta SOA".

As an example workflow, a service creates a user and publishes the event to the broker, treating the exchange as either a fire-and-forget pub/sub message, an acknowledged-level message, or an RPC-style request. The broker receives the message and generates a global transaction ID that can be used to trace all further emitted messages, as well as to route the final response back to the originating service. It ack-responds to the originator with that ID, then walks a logic chain keyed on the event name and makes calls to other services, such as the communication service that may email or SMS someone. The comm service acks on successful receipt of the message, then on delivery responds with any result set. The broker receives that result set and checks whether it has completed all tasks for the transaction; if so, it responds to the originator with the results of all tasks (or a defined response using a subset of that data).
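The flow above can be sketched in a few lines. This is a minimal in-memory stand-in, not the actual system: all the names (Broker, comm_service, the pending dict) are illustrative, and a real deployment would carry these messages over RabbitMQ queues rather than direct calls.

```python
import uuid

class Broker:
    """Toy event broker: routes named events to services, tags each
    workflow with a transaction ID, and collects results."""

    def __init__(self):
        self.routes = {}    # event name -> list of downstream services
        self.pending = {}   # txn_id -> {"results": [...], "expected": n}

    def register(self, event_name, service):
        self.routes.setdefault(event_name, []).append(service)

    def publish(self, event_name, payload):
        """Originator publishes an event; the broker acks with a txn ID."""
        txn_id = str(uuid.uuid4())
        tasks = self.routes.get(event_name, [])
        self.pending[txn_id] = {"results": [], "expected": len(tasks)}
        for service in tasks:
            # every downstream call carries the txn_id, so each emitted
            # message traces back to the originating event
            self._collect(txn_id, service(txn_id, payload))
        return txn_id  # the "ack" returned to the originator

    def _collect(self, txn_id, result):
        entry = self.pending[txn_id]
        entry["results"].append(result)
        if len(entry["results"]) == entry["expected"]:
            entry["complete"] = True  # would respond to the originator here

# Stand-in for the communication service described above.
def comm_service(txn_id, payload):
    return {"txn": txn_id, "sent_email_to": payload["email"]}

broker = Broker()
broker.register("user.created", comm_service)
txn = broker.publish("user.created", {"email": "new@example.com"})
```

Once the comm service's result arrives, the broker marks the transaction complete and can hand the aggregated results back to whichever service published the original event.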

All services are idempotent and communication is via RabbitMQ to support fabric changes and persistence/guarantees of delivery.
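Idempotency matters here because RabbitMQ's delivery guarantees are at-least-once: a consumer can see the same message twice. One common way to get it (a sketch with made-up names, not the actual services) is to remember processed message IDs so a redelivery becomes a no-op:

```python
# Idempotent consumer sketch: dedupe on message ID so a redelivered
# message cannot apply its effect twice. Names are illustrative.
processed = set()
balances = {}

def handle_credit(msg):
    """Apply a credit exactly once, even if the broker redelivers it."""
    if msg["id"] in processed:
        return  # duplicate delivery: safe to ack and drop
    balances[msg["account"]] = balances.get(msg["account"], 0) + msg["amount"]
    processed.add(msg["id"])

msg = {"id": "txn-42", "account": "alice", "amount": 10}
handle_credit(msg)
handle_credit(msg)  # redelivery changes nothing
```

In a real system the `processed` set would live in durable storage (e.g. a unique constraint on the message ID) so the guarantee survives a consumer restart.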

Services themselves are RESTful HTTP API for manipulation of the specific nouns they are in charge of. It's allowed us to separate concerns to a surprising degree and formalize business logic in a single area (the broker) for event-driven behaviors. It also gives us the flexibility to interface with third party services in a manner that was impossible given the previous disastrous code.



I spend my days dealing with just such a system: action-at-a-distance, grepping and hoping you found everything related. To that list I'll add: every variable declared global, so that whenever you see a part of the program that doesn't appear to initialize a variable, it might be (a) a bug, or (b) a dependency on a human operator process that has hopefully caused a different part of the program to have already placed a value there. In the presence of multiple users it becomes a kind of slow-motion race condition.

"The beshitted legacy system" is a more eloquent turn of phrase for it than anything I've yet conjured up, so I salute you.


Oh yes, we have globals on an, erm, "global" basis. As a bonus, this codebase was originally written to be entirely flat-file driven; the database bolt-ons added later for large portions were merely backups for those files. Being dependent on flat files, the code helpfully avoided any use of OS-provided locking and instead implemented its own lockfile scheme: check for the existence of the lockfile; if it doesn't exist, write it and proceed to modify the data file (never appending, always rewriting the whole thing). If the lockfile did exist, it re-checked for existence 10,000 times, and if the file was still there, it deleted it and proceeded anyway. Because counting to 10K takes a long time on a computer. And because nothing could possibly create a lock between the time the code checked for existence and the time it created its own. And because anything another script is doing that takes longer than counting from 1 to 10K clearly isn't worth keeping, so just overwrite. And that slow script would never overwrite your write. Sigh.
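For contrast, a minimal correct version of the same idea: the check-then-create scheme above is racy because another process can create the lockfile between the existence check and the write, but `O_CREAT | O_EXCL` makes creation itself the atomic test. A sketch (file names are illustrative):

```python
import os
import tempfile

def acquire_lock(path):
    """Atomically create the lockfile; the OS guarantees only one
    caller can succeed, closing the check-then-create race."""
    try:
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        os.close(fd)
        return True
    except FileExistsError:
        return False  # someone else holds the lock: back off and retry,
                      # never delete their lockfile out from under them

def release_lock(path):
    os.unlink(path)

lockfile = os.path.join(tempfile.mkdtemp(), "data.lock")
first = acquire_lock(lockfile)    # first caller wins
second = acquire_lock(lockfile)   # second caller is refused
release_lock(lockfile)
```

(A crashed holder can still leave a stale lockfile behind, which is why OS-provided advisory locks like `flock` are usually the better tool; the point is only that even a DIY lockfile has to be created atomically.)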

The most amusing part of that last bit was presenting concurrency tests showing that any concurrency > 1 ensured corruption (because the C-level who chose the original contractor/employee refused to believe the actual logical argument of its faults), and still having to field arguments of "it hasn't happened yet!" "Your house has never burnt down, yet you own insurance. And as an aside, I will note that it has in fact happened on several occasions. You've just never had to actually solve it, and your prior monkey kept you in the dark about it."
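The corruption those tests demonstrated doesn't even need real timing luck to show; a deterministic re-enactment of the whole-file rewrite makes it obvious. Two writers each snapshot the flat file, modify their own record, and rewrite everything; interleave the steps and one writer's update silently vanishes (all names here are illustrative):

```python
# Stand-in for the flat data file the legacy code rewrote wholesale.
datafile = {"contents": "alice=1\nbob=1\n"}

def read_all():
    return datafile["contents"]

def rewrite(contents):
    datafile["contents"] = contents  # whole-file rewrite, never append

# Writers A and B both take their snapshot before either one writes:
snap_a = read_all()
snap_b = read_all()
rewrite(snap_a.replace("alice=1", "alice=2"))  # A's update lands...
rewrite(snap_b.replace("bob=1", "bob=2"))      # ...and B clobbers it

# alice's update is gone even though both writers "succeeded"
```

This is the classic lost-update problem: with any concurrency above 1 it isn't a question of whether a write gets eaten, only when.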


Would be interested in hearing your story: how did you get to re-architect "the beshitted legacy system"? Freelancer called in to the rescue, or poor schmuck who got fed up?

There is probably a lot of legacy like this out there...


The original system was purchased via contractor work and then maintained/expanded by an in-house employee who later managed two additional developers. That employee then left and the company decided it needed to find an experienced development manager, which is when I was hired.

The first two years were essentially triage efforts to at least introduce some modicum of dependability and scalability while trying to avoid wholesale rewrites. Eventually, after some C-level management changes and a couple of years of my insisting that, delusions to the contrary, we did not in fact have any in-house design talent, the company decided to finally address the woeful usability of its product and hired a usability/design firm to create a new front-end. Given that most of the original code completely entwined presentation, models, views, and sewage plumbing, this was the opportunity to re-architect, with the caveat that we wanted to limit the scope of the effort to purely customer-facing areas: the homebrewed in-house "CRM" was at least as craptastic as the customer side of things, but it was a much larger codebase, and a ground-up change there was a recipe for disaster.

So in effect we have two systems running in parallel: rational database design and separation of concerns on the customer side, with data duplicated into the older database to minimize impact on the internal stuff. The rolling replacement of the legacy system will then continue, removing a noun and its associated verbs one at a time. Basically the old Ship of Theseus trick, except at the end a wooden rowboat will have become a powered steel-hulled yacht.
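The parallel arrangement boils down to two moves: route requests for migrated nouns to the new stack, and dual-write its data into the legacy database so internal tooling keeps working. A toy sketch under those assumptions (the noun names, dicts, and functions are all hypothetical, not from the actual codebase):

```python
MIGRATED = {"user", "invoice"}   # nouns already carved out into services
new_db = {}
legacy_db = {}

def save(noun, key, record):
    """Persist a record; migrated nouns go to the new store and are
    duplicated into the legacy one to minimize internal impact."""
    if noun in MIGRATED:
        new_db[(noun, key)] = record
        legacy_db[(noun, key)] = record  # dual-write for internal tooling
    else:
        legacy_db[(noun, key)] = record  # untouched legacy path

save("user", 1, {"name": "alice"})   # migrated noun: lands in both stores
save("report", 7, {"rows": 3})       # legacy-only noun: legacy store only
```

As each remaining noun migrates, it moves into `MIGRATED` and the dual-write eventually drops away; this is essentially the strangler-fig pattern for incremental replacement.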


Can I suggest you agree with your CxO that you can write relatively anonymised blog posts, build an audience around "legacy Ships of Theseus", and move to consulting when finished?

Not sure how you find the "people who have terrible codebases and don't know what to do" audience, but you have the writing skills for it.


Heh, my employment agreement gives me a lot of freedom to do "whatever" since I negotiated that initially due to some ongoing contracting support I needed to provide.

I've always viewed the blog->consulting path as sort of an underwear gnomes problem. Step 1: Blog about X. Step 2: ??? Step 3: PROFIT! (consult!)

The time and knowledge necessary to successfully build a web audience are non-trivial when you lack a pre-existing public persona or a measurable marketing budget.


This is true, but till I get a better idea it's my go-to approach. And you do have a wordsmith's style, so it may be easier for you than for most.



