I'm a systems engineer and a software engineer. A few tips that are relevant to this story:
1. Applications do not need DNS servers, ever. Hosting platforms do. The way I read it, this was some odd form of split horizon where one DNS server appears to replicate to larger DNS servers, yet that single server is what carries most of the critical information.
2. State should be transferred from where you gather input to where it becomes actionable, by an eventually consistent process. For example, if you store your customer CNAMEs in an ERP, then something like a webhook to a service that maintains that state in machine-readable form should verify the change, accept or deny it, and act on it (e.g. update DNS).
3. Customer systems and core/critical systems should be separate. The fact that critical contact systems for employees were not separate from
4. Have a DR plan and exercise it artificially on a regular cadence; doing so even in a sterile environment is better than nothing, and successful DR drills in production are best. If the first live test of your DR strategy happens when all is lost, then all is most likely lost, and at best you have many hours of work ahead of you.
5. The holy grail is a replayable audit framework. This could mean capturing diffs, but it can be as simple as logging the current state of a given $thing and shipping it to an off-host system.
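To make point 2 concrete, here is a minimal sketch of the ERP-to-DNS flow, assuming the ERP can fire a webhook per CNAME change. Everything named here (`CnameChange`, `DesiredState`, `validate_change`, `reconcile`, `apply_plan`, the `.example-host.net` suffix) is hypothetical, and the DNS provider is modeled as a plain dict rather than any real API. The webhook handler only verifies and records desired state; a separate idempotent reconcile step converges DNS toward it, which is what makes the process eventually consistent.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field


# Hypothetical event shape an ERP webhook might send; field names are assumptions.
@dataclass(frozen=True)
class CnameChange:
    customer_id: str
    hostname: str  # customer's vanity hostname, e.g. "shop.example-customer.com"
    target: str    # where it should point, e.g. "edge.example-host.net"


# One or more DNS labels followed by an alphabetic TLD; total length <= 253.
HOSTNAME_RE = re.compile(r"^(?=.{1,253}$)([a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?\.)+[a-z]{2,63}$")


def _norm(name: str) -> str:
    return name.strip().lower().rstrip(".")


def validate_change(change: CnameChange, allowed_target_suffix: str) -> tuple[bool, str]:
    """Accept or deny a change before it can ever reach DNS."""
    if not HOSTNAME_RE.match(_norm(change.hostname)):
        return False, "invalid hostname"
    if not _norm(change.target).endswith(allowed_target_suffix):
        return False, "target outside our platform"
    return True, "ok"


@dataclass
class DesiredState:
    """Machine-readable source of truth owned by this service, not by the ERP."""
    records: dict[str, str] = field(default_factory=dict)  # hostname -> CNAME target
    rejected: list[tuple[CnameChange, str]] = field(default_factory=list)

    def handle_webhook(self, change: CnameChange, allowed_target_suffix: str) -> bool:
        ok, reason = validate_change(change, allowed_target_suffix)
        if ok:
            self.records[_norm(change.hostname)] = _norm(change.target)
        else:
            self.rejected.append((change, reason))  # keep denials for follow-up
        return ok


def reconcile(desired: dict[str, str], actual: dict[str, str]) -> list[tuple[str, str, str | None]]:
    """Plan the operations that converge actual DNS records to the desired state.

    Assumes this service owns every record in `actual`. The plan is idempotent:
    once applied, reconcile() returns an empty plan, so a failed run can simply
    be retried later (the eventually consistent part)."""
    plan: list[tuple[str, str, str | None]] = []
    for host, target in sorted(desired.items()):
        if actual.get(host) != target:
            plan.append(("upsert", host, target))
    for host in sorted(actual.keys() - desired.keys()):
        plan.append(("delete", host, None))
    return plan


def apply_plan(plan: list[tuple[str, str, str | None]], actual: dict[str, str]) -> None:
    """Stand-in for calls to a real DNS provider API (not modeled here)."""
    for op, host, target in plan:
        if op == "upsert":
            actual[host] = target
        else:
            actual.pop(host, None)
```

In practice you would run `reconcile` on a timer (and after each accepted webhook), so a missed or failed update heals itself on the next pass instead of depending on the ERP to resend it.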
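For point 5, a minimal sketch of the "capture diffs and ship them off-host" variant, assuming each $thing can be snapshotted as a flat dict of JSON-serializable values. `AuditRecorder`, `diff_state`, `apply_diff`, and `replay` are hypothetical names, and `sink` is a placeholder callable standing in for whatever off-host destination you actually use (a log pipeline, object storage, another machine). Replaying the shipped lines rebuilds the state as of any sequence number.

```python
from __future__ import annotations

import json
import time
from typing import Callable


def diff_state(old: dict, new: dict) -> dict:
    """JSON-serializable diff between two flat key/value snapshots."""
    return {
        "set": {k: v for k, v in new.items() if k not in old or old[k] != v},
        "unset": sorted(old.keys() - new.keys()),
    }


def apply_diff(state: dict, diff: dict) -> dict:
    out = dict(state)
    out.update(diff["set"])
    for k in diff["unset"]:
        out.pop(k, None)
    return out


class AuditRecorder:
    """Captures diffs of one $thing and ships each as a JSON line to `sink`.

    `sink` is a hypothetical stand-in for an off-host destination."""

    def __init__(self, thing: str, sink: Callable[[str], None]):
        self.thing = thing
        self.sink = sink
        self.last: dict = {}
        self.seq = 0

    def capture(self, current: dict) -> None:
        diff = diff_state(self.last, current)
        if not diff["set"] and not diff["unset"]:
            return  # nothing changed since the last capture
        self.seq += 1
        entry = {"thing": self.thing, "seq": self.seq, "ts": time.time(), "diff": diff}
        self.sink(json.dumps(entry, sort_keys=True))
        self.last = dict(current)


def replay(lines: list[str], thing: str, upto_seq: int | None = None) -> dict:
    """Rebuild the state of `thing` from shipped lines, optionally up to a given seq."""
    entries = [json.loads(line) for line in lines]
    state: dict = {}
    for e in sorted((e for e in entries if e["thing"] == thing), key=lambda e: e["seq"]):
        if upto_seq is not None and e["seq"] > upto_seq:
            break
        state = apply_diff(state, e["diff"])
    return state
```

Shipping full snapshots instead of diffs is the even simpler option mentioned above; replay then just means reading the latest line at or before the point in time you care about, at the cost of more storage.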