Good read. I don't think you were controversial :-)
Spotted a wee typo about 1/2 way down:
>> LRU works by evicting less commonly used data in preference of more frequently used data
For this question:
>> Does anyone know of recognized tools which solve this problem?
BMC's Control-M product manages this fairly easily, although in my experience it is easy to let the workflow become unwieldy with that product. AutoSys fares a little better for this use case.
On the open-source side, I guess you could use PBS or something of that ilk to replicate it. I think, though, that the ideal architecture for this problem isn't what's currently available.
I think a hot-hot message queue with deduplication would be a better approach. You can then afford to have multiple hosts submit an appropriately named job, and the first node on the other side of the queue to successfully lease the message wins the right to run the task it contains. If it fails to complete, the next node leases the task.
It would require some consideration about ensuring integrity of the message and authentication requirements for publishers.
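To make the idea concrete, here's a minimal single-process sketch of the dedup-plus-lease pattern (class and method names are my own invention, not any particular product's API; a real deployment would use a distributed broker with the same semantics):

```python
import threading
import time

class DedupLeaseQueue:
    """Sketch of a hot-hot queue with deduplication: many hosts may
    submit a job under the same name, only one copy is enqueued, and
    the first consumer to lease it wins the right to run it. A failed
    or expired lease puts the job back for the next node."""

    def __init__(self, lease_ttl=30.0):
        self._lock = threading.Lock()
        self._pending = {}   # job name -> payload (deduplicated)
        self._leased = {}    # job name -> (payload, lease deadline)
        self._lease_ttl = lease_ttl

    def submit(self, name, payload):
        """Idempotent: duplicate submissions of the same job name are dropped."""
        with self._lock:
            if name in self._pending or name in self._leased:
                return False  # duplicate from another host, ignored
            self._pending[name] = payload
            return True

    def lease(self):
        """First caller wins the job; returns (name, payload) or None."""
        with self._lock:
            self._requeue_expired_locked()
            if not self._pending:
                return None
            name, payload = next(iter(self._pending.items()))
            del self._pending[name]
            self._leased[name] = (payload, time.monotonic() + self._lease_ttl)
            return name, payload

    def complete(self, name):
        """Acknowledge success; the job is gone for good."""
        with self._lock:
            self._leased.pop(name, None)

    def fail(self, name):
        """Return the job to the queue so the next node can lease it."""
        with self._lock:
            payload, _ = self._leased.pop(name)
            self._pending[name] = payload

    def _requeue_expired_locked(self):
        """Treat a silent (crashed) leaseholder like an explicit failure."""
        now = time.monotonic()
        for name in [n for n, (_, d) in self._leased.items() if d < now]:
            payload, _ = self._leased.pop(name)
            self._pending[name] = payload
```

The lease TTL covers the crashed-leaseholder case: a node that dies mid-task never calls fail(), so the deadline does it for them.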
Nicely summarized on the network layer, next you'll want to expand the 'database' box into its components and a storage layer and its components.
There is also an interesting layer of networking services involving routability and validation (certificate checking, etc.), and then there is third-party API scale: sometimes you're generating traffic back out to things other than a CDN (like Twitter or Facebook or some Google thing).
Part 4 should look at it from the data center side: these things are breaking all the time, so the problem becomes building scalable repair systems that give 100% uptime on unreliable hardware.