Funny enough, I'm using a combination of small shell scripts that are executed within an internal application depending on the status of the machine. So I guess I'm in the "why not use both" position.
Currently, I have an automated system set up where a cluster of VMs, grouped into nodes of two and running nginx/gunicorn/django, communicate with each other. That way one server in a node can pick up where its partner left off in its operation (or where it died from a timeout).
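To make the pairing idea concrete, here's a rough sketch of how a node might decide to take over from its partner. This is not my actual code. It assumes each node publishes its last heartbeat time and the last step it finished, and that a partner is presumed dead once its heartbeat is older than a timeout. `NodeState`, `HEARTBEAT_TIMEOUT_S`, and the 30-second value are all made up for illustration.

```python
import json
from dataclasses import dataclass, asdict
from typing import Optional

# Hypothetical: how long a partner may stay silent before it is presumed dead.
HEARTBEAT_TIMEOUT_S = 30.0


@dataclass
class NodeState:
    """Hypothetical progress record a node shares with its partner."""
    node_id: str
    last_heartbeat: float      # UNIX timestamp of the partner's last heartbeat
    last_completed_step: int   # index of the last finished unit of work


def encode(state: NodeState) -> str:
    """Serialize the state so it can be passed to the other server in the node."""
    return json.dumps(asdict(state))


def decode(raw: str) -> NodeState:
    """Rebuild a NodeState from the JSON the partner sent."""
    return NodeState(**json.loads(raw))


def resume_step(partner: NodeState, now: float,
                timeout: float = HEARTBEAT_TIMEOUT_S) -> Optional[int]:
    """Return the step to resume from if the partner timed out, else None.

    `now` would normally be time.time(); it is a parameter here so the
    logic is easy to test.
    """
    if now - partner.last_heartbeat > timeout:
        return partner.last_completed_step + 1
    return None


partner = NodeState("node-a", last_heartbeat=1000.0, last_completed_step=41)
print(resume_step(partner, now=1010.0))  # None: partner is still alive
print(resume_step(partner, now=1040.0))  # 42: take over after the last finished step
```

Keeping the "where did it leave off" info as a small serialized record means the surviving server doesn't need to inspect its partner's process directly, only the last thing it reported.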
The servers in each node are monitored by another server (running apache/php/mysql) that checks the progress/status of the operations and may send requests (e.g. to reinitialize a node that stopped running) to the nodes. There, those bash scripts (concatenating files, finding a specific place in one of the files to help reinitialize a process in a node) are executed and their output is piped back to the monitoring server.
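The "find a specific place in a file" part is roughly like the sketch below, written in Python instead of bash for readability. It assumes (hypothetically) that the process writes lines like `CHECKPOINT <n>` to its logs; the real marker format and file layout aren't something I've described here, so `CHECKPOINT_RE`, `concat_lines`, and `last_checkpoint` are illustrative names only.

```python
import re
from typing import Iterable, List, Optional

# Hypothetical marker format written by the long-running process.
CHECKPOINT_RE = re.compile(r"^CHECKPOINT (\d+)\b")


def concat_lines(parts: Iterable[Iterable[str]]) -> List[str]:
    """Concatenate log fragments in order, similar to `cat part1 part2 ...`.

    Each part can be an open file handle or a list of strings; trailing
    newlines are stripped.
    """
    lines: List[str] = []
    for part in parts:
        lines.extend(line.rstrip("\n") for line in part)
    return lines


def last_checkpoint(lines: List[str]) -> Optional[int]:
    """Scan from the end and return the most recent checkpoint number, if any."""
    for line in reversed(lines):
        match = CHECKPOINT_RE.match(line)
        if match:
            return int(match.group(1))
    return None


log = concat_lines([["start", "CHECKPOINT 7", "work"], ["CHECKPOINT 12", "crash"]])
print(last_checkpoint(log))  # 12
```

The monitoring server would then use that number in its reinitialize request so the process restarts from the last known good point rather than from scratch.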
It is way more complex now, but surprisingly, I don't have to troubleshoot nearly as much as I did before I automated it.