[1] https://www.youtube.com/watch?v=m6DnmVqoXvw
Large-scale cluster management at Google with Borg https://research.google/pubs/pub43438/
Omega: flexible, scalable schedulers for large compute clusters https://research.google/pubs/pub41684/
The Chubby lock service for loosely-coupled distributed systems https://disco.ethz.ch/courses/hs08/seminar/papers/osdi06-goo...
Apache Mesos https://scholar.google.com/scholar?hl=en&as_sdt=0%2C5&q=Apac...
Mentioned in the Nomad repo and docs site:
Sparrow: Distributed, Low Latency Scheduling https://cs.stanford.edu/~matei/papers/2013/sosp_sparrow.pdf
SWIM: Scalable Weakly-consistent Infection-style Process Group Membership Protocol https://www.cs.cornell.edu/projects/Quicksilver/public_pdfs/...
Raft: In Search of an Understandable Consensus Algorithm https://raft.github.io/raft.pdf
----
Finally, take a look at the papers referenced on Semantic Scholar:
https://www.semanticscholar.org/search?q=nomad%20orchestrati...