Sometimes discrete graph theory problems can be relaxed into a continuous space where gradient descent can be used. For a concrete example of how this works, see the "Continuous Optimization" section of the paper "Thirty Years of Graph Matching in Pattern Recognition" (graph/subgraph isomorphism).
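To make the idea of a relaxation concrete, here is a minimal sketch for graph matching. It is my own illustration, not the method from the paper: the permutation matrix is replaced by a doubly stochastic matrix P, and the objective ||A P - P B||_F^2 is reduced with exponentiated-gradient steps followed by Sinkhorn normalization. The function names (`sinkhorn`, `loss`, `relaxed_match`), the step size, and the iteration counts are all assumptions chosen for a small toy example.

```python
import numpy as np

def sinkhorn(P, n_iters=50):
    """Rescale a positive matrix so rows and columns each sum to about 1."""
    for _ in range(n_iters):
        P = P / P.sum(axis=1, keepdims=True)  # normalize rows
        P = P / P.sum(axis=0, keepdims=True)  # normalize columns
    return P

def loss(P, A, B):
    """Relaxed matching objective ||A P - P B||_F^2 (zero when A P = P B)."""
    R = A @ P - P @ B
    return float(np.sum(R * R))

def relaxed_match(A, B, lr=0.02, n_steps=500):
    """Gradient-based search over (approximately) doubly stochastic matrices."""
    n = A.shape[0]
    P = np.full((n, n), 1.0 / n)  # start from the uniform doubly stochastic matrix
    for _ in range(n_steps):
        R = A @ P - P @ B
        grad = 2.0 * (A.T @ R - R @ B.T)       # gradient of the objective w.r.t. P
        P = sinkhorn(P * np.exp(-lr * grad))   # multiplicative step, then renormalize
    return P

# Toy instance (assumed data): B is A with its nodes relabeled by Q.
rng = np.random.default_rng(0)
n = 5
A = rng.random((n, n))
A = (A + A.T) / 2
np.fill_diagonal(A, 0.0)
Q = np.eye(n)[rng.permutation(n)]  # ground-truth permutation matrix
B = Q.T @ A @ Q

P = relaxed_match(A, B)
print("loss at uniform start:", loss(np.full((n, n), 1.0 / n), A, B))
print("loss after descent:   ", loss(P, A, B))
```

The result P is a soft assignment, not a permutation. To get back to a discrete answer you would still need a rounding step (for example, solving a linear assignment problem on P), and there is no guarantee in general that the rounded result is the true matching.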
Is there a canonical reference you'd recommend for understanding this problem, or a paper I could look at that is similar to this implementation?