It's definitely cute that the abstract was generated by the model, but I wouldn't give that too much weight because it's the definition of cherry-picking. In this case, you can pick your data (the contents of the paper) to match a desirable output from the model (the abstract).
I think it would be a lot of effort to keep changing the paper until you get the perfect abstract. It would be easier to train different models or do random sampling from the predicted distribution.
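For what it's worth, sampling several candidate abstracts is cheap. A minimal sketch, assuming a generic seq2seq summarizer from the Hugging Face transformers library (my stand-in choice, not the paper's actual model or code):

```python
# Minimal sketch: sample multiple candidate abstracts by drawing from the
# model's predicted distribution. The model choice here is hypothetical.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "facebook/bart-large-cnn"  # stand-in summarization model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

paper_body = open("paper.txt").read()  # hypothetical: full text of the paper
inputs = tokenizer(paper_body, return_tensors="pt", truncation=True, max_length=1024)

candidates = model.generate(
    **inputs,
    do_sample=True,           # sample instead of greedy/beam decoding
    temperature=0.9,
    num_return_sequences=5,   # several candidate abstracts to choose from
    max_length=256,
)
abstracts = tokenizer.batch_decode(candidates, skip_special_tokens=True)
```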
This is impressive. They trained it on 200k articles from arXiv, 130k from PubMed, and over 1 million each from Newsroom and BigPatent. They have comparisons of generated abstracts versus actual abstracts of some landmark NLP papers.
My only gripe is that I would have liked to see (maybe in an appendix) examples of papers on completely different topics, say one in biology, one in math, and one in physics. It would be difficult to pick good examples, sure, but it would significantly strengthen at least my impression of the transferability.
Playing devil's advocate here: might I ask why? The paper seems pretty thorough w.r.t. the description of the corpora, models, and hyperparameters used. They even point to the exact implementation of their evaluation scoring and include a few examples in the paper itself. Even if they put up a demo instance with the required infrastructure, it would be dead as soon as it hit HN and, as research code goes, likely a security hazard for wherever it's hosted.
In my view, there seems to be enough here to replicate and validate the claims yourself if you wanted to. With a basic level of trust in academic integrity, I'm completely fine with this paper.
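Re-running the scoring side in particular is cheap. A minimal sketch, assuming the commonly used rouge_score package rather than the paper's own evaluation code:

```python
# Minimal sketch: ROUGE comparison of a generated abstract against the
# reference abstract. The rouge_score package is an assumption here; the
# paper points to its own scoring implementation.
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)

reference = "We present a model that generates abstracts for long documents ..."
generated = "A model for abstract generation on long documents is presented ..."

scores = scorer.score(reference, generated)
for name, result in scores.items():
    print(f"{name}: P={result.precision:.3f} R={result.recall:.3f} F1={result.fmeasure:.3f}")
```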
From how they set up the training, I think this is a nontrivial task. Also, from a casual read-through, it looks like it is generally focused on arXiv papers.
To their credit, the authors included the models they used and the metrics used to validate them. They also have detailed notes on the training architecture which, at a quick glance, doesn't look easy to replicate unless you can borrow some GPUs in the cloud.
It focused on arXiv because you need a large set of labelled data (i.e. long documents paired with summaries), and there aren't many datasets of that kind out there.
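For reference, long-document/summary pairs for arXiv and PubMed are publicly available. A minimal sketch, assuming the Hugging Face `scientific_papers` dataset (my choice of loader, not necessarily the exact corpus preparation the authors used):

```python
# Minimal sketch: load paired long documents and their abstracts for arXiv.
# The `scientific_papers` dataset is an assumption, used here for illustration.
from datasets import load_dataset

arxiv = load_dataset("scientific_papers", "arxiv", split="train")
example = arxiv[0]
print(example["article"][:500])   # full paper body (the long input document)
print(example["abstract"][:500])  # the reference summary / abstract
```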