Specifically, it's trained to predict what comes next when given some text from the internet (presumed to have been mostly written by humans). There are a lot of details in exactly how you express this mathematically, but that's the basic summary. The objective function it is optimised to maximise during training is how closely its prediction matched the actual text that came next.
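
To make "how closely the prediction matched" a bit more concrete, here is a minimal sketch of one common way to score a single prediction: take the log of the probability the model gave to the word that actually came next. The function name, the example probabilities, and the context sentence are all made up for illustration, and real systems differ in many details.

```python
import math

def next_token_score(predicted_probs, actual_next):
    """Log-probability the model assigned to the token that actually came next.

    Higher (closer to 0) is better. `predicted_probs` is a hypothetical
    dict mapping candidate next tokens to probabilities.
    """
    # A tiny floor avoids log(0) if the real token got zero probability.
    return math.log(max(predicted_probs.get(actual_next, 0.0), 1e-12))

# Made-up model output after reading "the cat sat on the"
predicted = {"mat": 0.6, "sofa": 0.3, "moon": 0.1}

good = next_token_score(predicted, "mat")   # model thought this was likely
bad = next_token_score(predicted, "moon")   # model thought this was unlikely
```

Training nudges the network so that this score, averaged over a very large amount of text, goes up.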

This is done mostly because it's very easy to get a huge amount of data and to score the model's performance on it numerically, without any manual process of deciding what the correct answer is. It turns out that, given enough data and a large enough network, the model becomes very good at this prediction task, even to human eyes.
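
As a rough illustration of why no manual labelling is needed, the sketch below turns a piece of raw text into (context, correct answer) pairs automatically: the "answer" is just whatever word came next. Splitting on whitespace and the `context_size` parameter are simplifying assumptions for this example; real systems typically use sub-word tokens and much longer contexts.

```python
def make_training_pairs(text, context_size=3):
    """Build (context, next_token) examples from raw text, with no human labels."""
    tokens = text.split()  # simplification: one token per whitespace-separated word
    pairs = []
    for i in range(1, len(tokens)):
        # The context is the few tokens just before position i;
        # the correct answer is simply the token at position i.
        context = tokens[max(0, i - context_size):i]
        pairs.append((context, tokens[i]))
    return pairs

pairs = make_training_pairs("the cat sat on the mat")
for context, answer in pairs:
    print(context, "->", answer)
```

Every sentence yields several examples like this, which is part of why the amount of available training data is so large.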