What a camera does is a translative operation on something that already exists in front of the operator. It cannot conjure things on its own (whether original or derivative), nor scale beyond that limitation. It cannot understand instructions that go beyond processing whatever it is pointed at (setting aside the rampant post-processing becoming the norm in handheld devices) and act on them to significantly change the content of its output (which object light last reflected from, and how). Isn't this a very reductionist analogy? I genuinely do not understand its purpose.
All analogies are reductionist along some axis. In this case, the point is that an AI will just sit there unless a human pushes it to do something.
Also, after the AI's shutter button is pushed, a lot of human decisions go into refining the output. In the Stable Diffusion realm, people sort through hundreds of image outputs to find the gems amongst the nonsense.
Which is kind of amazing, considering that film can still look better and requires only a rudimentary camera. Literally, a beer can with a pinhole and film inside can take a picture.
The same is true of ChatGPT.