It’s true that pixels aren’t most accurately modeled as squares, but they should still be centered at (0.5, 0.5), because you want the center of mass of a W×H pixel image to be at exactly (W/2, H/2) no matter what shape the pixels are. Otherwise it shifts around when you resize the image—perhaps even by much more than 1 pixel if you resize it by a large factor.
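A quick way to check the center-of-mass claim, as a rough Python sketch for a single row of pixels. The helper name `centroid` and the 4-pixel and 40-pixel widths are made up for illustration.

```python
def centroid(width, center_offset):
    """Mean x-coordinate of the pixel centers of a `width`-pixel row."""
    centers = [i + center_offset for i in range(width)]
    return sum(centers) / width

SCALE = 10            # resize factor between the small and large image
SMALL, LARGE = 4, 40  # row widths in pixels (illustrative values)

for name, offset in (("half-integer centers", 0.5), ("integer centers", 0.0)):
    small_c = centroid(SMALL, offset)
    large_c = centroid(LARGE, offset)
    # If the convention is consistent, scaling the small image's centroid
    # by SCALE should land exactly on the large image's centroid.
    drift = large_c - small_c * SCALE
    print(f"{name}: small={small_c}, large={large_c}, drift={drift} px")
```

With centers at half-integers the drift is 0. With centers at integers the two centroids disagree by 4.5 pixels of the large image, which is the "much more than 1 pixel" shift for a 10× resize.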
Unfortunately, that doesn't make sense when you need to look up pixel (0.5, 0.5) in the framebuffer.
When dealing with cameras, the central point is rarely h/2,w/2. So you're really dealing with two sets of coordinates, camera coordinates and sensor coordinates, and you need to convert between them.
Integer coordinates are convenient for accessing the sensor pixels, and the camera-to-sensor space transform should theoretically include the 0.5,0.5 offset. However, getting a calibration that is accurate to within 0.5 pixels is going to be hard to begin with.
Nobody’s suggesting that pixels are stored at half-integer memory addresses. After all, only a small subset of the continuous image space will lie exactly on the grid of pixel centers—and this is true no matter how the grid is offset. The point is that the grid should be considered as being lined up with (0.5, 0.5) rather than with (0, 0).
So, for example, if you’re scaling an image up by 10× with bilinear interpolation, and you need to figure out what to store at address (7, 23) in the output framebuffer, you should convert that to continuous coordinates (7.5, 23.5), scale these continuous coordinates down to (0.75, 2.35), and use that to take the appropriate weighted average of the surrounding input pixels centered at (0.5, 1.5), (1.5, 1.5), (0.5, 2.5), and (1.5, 2.5), which are located at addresses (0, 1), (1, 1), (0, 2), and (1, 2) in the input framebuffer. The result will be different from, and visually more correct than, what you would get without taking the (0.5, 0.5) offset into account. In this case the naive computation would instead give you a combination of the pixels at (0, 2), (1, 2), (0, 3), and (1, 3) in the input framebuffer, and the result would appear to be shifted by a subpixel offset. This was essentially the cause of a GIMP bug that I reported in 2009: https://bugzilla.gnome.org/show_bug.cgi?id=592628.
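The lookup described above could be sketched roughly like this in Python. The function name `bilinear_taps` and its `half_pixel_centers` flag are hypothetical, chosen for illustration, and are not taken from GIMP or any other library. The sketch only works out which input addresses get blended and with what weights; it ignores clamping at the image border.

```python
import math

def bilinear_taps(out_x, out_y, scale, half_pixel_centers=True):
    """Return {input_address: weight} for one output pixel of an upscale.

    `half_pixel_centers=False` reproduces the naive computation that
    treats pixel centers as sitting on integer coordinates.
    """
    offset = 0.5 if half_pixel_centers else 0.0

    # Output address -> continuous coordinates -> input image space.
    src_x = (out_x + offset) / scale
    src_y = (out_y + offset) / scale

    # Address of the input pixel whose center is at or just before
    # the sample point along each axis.
    x0 = math.floor(src_x - offset)
    y0 = math.floor(src_y - offset)

    # Fractional distance from that pixel's center to the sample point.
    fx = src_x - (x0 + offset)
    fy = src_y - (y0 + offset)

    return {
        (x0,     y0):     (1 - fx) * (1 - fy),
        (x0 + 1, y0):     fx * (1 - fy),
        (x0,     y0 + 1): (1 - fx) * fy,
        (x0 + 1, y0 + 1): fx * fy,
    }

print(sorted(bilinear_taps(7, 23, 10)))
print(sorted(bilinear_taps(7, 23, 10, half_pixel_centers=False)))
```

For output address (7, 23) at 10×, the first call blends input addresses (0, 1), (1, 1), (0, 2), and (1, 2), while the naive call blends (0, 2), (1, 2), (0, 3), and (1, 3), matching the numbers in the example.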