O(N + M) is not equivalent to O(N) if you make no assumptions about the relative growth of N and M; that's why it's actually meaningful to keep both terms around. You can only reduce it to O(N) when N dominates M (i.e., M = O(N)).
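For instance, a routine that has to touch every item of both inputs does O(N + M) work, and neither term can be discarded without an assumption about how N and M relate. A minimal sketch in Python (the helper below is hypothetical, not taken from any particular diff implementation):

```python
def common_lines(file_a_lines, file_b_lines):
    # Build a set of the first file's lines: O(N) expected time.
    seen = set(file_a_lines)
    # Check every line of the second file against it: O(M) expected time.
    # Total work is O(N + M); neither term can be dropped unless we
    # assume one input's size bounds the other's.
    return [line for line in file_b_lines if line in seen]
```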
In the case of file sizes passed to a diff algorithm, it would not normally make sense to talk about one file being, say, "exponentially" larger than the other. They just are what they are: the bare inputs to the function in question.
By contrast, graphs can have significantly different relationships between their node and edge counts, as with sparse vs. dense graphs.
One could construct a situation in which the file size inputs vary relative to each other in some O(...)-significant way, but it would be very strained, especially in the context of a diff algorithm (why are you trying to diff things of significantly different sizes at all?). You can pick apart almost any O(...) pronouncement that way, e.g., "adding two arbitrary integers isn't O(1)". True enough, but it's generally more detail than we are looking for.
You’re wrong: the sizes of both input files are commonly considered when analysing algorithms that work on two files. Consider a problem that could be solved by sorting one of the files and then stepping through both files item by item (not that any meaningful problem with this property necessarily exists; it’s just an example). Then you would have a complexity of O(n log n + m), assuming n is the size of the file being sorted.
There is valuable information in the running time being linear in the sizes of both files. You could define n as the combined size if you wanted to, but that could be confusing.
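A minimal sketch of the hypothetical sort-then-scan pattern described above, assuming (purely for illustration) that the second input happens to arrive already sorted; the function name and setup are made up, not from any real diff tool:

```python
def count_shared_items(unsorted_items, sorted_items):
    a = sorted(unsorted_items)  # O(n log n)
    b = sorted_items            # assumed to be sorted already
    i = j = shared = 0
    # Merge-style scan: each iteration advances i or j, so O(n + m).
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            shared += 1
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    # Overall: O(n log n) + O(n + m) = O(n log n + m).
    return shared
```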