Hold up. In my physics classes, no one ever gave me a straight answer on what constituted a "macrostate". It always sounded arbitrary for similar reasons to the ones you describe for language. Are you telling me it's literally defined by the energy of the system (the Hamiltonian, right?) alone?
The degeneracy of a macrostate is determined by the Hamiltonian, but what counts as a macrostate is, in a sense, arbitrary.
It's really just a state that you, the observer, can distinguish. That typically means things like pressure, volume, and temperature, but if you developed a new way of measuring the properties of a system, the possible macrostates would suddenly multiply in number, each would contain fewer microstates, and the entropy you assign to the state would decrease. Take this far enough and you could build a Maxwell's demon that extracts work from thermal motion. So entropy is subjective in some sense, but it was later shown that our subjective knowledge of the world is itself constrained by the laws of physics (acquiring and erasing information carries its own thermodynamic cost), so perfect subjective knowledge is impossible.
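To make that concrete, here's a toy sketch of my own (just 4 coins standing in for the microscopic degrees of freedom): a coarse observer and a finer observer put the same microstate into different macrostates, and the finer one assigns lower entropy.

```python
import math
from itertools import product

# Toy system: 4 coins, so 2**4 = 16 equally likely microstates.
microstates = list(product("HT", repeat=4))

def multiplicity(state, observe):
    # Number of microstates the observer cannot tell apart from `state`.
    return sum(1 for m in microstates if observe(m) == observe(state))

state = ("H", "H", "T", "H")

# Coarse observer: can only measure the total number of heads.
coarse = lambda m: m.count("H")
# Finer observer: measures the number of heads in each half separately.
fine = lambda m: (m[:2].count("H"), m[2:].count("H"))

for name, observe in (("coarse", coarse), ("fine", fine)):
    W = multiplicity(state, observe)
    print(f"{name}: W = {W}, S = ln W = {math.log(W):.3f}")
# coarse: W = 4, S = 1.386   fine: W = 2, S = 0.693
```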
So you could say that entropy is a measure of your ignorance about the exact state of the world, which corresponds nicely to the information-theory definition. It's just that in physics everyone is, in practice, using the same pressure, temperature, and volume measurements, while in information theory what constitutes a macrostate is much fuzzier.
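To make the correspondence explicit (these are just the standard formulas, with p_i the probability of microstate i):

$$ S = -k_B \sum_i p_i \ln p_i, \qquad H = -\sum_i p_i \log_2 p_i, \qquad \text{so } S = (k_B \ln 2)\, H, $$

and when a macrostate's W microstates are equally likely this collapses to Boltzmann's S = k_B ln W.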
A macrostate is any particular probability distribution over microstates. Usually these are picked to reproduce some macroscopic observable, such as temperature, pressure, volume, magnetisation, etc.
Not sure about the physics terminology, but in combinatorics I believe an example of a macrostate (also called an 'event') would be "2 coins out of 3 landed heads" and the corresponding microstates (also called 'outcomes') would be THH, HTH, and HHT.
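If it helps, the same example in a few lines of Python (just brute-force enumeration, nothing physics-specific):

```python
import math
from itertools import product

# All 2**3 = 8 equally likely outcomes (microstates) of 3 coin flips.
outcomes = ["".join(flips) for flips in product("HT", repeat=3)]

# The event (macrostate) "exactly 2 heads out of 3".
event = [o for o in outcomes if o.count("H") == 2]

print(event)                       # ['HHT', 'HTH', 'THH']
print(len(event) / len(outcomes))  # probability = 3/8
print(math.log(len(event)))        # entropy ln(3) ≈ 1.099 (natural units)
```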