I deleted two previous comments because I realized I misunderstood your proposal.
I understand it better now, but I am still confused about something...
Your state machine would need at least 2*160,000 states (you need an extra bit to flag whether you have reached a newline in the last word and need to increment a counter or not), correct? And you are assuming the input is 4 bytes, so won't your transition table need (2^32)*2*160,000 ~ 10^15 entries (at least 3 bytes each)?
The states don't need to map 1:1 with cities or temperatures. They merely need to encode all information collected so far which is still relevant. They also don't need to represent all possible situations - anything that is super rare (eg. temperature of 95C) can simply be diverted to a special "invalid" state which triggers regular code to take over for those few entries.
Hmm, still doesn't seem feasible. Even if you only have 256 "relevant" states (which I think you'll agree is far less than what you need) then given a 32-bit input your state transition table is 2^32*256 = 1 Terabyte.
You could shrink your input size to 2 bytes but then you can't work on a word at a time, and for a realistic number of relevant states your transition table is still way bigger than you can fit in even L3 cache.
Unless I am missing something very basic, this doesn't seem like a viable approach.
Your state machine would need at least 2*160,000 states (you need an extra bit to flag whether you have reached a newline in the last word and need to increment a counter or not), correct? And you are assuming the input is 4 bytes, so won't your transition table need (2^32)*2*160,000 ~ 10^15 entries (at least 3 bytes each)?