Hacker News new | past | comments | ask | show | jobs | submit login

Merely finding the start/end of each line will use more computation (in the inner loop) than the approach I outlined. Let alone converting the number from ascii to a float, or looking up the place name in a has table (oh, and the name is variable length, so you're gonna need to find how long the name is first).



I deleted two previous comments because I realized I misunderstood your proposal. I understand it better now, but I am still confused about something...

Your state machine would need at least 2*160,000 states (you need an extra bit to flag whether you have reached a newline in the last word and need to increment a counter or not), correct? And you are assuming the input is 4 bytes, so won't your transition table need (2^32)*2*160,000 ~ 10^15 entries (at least 3 bytes each)?


The states don't need to map 1:1 with cities or temperatures. They merely need to encode all information collected so far which is still relevant. They also don't need to represent all possible situations - anything that is super rare (eg. temperature of 95C) can simply be diverted to a special "invalid" state which triggers regular code to take over for those few entries.


Hmm, still doesn't seem feasible. Even if you only have 256 "relevant" states (which I think you'll agree is far less than what you need) then given a 32-bit input your state transition table is 2^32*256 = 1 Terabyte.

You could shrink your input size to 2 bytes but then you can't work on a word at a time, and for a realistic number of relevant states your transition table is still way bigger than you can fit in even L3 cache.

Unless I am missing something very basic, this doesn't seem like a viable approach.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: