
Because arbitrary bytes cannot be interpreted as UTF-8. I guess this kind of thing is tolerated by Go users because anyone who values a proper type system uses a language with generics.



How do you fix a file that has errors in it if the standard library of the language you're using won't even let you read it?


If you're fixing bytes then you load bytes and fix them.

You won't, though, fix bytes by loading characters and then trying … to fix the bytes … the characters encode to. Just doesn't make sense.

We were able to get away with stuff for a long time because bytes were characters and characters were bytes and we could think sloppily and not break anything. But with Unicode they really are different things, and we need to be tidier in our thinking.


Seems like you're just reasserting it doesn't make sense, without giving a reason. But it does make sense in Go.


> But it does make sense in Go.

No, Go doesn't work that way. You asked, 'How do you fix a file that has errors in it if the standard library of the language you're using won't even let you read it?' In Go, you don't read files as strings, but rather as bytes (proof: https://golang.org/pkg/os/#Open, which returns a File which implements Read: https://golang.org/pkg/os/#File.Read).

You would do the same thing in Python: open the file in binary mode, and then iterate over the bytes it yields.

Now, the one thing that would be annoying in Go is fixing a broken filename. I'd have to think a bit to figure that out.
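
For what it's worth, here is a rough sketch of the bytes-first approach described above. The helper name firstInvalid and the sample data are made up for illustration; in a real program the bytes would come from os.ReadFile or File.Read rather than a literal.

```go
package main

import (
	"bytes"
	"fmt"
	"unicode/utf8"
)

// firstInvalid (hypothetical helper) returns the byte offset of the first
// invalid UTF-8 sequence in data, or -1 if data is entirely valid.
func firstInvalid(data []byte) int {
	for i := 0; i < len(data); {
		r, size := utf8.DecodeRune(data[i:])
		// An invalid byte decodes as RuneError with width 1; a genuine
		// U+FFFD in the input decodes with width 3 and is not an error.
		if r == utf8.RuneError && size == 1 {
			return i
		}
		i += size
	}
	return -1
}

func main() {
	// Assumed sample: a Latin-1 0xE9 byte in otherwise ASCII text.
	data := []byte("caf\xe9 au lait\n")

	fmt.Println(utf8.Valid(data))   // false
	fmt.Println(firstInvalid(data)) // 3

	// Repair at the byte level: each run of invalid bytes becomes "?".
	fixed := bytes.ToValidUTF8(data, []byte("?"))
	fmt.Printf("%q\n", fixed) // "caf? au lait\n"
}
```

Nothing here ever treats the data as text until it has been validated or repaired, which is the point being made.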


You can cast between byte arrays and strings in Go. The difference is that strings are immutable (so it does a copy).
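
A small sketch of what that looks like in practice (the function name roundTrip is just for illustration): the string keeps its own copy, so later changes to the byte slice don't show up in it, and the conversion itself doesn't validate or alter the bytes.

```go
package main

import "fmt"

// roundTrip (hypothetical helper) converts b to a string, then mutates b,
// and returns the string to show it kept its own copy of the bytes.
func roundTrip(b []byte) string {
	s := string(b) // conversion copies the bytes
	if len(b) > 0 {
		b[0] = 'X' // changes the slice only
	}
	return s
}

func main() {
	b := []byte("hello")
	s := roundTrip(b)
	fmt.Println(s, string(b)) // hello Xello

	// Invalid UTF-8 survives the round trip byte for byte.
	raw := string([]byte{0xff, 'a'})
	fmt.Println(len(raw), []byte(raw)[0] == 0xff) // 2 true
}
```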


> You can cast between byte arrays and strings in Go.

Yes, you can. But, in the specific case you mentioned, no competent programmer would cast the bytes of an invalidly-encoded file to a string, then iterate through the runes of the string. That wouldn't even begin to make sense!

I really don't understand what you're trying to argue here.


Although it only works for smallish files, that seems fairly useful for getting as much info as you can out of a corrupt but mostly UTF-8 file?

Any runes that aren't valid will come back as the replacement character. And you can count newlines and print the location of the error(s). You also have the index of the error.
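
Roughly like this, as a sketch. The BadByte type and findBadBytes name are invented for the example. One wrinkle: a range loop yields U+FFFD both for invalid bytes and for a real replacement character already in the file, so the decoded width is checked to tell them apart.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// BadByte (hypothetical type) records where an invalid UTF-8 byte was found.
type BadByte struct {
	Offset int // byte index into the input
	Line   int // 1-based line number
}

// findBadBytes (hypothetical helper) ranges over s rune by rune and
// collects the position of every byte that is not valid UTF-8.
func findBadBytes(s string) []BadByte {
	var bad []BadByte
	line := 1
	for i, r := range s {
		switch {
		case r == '\n':
			line++
		case r == utf8.RuneError:
			// Width 1 means an invalid byte; a real U+FFFD has width 3.
			if _, size := utf8.DecodeRuneInString(s[i:]); size == 1 {
				bad = append(bad, BadByte{Offset: i, Line: line})
			}
		}
	}
	return bad
}

func main() {
	// Assumed sample: one stray 0xFF byte and one legitimate U+FFFD.
	data := []byte("ok line\nbad \xff byte\nfine \uFFFD\n")
	for _, b := range findBadBytes(string(data)) {
		fmt.Printf("invalid byte at offset %d, line %d\n", b.Offset, b.Line)
	}
	// prints: invalid byte at offset 12, line 2
}
```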



