
Before Swift 5, Swift's String type was backed by UTF-16 (except for all-ASCII native strings, which just stored ASCII). Even in Swift 5, it's sometimes backed by UTF-16 and sometimes by UTF-8. The UTF-16 case is a String wrapping an NSString bridged from Obj-C code that contains non-ASCII characters, which can happen even in pure-Swift code because so many of the String APIs are really just wrappers around Obj-C Foundation APIs.

In truly performance-sensitive code with Swift 5 I will go ahead and use the UTF-8 view on the assumption that input strings are backed by UTF-8. I'll even force a string to native UTF-8 if I'm doing enough processing that the savings outweigh the potential string copy. But that's only worth dealing with if there's a clear benefit to doing so. In most cases it's simpler just to use the Unicode scalar view, which never has to map UTF-8 sub-scalar offsets into a UTF-16 backing store, because Unicode scalar offsets always lie on both UTF-8 and UTF-16 code unit boundaries.
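
To make the two approaches concrete, here's a rough sketch, assuming Swift 5.1 or later (where makeContiguousUTF8() and withUTF8 are available). The function names countSpacesUTF8 and countSpacesScalars are made up for illustration, and counting spaces is just a stand-in for whatever per-code-unit work you actually care about.

```swift
// Hypothetical helper: scan the raw UTF-8 code units of a string.
func countSpacesUTF8(_ input: String) -> Int {
    var s = input  // local mutable copy; the calls below are mutating

    // Forces native, contiguous UTF-8 storage. This is a no-op for a
    // native Swift 5 string, but may copy if the string wraps a
    // bridged NSString.
    s.makeContiguousUTF8()

    return s.withUTF8 { (buffer: UnsafeBufferPointer<UInt8>) -> Int in
        var count = 0
        for byte in buffer where byte == UInt8(ascii: " ") {
            count += 1
        }
        return count
    }
}

// Hypothetical helper: the simpler route through the Unicode scalar view,
// which behaves the same regardless of the backing store.
func countSpacesScalars(_ input: String) -> Int {
    let space: Unicode.Scalar = " "
    var count = 0
    for scalar in input.unicodeScalars where scalar == space {
        count += 1
    }
    return count
}
```

Both give the same answer here because an ASCII byte like 0x20 never appears inside a multi-byte UTF-8 sequence. Whether the first version is actually faster depends on the input and how much work you do per string, so measure before committing to it.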

All that said, I would have been much happier if Swift could have been 100% UTF-8 from the get-go, which would have drastically simplified a lot of this stuff. But the requirement for bridging to/from NSString made that untenable, as it would have involved a lot of string copying every time you cross the Swift/Obj-C boundary.



