In my opinion, and as a general rule, you can't ever truncate a user-facing string correctly, because doing it right depends on language specifics. Hence my suggestion is not to truncate user-facing strings at all; in fact, you may want to treat them as opaque binary blobs. That said, as a matter of practicality you sometimes have to, but avoid doing so before the string hits the presentation layer, and know that the result may not be linguistically correct.
There are many strings that are not user-facing, which you expect to be of a certain nature, e.g. ASCII-based protocols, and therefore you know what to do with them.
So the multi-byte situation isn't really relevant to strcpy, std::string, or any other "standard" string function; handling it is some other library's duty. The task of truncating and otherwise formatting UI strings is the preserve of the rendering layer(s).
Yes, truncating a user-facing string requires more consideration regardless of programming language. For example, do you truncate at the grapheme, word, sentence, newline, paragraph, or something else? How do you indicate truncation to the user, with an ellipsis perhaps? If you use an ellipsis, is it appended to the original string or drawn by the GUI toolkit?
Note that the Unicode grapheme cluster, word, sentence, and line-break algorithms are locale-specific. Now consider how often programmers casually truncate strings, even in high-level languages, without accounting for the locale.
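As a minimal sketch of grapheme-aware truncation in Python, using the third-party regex module's \X grapheme-cluster pattern (the function name is mine, and this applies the default Unicode rules; fully locale-tailored segmentation needs something like ICU):

    import regex  # third-party module; supports the \X grapheme-cluster pattern

    def truncate_graphemes(text: str, limit: int, ellipsis: str = "\u2026") -> str:
        """Keep at most `limit` grapheme clusters, never splitting one."""
        clusters = regex.findall(r"\X", text)
        if len(clusters) <= limit:
            return text
        return "".join(clusters[:limit]) + ellipsis

    s = "cafe\u0301 latte"           # 'café latte' with a combining acute accent
    print(truncate_graphemes(s, 4))  # 'café…': the accent stays with its base 'e'

Even this sidesteps the harder questions above: whether the ellipsis belongs in the string at all, and whether word or sentence boundaries would be more appropriate for the locale.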
I have been following Pijul loosely for a while, and I would strongly agree that it could do with some information on the possible practical advantages of the approach. I do have some Pijul repositories around, but have not used it deeply enough to explain those advantages myself. However, one possible advantage to note: cherry-picking as used in Git et al. always generates a new commit hash per cherry-pick for the same content, and as I understand it that should not happen in the Pijul world. That is, a specific change, when "cherry-picked", should keep an identical hash, which has several implications.
- Having not actually tested this in my own repos, take my insight with the appropriate grain of salt.
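To make the identity point concrete, here is a deliberately toy Python model. This is not either tool's real hashing scheme (Git hashes whole commit objects including tree and parents, and Pijul changes are far richer); it only illustrates what is and isn't covered by the identifier:

    import hashlib

    def git_style_id(parent_id: str, patch: str) -> str:
        # Git-style: the id covers the parent, so the same patch applied
        # on a different parent yields a different commit hash.
        return hashlib.sha1((parent_id + patch).encode()).hexdigest()[:8]

    def pijul_style_id(patch: str) -> str:
        # Pijul-style (as I understand it): the id covers only the change
        # itself, so a "cherry-picked" change keeps its identity.
        return hashlib.sha1(patch.encode()).hexdigest()[:8]

    patch = "fix: off-by-one in pagination"
    print(git_style_id("main-tip", patch) == git_style_id("release-tip", patch))  # False
    print(pijul_style_id(patch) == pijul_style_id(patch))                         # True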
I haven't ever really worked at a company that used straight waterfall in practice; it has usually been design and documentation up front until the main aspects were discovered, followed by coding and updating some or all of the documents as you go, in an iterative sort of manner, re-designing as you go. Typically, the documents updated were screenshots and data formats. You can't call that agile, perhaps, but I don't think you can call it waterfall either.
I've seen this a number of times. People (management and team alike) have said things to me along the lines of: "We don't have documentation, we are agile." Which, of course, is not what the Agile Manifesto says, but it is how some people have taken it.
Other sayings are "The spec is in the tickets, there is no user manual."
"We have unit tests." and so on.
All this makes for trouble, especially when you consider that in order to change a thing, you should aim to understand its salient parts in the first place; but perhaps that is not agile, or the Scrum way, according to some.
It happens all the time. It happened with "design patterns". It happens with frameworks, languages, and many other things. Human nature. It's quite hard to argue with someone who cites an "expert", because you are a relative nobody.
It happens remarkably easily. But like every other mistake, this is one you frequently have to make for yourself to understand why it's a mistake.
The trick is having enough introspective capability to identify when you have made a mistake, and not falling back on dogma to say "I followed The Path, so it can't be a mistake".
I don't agree entirely with what the article says, because it can be a problem to have to sift through a lot of small functions in order to determine what is going on. Sometimes I think it's easier to read a moderately sized but "clean" function from top to bottom than a spider's web of references to other functions, any of which may contain the nuance or bug you are looking for. What he has said reminds me of what the Forth community necessarily espouses, though.
People tend to take length guidelines far too seriously, though; what it really requires is thought and experience. The main principle I try to follow is: make it readable. It doesn't matter how it's written, or whether you've decided to use a goto in there, so long as the result is one of the more readable alternatives.
> There is a less chance to have a bug in a very small function than moderately sized, but "clean" function.
Maybe so. But you need several of the very small functions to do the same work as the moderately sized function, and those several very small functions have to interact with each other. So did the total chance of a bug go up or down? It's not clear, but I lean toward the one moderately-sized function being more likely to be bug-free.
I think that realistically, in many cases, the bug count will still favour many smaller functions, if for no other reason than that they must all be named.
The higher-level function is then easy to read, and each lower-level function gets a name that can be checked against its implementation if needed.
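A contrived Python sketch of that trade-off (all names here are hypothetical):

    # Decomposed: the top-level function reads as a narrative, and each
    # helper's name is a claim you can check against its small body.
    def monthly_invoice_total(line_items, tax_rate):
        subtotal = sum_line_items(line_items)
        return subtotal + apply_tax(subtotal, tax_rate)

    def sum_line_items(line_items):
        return sum(qty * unit_price for qty, unit_price in line_items)

    def apply_tax(subtotal, tax_rate):
        return subtotal * tax_rate

    # Inlined: a few lines longer, but readable top to bottom with no
    # jumping around to find the nuance (or bug) you're looking for.
    def monthly_invoice_total_inline(line_items, tax_rate):
        subtotal = sum(qty * unit_price for qty, unit_price in line_items)
        return subtotal + subtotal * tax_rate

    items = [(2, 9.99), (1, 24.50)]
    print(monthly_invoice_total(items, 0.20))         # 53.376
    print(monthly_invoice_total_inline(items, 0.20))  # same result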
So long as you do not compress the random data you add, it will work. Compressing it will not help the situation at all, since random data does not compress: its compression ratio is effectively constant. The problem is that doing this largely negates the benefits of compression.
Consider compressing "silence" in a telephone call. You can't compress it well if you also need the non-silence elements to be indistinguishable from it. You must add enough random noise to cover up any compression differential, otherwise statistical artefacts still persist, and the amount of noise required can be up to the maximum compression ratio you can achieve.
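A quick Python illustration with zlib: all-zero "silence" compresses to almost nothing, random data does not, and masking the silence with noise closes the size gap an eavesdropper could observe:

    import os
    import zlib

    silence = bytes(4000)      # 4000 zero bytes: "silence"
    speech = os.urandom(4000)  # stand-in for hard-to-compress audio

    print(len(zlib.compress(silence)))  # ~20 bytes
    print(len(zlib.compress(speech)))   # ~4000+ bytes: random data doesn't shrink

    # XOR-masking the silence with random noise makes it incompressible
    # too, so compressed sizes no longer reveal who was speaking.
    noisy = bytes(a ^ b for a, b in zip(silence, os.urandom(4000)))
    print(len(zlib.compress(noisy)))    # ~4000+ bytes as well

Of course, at that point the compression is buying you nothing, which is exactly the trade-off described above.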
Having considered this problem at length in the past, I too favoured UTF-8 at the time.
I remember a project (circa 1999) I worked on: a feature-phone HTML 3.2 browser and email client (one of the first). The browser/IP stack handled only ASCII/code-page characters to begin with. To my surprise, it was decided to encode text on the platform using UTF-16, so the entire code base was converted to use 16-bit code units (effectively UCS-2). On a resource-constrained platform (~300 KB of RAM, IIRC), it would have been better, I think, to update the renderer and email client to understand UTF-8.
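For a sense of the memory cost, a quick Python comparison of the two encodings on mostly-ASCII text:

    s = "Hello, world"                 # typical mostly-ASCII content of the era
    print(len(s.encode("utf-8")))      # 12 bytes: ASCII stays one byte per character
    print(len(s.encode("utf-16-le")))  # 24 bytes: every code unit is two bytes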
Nice as it might be to imagine that a UTF-16 or UTF-32 code unit is a "character", it is, as has been pointed out, not the case; when you look into how languages actually work, you can see it never can be that simple.
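A small Python example of why even a full code point is not a "character":

    import unicodedata

    single = "\u00e9"     # 'é' as one precomposed code point
    combined = "e\u0301"  # 'e' followed by a combining acute accent
    print(len(single), len(combined))  # 1 2: the code point counts differ
    print(single == combined)          # False, although both render as 'é'
    print(unicodedata.normalize("NFC", combined) == single)  # True once normalized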
Comments, readability, and typability were among the main reasons I recently chose YAML for a configuration file. YAML seems a bit unloved these days, perhaps because it is more difficult to parse fully.
YAML references (anchors and aliases) also proved useful in my use case.
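For anyone unfamiliar, a small sketch of anchors, aliases, and the merge key, parsed here with PyYAML (a third-party module; the config keys are made up):

    import textwrap
    import yaml  # PyYAML; anchors, aliases, and merge keys work with safe_load

    doc = textwrap.dedent("""
        defaults: &defaults      # anchor: a reusable block
          retries: 3
          timeout_s: 30

        production:
          <<: *defaults          # merge key pulls in the anchored mapping
          timeout_s: 60          # local keys override the merged ones

        staging:
          <<: *defaults
        """)
    cfg = yaml.safe_load(doc)
    print(cfg["staging"]["retries"])       # 3, inherited via the alias
    print(cfg["production"]["timeout_s"])  # 60, the override wins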