1. The filename characters have to be valid D identifier characters. This annoys some people.
2. Because Windows has case-insensitive filenames, and Linux, etc., have case sensitive filenames, we recommend that path/filenames be in lower case for portability. This annoys some people.
3. There are command line switches to map from module names to filenames for special purposes. They're very rarely needed, but invaluable when they are.
> 2. Because Windows has case-insensitive filenames, and Linux, etc., have case sensitive filenames, we recommend that path/filenames be in lower case for portability. This annoys some people.
That's also a problem in most other languages; just a couple of days ago, someone else's C++ code didn't compile on my machine because they had accidentally included <something/whatever.h> when the file was actually named <something/Whatever.h>, because macOS is case insensitive. I had the same experience with JavaScript some months ago, that time because they were running Windows.
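For what it's worth, this kind of mismatch can be caught before it reaches someone else's machine. Here is a minimal Python sketch, assuming you have a root directory and a list of include-style relative paths to check; the function name `exact_case_exists` is made up for illustration. It compares each path component against the names the directory listing actually reports, so it gives the same answer on case-insensitive and case-sensitive filesystems.

```python
import os

def exact_case_exists(root, relative_path):
    """Return True only if every component of relative_path matches
    an on-disk name exactly, including letter case."""
    current = root
    for part in relative_path.split("/"):
        # os.listdir reports names as stored on disk, even when the
        # filesystem itself would accept a differently-cased lookup.
        if not os.path.isdir(current) or part not in os.listdir(current):
            return False
        current = os.path.join(current, part)
    return True
```

Running something like this over the include paths in CI would flag `something/whatever.h` when the file is really `something/Whatever.h`.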
On the "filenames must be valid identifiers" thing; I really wish more languages would start allowing kebab-case in identifiers. That's also absolutely not a D thing, more of a common complaint about most languages.
Kebab-case is, in my opinion, the most beautiful of identifier cases, but how would you get around the ambiguity with subtraction, short of going full Lisp or adding whitespace sensitivity?
I don't think requiring whitespace around operators is such a bad thing. My personal coding style generally looks like `int foo = 10 + (something - 20);` anyways, and I think that style is a lot more readable than `int foo=10+(something-20);`. If you require whitespace around almost all operators, you open up the possibility of naming identifiers basically anything, which lets you have conventions like naming predicates or boolean members with a question mark at the end. In my opinion, `myObject.whatever?` looks a lot better than `myObject.isWhatever`.
Exactly which operators should require whitespace and which don't is up for debate, but in my personal opinion, requiring space around infix operators and letting prefix/postfix operators not require a space would be appropriate. Nobody wants to have to write `myArray [i]`, but I think most people would be willing to give up `i-1` and instead write `i - 1`.
Would adding whitespace sensitivity really be a problem? You already need whitespace to separate identifiers, so it's not a totally foreign concept in mainstream languages.
It seems like we've been making a weird tradeoff, by disallowing kebab-case just so we can smash our operators together with our operands.
>> It seems like we've been making a weird tradeoff, by disallowing kebab-case just so we can smash our operators together with our operands.
Some people don't want to bother putting a space between operators and operands, and proponents of kebab-case just don't want to push the shift key to get an underscore.
To get kebab-case, the restriction that identifiers cannot start or end with '-' gets you pretty far. The only whitespace change is that you sometimes need whitespace around an infix '-'. Other operators are still fine, and it still works fine for prefix and postfix operators.
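A rough sketch of that rule as a toy tokenizer in Python, in case it helps. The token set and the names `TOKEN` and `tokenize` are made up for illustration and do not correspond to any real language. The only point is that an identifier may contain `-` internally but never at either end, so prefix minus still works while `i-1` lexes as one name.

```python
import re

# Hypothetical token rules: identifiers may contain '-' in the middle,
# but may not start or end with one.
TOKEN = re.compile(r"""
    (?P<ident>[A-Za-z_][A-Za-z0-9_]*(?:-[A-Za-z0-9_]+)*)
  | (?P<number>\d+)
  | (?P<op>[-+*/=()])
  | (?P<space>\s+)
""", re.VERBOSE)

def tokenize(source):
    """Split source into (kind, text) pairs, dropping whitespace."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = TOKEN.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at {pos}")
        if match.lastgroup != "space":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens
```

With these rules, `first-number - second-number` is a subtraction of two kebab-case names, `-x` is still a prefix minus applied to `x`, and `i-1` becomes a single identifier, which is exactly the case where you would now have to write `i - 1`.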
Also, the reason to prefer kebab-case for me has nothing to do with avoiding a keypress. It's that I find kebab-case easier to read.
As the other poster mentioned, you gain more than just the dash when forcing whitespace; you get forced readability and characters like / ? and ^ in identifiers. Then you can name things like foo/bar or e^x.
You wouldn't have to require whitespace around all operators; you could, for example, decide that infix math/bitwise/logic operators must have a space around them, while other operators (like the infix operators `.` and `->`, and the prefix/suffix `++`, `--`, `[...]`, and `!`) wouldn't require whitespace.
I agree that nobody would want to write `i ++` or `foo [10]` or `myvar . mymember`, but I think a lot of people could get behind `10 - 20` and `foo && bar` instead of `10-20` and `foo&&bar`.
Does one actually need whitespace to separate identifiers? This is something that's always bugged me a bit.
Aside from C-style type declarations ("unsigned int x;"), C-style syntaxes seem to always have ways other than whitespace to separate identifiers.
Like (using some JavaScript in a hypothetical example) I can't think of many concrete reasons why this is easier to parse:
let first_number=2, second_number=2, answer=first_number-second_number;
...than this:
let first number=2, second number=2, answer=first number-second number;
Although, of course, some languages—most Lisps, Tcl, and Red/REBOL come to mind—actually do rely on whitespace and whitespace alone to separate identifiers in many situations, and something like this would likely be unworkable there.
Oh, I never noticed that. That's pretty interesting—although unfortunately, that syntax does not look terribly convenient to write, which is the main thing I'm after here.
I think the best way to get identifiers with whitespace to work in a Lisp would be to contrive a syntax for S-expressions that uses something other than whitespace to separate things. Perhaps letting (first rest-1 rest-2 ...) be written as (first: rest 1, rest 2, ...) or (first, rest 1, rest 2, ...), so that example could be written as:
(let: ((first number: 10),
(second number: 20)),
(+: first number, second number))
I imagine it would be possible to write a macro in Common Lisp to transform this into runnable code, or a language in Racket to do so—although, I'm not sure how many people would actually want to make or use something like this.
TikZ allows whitespace in identifiers. At first it was pretty strange, but I actually really like it now. I don't think parsing it is much of a problem, and I would quite like to be able to use it in other languages.
Well, thinking about it more, I did realize there's a pretty nasty edge case with my hypothetical JavaScript syntax:
let let x = 5;
let x = 6;
// should this set the variable "let x"?
// or define a variable named "x"?
One could potentially design around situations like this, but allowing whitespace in identifiers likely does require being much more meticulous about treatment of reserved words, identifiers, and whitespace than more traditional syntaxes, and this is likely why not many people attempt this.
I think the idea is worth experimenting with, though, and that a good implementation of it could be convenient enough for end-users to outweigh the implementation inconvenience.
Well, if you're only interested in kebab-case, then specifying that identifiers can't start or end with hyphens solves most of the problem. The only restriction would be that you'd need whitespace around '-' only when used as an infix operator.
Sure, just make everyone replace their keyboard with one which has two `-` buttons and make everyone understand why they need two buttons for something which looks like the same letter and you can safely use – for one thing and — for the other.
As a non-native English speaker, I understand the desire to name things in my native language (German), but for all but linguae francae, naming things in a native language presents an obstacle to sharing these things with others.
Compare итератор and 迭代器, which are complete mysteries to me produced by Google Translate. If my intention were to reach as many people as possible, I'd use "iterator" (which, coincidentally, works for English and my native German).
There are technical terms and there is domain terminology.
I once worked in finance, and there is a difference between GAAP accounting and German accounting rules. If my algorithms had used English terminology to be consistent with technical terms, this would have been confusing in each review. Using German terms (even combined with English "get" or "set", like "getBetriebsertrag") was beneficial there, even though it always confused new members of the team.
Speaking as someone who doesn't have English as their first language I think programming should be in English and ASCII. This includes identifiers and filenames. Strings on the other hand should be 100% valid Unicode, never ASCII.
This means the code can be read by anyone anywhere in the world on any operating system and that string payloads can similarly be read by anyone anywhere in the world.
Since keywords and APIs are usually in English, continuing to follow that convention in your variable names is often the most natural option.
But in cases where the program will be dealing with some concept that doesn't exist in English, being able to refer to things by their actual name, in the native language (assuming that's also the native language of the customer and development team), is much better than inventing a confusing and unnatural English translation.
> This means the code can be read by anyone anywhere in the world on any operating system and that string payloads can similarly be read by anyone anywhere in the world.
Uh, no? You are not supposed to be able to read this valid Unicode string literal in Korean: `"그뤼고 이 문좌열은 일부려 기ㅖ버역을 어럽게하러고 오타비문이 산개해 있구먼요."`
Also, a significant portion (and possibly the majority) of code will only ever be read and written by a small group of people, often sharing a common language other than English, so non-English code is just fine for them. If you are saying that a public library should be written in English, I almost agree---there would be some exceptions though.
public class 그뤼고
{
    private 이 이 {get;set;}
    public 그뤼고()
    {
        var 문좌열은 = 이.문좌열은;
        var 기ㅖ버역을 = Enumerable.Range(0, 10);
        var 오타비문이 = 기ㅖ버역을.Select(요 => 요);
    }
}
public class 이
{
    public int 문좌열은 {get;set;}
}
Sometimes, though my example was intentionally obscured to prevent machine translation and you were not aware of Korean conjugations (usually omitted in identifiers) ;-)
I have seen numerous instances of pseudo-English when it comes to naming. It is hard to name things in non-native tongues. When reasonable, reducing that overhead can be indeed beneficial.
A programming language's identifiers are not the place to express one's national identity. They should be utilitarian, and easily understood by programmers across countries.
Since you're already supposed to understand the syntax of every major programming language (which is based on english) you can make do with english keywords too. Nothing worse than opening some code to find bizarro foreign language identifiers.
(And I'm no native english speaker, so I'm not speaking as someone who's ok with this because english is their language or ASCII fits their default keyboard layout: it just makes sense).
So we should live in a world where Chinese programmers share knowledge only with Chinese programmers, Americans share among them and with Brits, Aussies and Canadians, and Mexicans share with Spaniards?
I'm not a native English speaker, and I have only to lose in this scenario.
Look, I worked for years in localization and I'm actually a proponent of English as a worldwide-spoken language. But I'm pointing out the irony in saying "Nothing worse than opening some code to find bizarro foreign language identifiers." when that is exactly how CJKHT(...) programmers, many of whom do not speak a word of any western language, feel. Closer to the west, even people in countries that use cyrillic or greek alphabets are not necessarily familiar with latin transliteration and forcing it on them is dubious.
I mean, yeah, this can be made a requirement of programming languages: After all, it was such a requirement for a long, long time. But it doesn't have to be anymore.
BTW, full disclosure, I'm French and I find code written in french completely fucking unreadable. And that's 100% ASCII. As I said I believe code should be written in English, but I also don't think we should have essentially-artificial barriers for people to enter something as important as programming; those barriers only end up eroding the culture in question.
No one is claiming Chinese programmers should only code in Chinese, but to assert that they or anyone must code in American English for the convenience of Western programmers seems absurd. Should they also be forced to write all of their novels and perform all of their movies and music in English as well?
We live in a multicultural world, one for which English as a default doesn't make sense in every context. Yes, it may be the case that Chinese, Spanish, Greek, German, Arabic, or other non-American ideogram using programmers write code primarily meant to be used and understood within their own culture. I see nothing wrong with that.
Chinese programmers using ASCII and English names for things would also be of great help to all other East Asian programmers that aren't Chinese.
Also what about the fact that while Mandarin is the largest language/dialect it's far from the only one in China? Using English/ASCII means all the programmers in China can understand each other's code...
For that matter, there are human languages besides English that can be written using only ASCII (sometimes by transliterating) and that's not what happens in them.
Just like people know they have to write in English on HN to communicate, they also write their code in English when they plan to open-source it and share with the rest of the world. As for closed-source projects ... if your company doesn't conduct its business in English, why force the code to be in English? The only people whom that'd benefit are never going to see it.
This is something that only humans care about, not computers. And humans can be accommodated by prettyprinting - or, in a pinch, by roundtrip conversion from an ASCII-only format to a "rich", Unicode-based one, and back. But let computers have their simple, ASCII-based identifier names. E.g. https://en.wikipedia.org/wiki/Punycode is a thing, and is routinely used for "native-language" domain names. But guess what, these domain names are still ASCII under the hood!
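To make the round-trip idea concrete, here is a small Python sketch using the standard library's `punycode` codec. The helper names `to_ascii_identifier` and `from_ascii_identifier` are made up for illustration, and this is bare Punycode rather than the full IDNA scheme that domain names use (which also adds an `xn--` prefix). Note too that the encoded form can contain `-`, so it would not be a legal identifier in most languages as-is; this only shows that the mapping to ASCII is lossless.

```python
def to_ascii_identifier(name):
    """Encode an arbitrary Unicode name as an ASCII-only string."""
    return name.encode("punycode").decode("ascii")

def from_ascii_identifier(encoded):
    """Recover the original Unicode name from its Punycode form."""
    return encoded.encode("ascii").decode("punycode")

# Round-trip a few names through the ASCII representation.
for original in ["итератор", "迭代器", "getBetriebsertrag"]:
    encoded = to_ascii_identifier(original)
    assert from_ascii_identifier(encoded) == original
```

A pretty-printer in an editor could apply the decoding direction for display while the file on disk stays ASCII.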
(Indeed, we should arguably move away from the notion of a single character string as the only human-facing semantics that an identifier is associated with-- there should be a higher layer, perhaps with multiple choices of e.g. native language, formatting and the like. Human facing semantics are closer to "literate" documentation than to anything that compilers should have to deal with. Yes, the "native", underlying representation should still be something that we can somehow make sense of - I'm not saying that our identifiers should be GUIDs or anything like that! But it will only be resorted to in a pinch.)
That's a false dichotomy. If you were talking about restricting identifiers to codepoints in the Unicode "letter" categories, you might have a point. (Nb. there are approximately 160,000 "letter" codepoints in Unicode, the vast majority of which being CJK ideograms.)
No, I'm saying restricting identifiers to AZaz09_.
I don't see a dichotomy (much less a false one). What are the two options I separate artificially?
I'm saying just don't impose regional alphabets (other than AZaz that's already par for the course with the syntax of all major programming languages anyway) and regional words into source code.
You disregarded 99.99% of Unicode as "math symbols and poop emoji". That's just a ridiculously biased Anglocentric viewpoint. It's 2019; there's zero reason to force people to stick to either a subset of their native script, or a completely foreign one, when naming identifiers in a programming language.
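For anyone curious what a "letter categories" rule looks like in practice, Python is one language that already defines identifiers in terms of Unicode properties. Here is a quick sketch using `str.isidentifier()` and `unicodedata.category()`; the helper name `describe` is made up for illustration, and other languages draw the line differently.

```python
import unicodedata

def describe(name):
    """Return whether Python accepts name as an identifier, plus the
    Unicode general category of each character in it."""
    categories = [unicodedata.category(ch) for ch in name]
    return name.isidentifier(), categories

# Letters from any script are accepted; symbols and emoji are not.
ok_cyrillic, cats_cyrillic = describe("итератор")
ok_cjk, cats_cjk = describe("迭代器")
ok_emoji, cats_emoji = describe("💩")
```

Under this rule, Cyrillic and CJK names pass because every character falls in a letter category, while the emoji is rejected because it is classified as a symbol.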
I haven't used D, but in Haskell we have the same module name == file name thing. The only time I don't like it is when we have nested modules, the parent and children modules are not in the same directory:
Thus, two semantically related modules are now in different directories.
Python, IMO, handles this correctly by having __init__.py support inside directories. It's theoretically less elegant because of the special name, but in practice leads to better file organization.
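Here is a self-contained sketch of the Python layout, in case it's unfamiliar. It builds a throwaway package in a temp directory and imports it; the names `shapes_demo`, `circle`, and `double` are made up for illustration. The point is just that the parent module's code (`__init__.py`) sits in the same directory as its child modules.

```python
import importlib
import os
import sys
import tempfile

def build_demo_package(root):
    """Create shapes_demo/__init__.py and shapes_demo/circle.py under root."""
    package_dir = os.path.join(root, "shapes_demo")
    os.makedirs(package_dir)
    # The parent module's own code lives inside the package directory.
    with open(os.path.join(package_dir, "__init__.py"), "w") as f:
        f.write("from .circle import double\n")
    # A child module, stored right next to the parent's code.
    with open(os.path.join(package_dir, "circle.py"), "w") as f:
        f.write("def double(x):\n    return 2 * x\n")

with tempfile.TemporaryDirectory() as root:
    build_demo_package(root)
    sys.path.insert(0, root)
    try:
        importlib.invalidate_caches()
        shapes_demo = importlib.import_module("shapes_demo")
        result = shapes_demo.double(21)
    finally:
        sys.path.remove(root)
```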
Same for Rust, but even better, because one can define nested modules in the same file. So you can either define a new module in the same file, put it in a different file named by the module, or put it in the file `mod.rs` inside the directory named by the module.
Sure, but that's not specific to package.d. See the earlier convention of all.d or d.d. That's the trade-off with having public imports. I don't see how this is related to my comment though?