Have there been any downsides or annoyances with that approach?

WalterBright · on Jan 28, 2019

1. The filename characters have to be valid D identifier characters. This annoys some people.

2. Because Windows has case-insensitive filenames, and Linux, etc., have case sensitive filenames, we recommend that path/filenames be in lower case for portability. This annoys some people.

3. There are command line switches to map from module names to filenames for special purposes. They're very rarely needed, but invaluable when they are.

Overall, it's been very successful.

mort96 · on Jan 28, 2019

> 2. Because Windows has case-insensitive filenames, and Linux, etc., have case sensitive filenames, we recommend that path/filenames be in lower case for portability. This annoys some people.

That's also a problem in most other languages; just a couple of days ago, someone else's C++ code didn't compile on my machine because they had accidentally included <something/whatever.h> when the file was actually named <something/Whatever.h>, because macOS is case insensitive. I had the same experience with JavaScript some months ago, that time because they were running Windows.
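A minimal sketch of how a build script could catch this before it reaches a case-sensitive machine. The helper name `exact_case_exists` and the temporary directory layout are made up for illustration; the only assumption is that `os.listdir` reports names with the case stored on disk, even on a case-insensitive filesystem.

```python
import os
import tempfile


def exact_case_exists(root, relative_path):
    """Return True only if every component of relative_path matches a
    directory entry under root with exactly the same letter case."""
    current = root
    for part in relative_path.split("/"):
        if not os.path.isdir(current):
            return False
        # Compare against the names as stored on disk, not via open(),
        # which would succeed on a case-insensitive filesystem.
        if part not in os.listdir(current):
            return False
        current = os.path.join(current, part)
    return True


# Build a tiny tree: <tmp>/something/Whatever.h
demo_root = tempfile.mkdtemp()
os.mkdir(os.path.join(demo_root, "something"))
with open(os.path.join(demo_root, "something", "Whatever.h"), "w") as handle:
    handle.write("// header\n")

print(exact_case_exists(demo_root, "something/Whatever.h"))
print(exact_case_exists(demo_root, "something/whatever.h"))
```

Running a check like this over every include or import path should flag the mismatch on macOS and Windows too, not only on Linux.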

On the "filenames must be valid identifiers" thing; I really wish more languages would start allowing kebab-case in identifiers. That's also absolutely not a D thing, more of a common complaint about most languages.

giornogiovanna · on Jan 28, 2019

Kebab-case is, in my opinion, the most beautiful of identifier cases, but how would you get around the ambiguity with subtraction, short of going full Lisp or adding whitespace sensitivity?

mort96 · on Jan 28, 2019

I don't think requiring whitespace around operators is such a bad thing. My personal coding style generally looks like `int foo = 10 + (something - 20);` anyways, and I think that style is a lot more readable than `int foo=10+(something-20);`. If you require whitespace around almost all operators, you open up the possibility of naming identifiers basically anything, which lets you have conventions like naming predicates or boolean members with a question mark at the end. In my opinion, `myObject.whatever?` looks a lot better than `myObject.isWhatever`.

Exactly which operators should require whitespace and which don't is up for debate, but in my personal opinion, requiring space around infix operators and letting prefix/postfix operators not require a space would be appropriate. Nobody wants to have to write `myArray [i]`, but I think most people would be willing to give up `i-1` and instead write `i - 1`.

sacado2 · on Jan 29, 2019

I like to be able to remove spaces in complex expressions, for readability. For instance, this:

    t[x] = t[x-1] + t[x-2]

looks more readable to me than this :

    t[x] = t[x - 1] + t[x - 2]

Another example :

    y = a*x1 + b*x2 + c

wtetzner · on Jan 28, 2019

> adding whitespace sensitivity?

Would adding whitespace sensitivity really be a problem? You already need whitespace to separate identifiers, so it's not a totally foreign concept in mainstream languages.

It seems like we've been making a weird tradeoff, by disallowing kebab-case just so we can smash our operators together with our operands.

phkahler · on Jan 28, 2019

>> It seems like we've been making a weird tradeoff, by disallowing kebab-case just so we can smash our operators together with our operands.

Some people don't want to bother putting a space between operators and operands, and proponents of kebab-case just don't want to push the shift key to get an underscore.

wtetzner · on Jan 28, 2019

To get kebab-case, the restriction that identifiers cannot start or end with '-' gets you pretty far. The only whitespace change is that you sometimes need whitespace around an infix '-'. Other operators are still fine, and it still works fine for prefix and postfix operators.

Also, the reason to prefer kebab-case for me has nothing to do with avoiding a keypress. It's that I find kebab-case easier to read.
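A rough sketch of that lexing rule, in Python for convenience. The token names and the tiny operator set are hypothetical, not taken from any real language: an identifier may contain '-' only between two non-empty segments, so it can never start or end with one.

```python
import re

TOKEN_PATTERN = re.compile(r"""
    (?P<ident>[A-Za-z_][A-Za-z0-9_]*(?:-[A-Za-z0-9_]+)*)  # hyphen only between segments
  | (?P<number>\d+)
  | (?P<op>[-+*/=()\[\]])
  | (?P<space>\s+)
""", re.VERBOSE)


def tokenize(source):
    """Split source into (kind, text) pairs, dropping whitespace."""
    tokens = []
    position = 0
    while position < len(source):
        match = TOKEN_PATTERN.match(source, position)
        if match is None:
            raise ValueError(f"unexpected character at {position}: {source[position]!r}")
        if match.lastgroup != "space":
            tokens.append((match.lastgroup, match.group()))
        position = match.end()
    return tokens


print(tokenize("foo-bar"))    # one identifier
print(tokenize("foo - bar"))  # identifier, operator, identifier
print(tokenize("-foo"))       # prefix minus still works without a space
print(tokenize("t[x-1]"))     # caution: 'x-1' lexes as a single identifier
```

The last line is the cost: under this particular rule `x-1` is a name, so subtraction has to be written `x - 1`. A variant that requires each segment after a hyphen to start with a letter would keep `x-1` as subtraction, at the price of a less uniform rule.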

ModernMech · on Jan 28, 2019

As the other poster mentioned, you gain more than just the dash when forcing white space; you get forced readability and characters like / ? and ^ in identifiers. Then you can name things like foo/bar or e^x

mikepurvis · on Jan 28, 2019

I suppose I could get used to that, but it sounds like a nightmare at first blush.

plopz · on Jan 28, 2019

I don't think I could handle i ++ instead of i++

mort96 · on Jan 28, 2019

You wouldn't have to require whitespace around all operators; you could, for example, decide that infix math/bitwise/logic operators must have a space around them, while other operators (like the infix operators `.` and `->`, and the prefix/suffix `++`, `--`, `[...]`, and `!`) wouldn't require whitespace.

I agree that nobody would want to write `i ++` or `foo [10]` or `myvar . mymember`, but I think a lot of people could get behind `10 - 20` and `foo && bar` instead of `10-20` and `foo&&bar`.

m48 · on Jan 28, 2019

Does one actually need whitespace to separate identifiers? This is something that's always bugged me a bit.

Aside from C-style type declarations ("unsigned int x;"), C-style syntaxes seem to always have ways other than whitespace to separate identifiers.

Like (using some JavaScript in a hypothetical example) I can't think of many concrete reasons why this is easier to parse:

  let first_number=2, second_number=2, answer=first_number-second_number;

...than this:

  let first number=2, second number=2, answer=first number-second number;

Although, of course, some languages—most Lisps, Tcl, and Red/REBOL come to mind—actually do rely on whitespace and whitespace alone to separate identifiers in many situations, and something like this would likely be unworkable there.

_19qg · on Jan 28, 2019

Though one can use whitespace in Common Lisp identifiers, by quoting symbols:

  CL-USER 115 > (let ((first| |number 10)
                      (second\ number 20))
                  (+ first\ number |SECOND NUMBER|))
  30

m48 · on Jan 28, 2019

Oh, I never noticed that. That's pretty interesting—although unfortunately, that syntax does not look terribly convenient to write, which is the main thing I'm after here.

I think the best way to get identifiers with whitespace to work in a Lisp would be to contrive a syntax for S-expressions that uses something other than whitespace to separate things. Perhaps letting (first rest-1 rest-2 ...) be written as (first: rest 1, rest 2, ...) or (first, rest 1, rest 2, ...), so that example could be written as:

  (let: ((first number: 10), 
         (second number: 20)), 
        (+: first number, second number))

I imagine it would be possible to write a macro in Common Lisp to transform this into runnable code, or a language in Racket to do so—although, I'm not sure how many people would actually want to make or use something like this.

kazinator · on Jan 29, 2019

Not all whitespace you see in Lisp code is strictly necessary:

TXR Lisp:

  1> (list 1"a"'(b(c)d(e)))
  (1 "a" (b (c) d (e)))

Here we just have one space that prevents list 1 from being list1.

mkl · on Jan 28, 2019

TikZ allows whitespace in identifiers. At first it was pretty strange, but I actually really like it now. I don't think parsing it is much of a problem, and I would quite like to be able to use it in other languages.

m48 · on Jan 28, 2019

Well, thinking about it more, I did realize there's a pretty nasty edge case with my hypothetical JavaScript syntax:

  let let x = 5;
  let x = 6; 
  // should this set the variable "let x"?
  // or define a variable named "x"?

One could potentially design around situations like this, but allowing whitespace in identifiers likely does require being much more meticulous about treatment of reserved words, identifiers, and whitespace than more traditional syntaxes, and this is likely why not many people attempt this.

I think the idea is worth experimenting with, though, and that a good implementation of it could be convenient enough for end-users to outweigh the implementation inconvenience.

blt · on Jan 28, 2019

I generally separate operands, but in dense math expressions it's not always the best for readability.

    x = (-b + sqrt(b**2 + 4*a*c)) / (2*a)

    x = (- b + sqrt(b ** 2 + 4 * a * c)) / (2 * a)

... Although, one could argue that allowing tightened multiplication and division are enough.

AnimalMuppet · on Jan 28, 2019

Not to be that guy, but...

  x = (-b + sqrt(b**2 - 4*a*c)) / (2*a)

blt · on Feb 1, 2019

lol, thank you.

wtetzner · on Jan 28, 2019

Well, if you're only interested in kebab-case, then specifying that identifiers can't start or end with hyphens solves most of the problem. The only restriction would be that you'd need whitespace around '-' only when used as an infix operator.

tzs · on Jan 28, 2019

Use +- for subtraction, where + is binary plus and - is unary minus.

a1369209993 · on Jan 28, 2019

Not sure if it's a good idea, but maybe try:

  foo-bar # kebab-case
  foo−bar # subtraction
  foo minus bar # subtraction? (infix identifier "minus")
  foo - bar # subtraction (infix identifier "-")
  foo − bar # subtraction (operator symbol)

using \u2212 as an explicit subtraction operator for people who really can't stand having 'extra' whitespace?
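For reference, a small Python sketch showing that the look-alike dashes are distinct code points, so a lexer could tell them apart even if a reader cannot. The list of characters is just a selection for illustration; the key on an ordinary keyboard produces U+002D.

```python
import unicodedata

# Written as escapes so the source itself stays unambiguous.
LOOKALIKES = ["\u002d", "\u2212", "\u2010", "\u2013", "\u2014"]

for character in LOOKALIKES:
    print(f"U+{ord(character):04X}  {unicodedata.name(character)}")
```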

slobotron · on Jan 28, 2019

Perl6 allows it, and it's the preferred convention to boot.

Small Intro: https://perl6advent.wordpress.com/2015/12/05/day-5-identifie...

kps · on Jan 28, 2019

Use ‘−’ for subtraction, allow ‘–’ and ‘‐’ in identifiers, and report an error for ‘-’.

mort96 · on Jan 28, 2019

Sure, just make everyone replace their keyboard with one which has two `-` buttons and make everyone understand why they need two buttons for something which looks like the same letter and you can safely use – for one thing and — for the other.

xyproto · on Jan 28, 2019

Those people are easily annoyed.

afiori · on Jan 28, 2019

> 1. The filename characters have to be valid D identifier characters. This annoys some people.

this could be solved by allowing strings in qualified imports

  import "illegal identifier"."some more weird unicode" as someLib;

AnIdiotOnTheNet · on Jan 28, 2019

Which, incidentally, is how Zig does it:

  const thing = @import("relative/path/to/thing.zig");
  const package = @import("packagename");

gpm · on Jan 28, 2019

And rust

`extern crate foo`

`extern crate "foo-bar" as foo_bar`

This is all legal identifiers but

`use std::path::Path;`

`use std::path::Path as int; // For maximum confusion`

And not that anyone uses this part but:

`mod bazz;`

`#[path = "bazz-bar.rs"] mod bazz_bar;`

coldtea · on Jan 28, 2019

>The filename characters have to be valid D identifier characters. This annoys some people.

I'd say "valid X language identifier characters" should always be ASCII.

I never understood the BS fad for unicode identifiers.

Wanna allow some math symbols? Maybe. The full unicode gamut, so that you can have a variable named shit emoji? Yeah, no.

enedil · on Jan 28, 2019

But that's not only about Unicode. You can't start your filename with a digit or a dash.

ZiiS · on Jan 28, 2019

This stops most of the world naming things in their native language.

ckastner · on Jan 28, 2019

As a non-native English speaker, I understand the desire to name things in my native language (German), but for all but linguae francae, naming things in a native language presents an obstacle to sharing these things with others.

Compare итератор and 迭代器, which are complete mysteries to me produced by Google Translate. If my intention were to reach as many people as possible, I'd use "iterator" (which, coincidentally, works for English and my native German).

johannes1234321 · on Jan 28, 2019

There are technical terms and there is domain terminology.

I once worked in finance, and there is a difference between GAAP accounting and German accounting rules. If my algorithms had used English terminology to be consistent with technical terms, this would have been confusing in each review. Using German terms (even combined with English "get" or "set", like "getBetriebsertrag") was beneficial there, even though it always confused new members of the team.

beaconstudios · on Jan 28, 2019

funnily enough, the Russian also spells "iterator", just in Cyrillic. But your point stands nonetheless.

kungtotte · on Jan 28, 2019

Speaking as someone who doesn't have English as their first language I think programming should be in English and ASCII. This includes identifiers and filenames. Strings on the other hand should be 100% valid Unicode, never ASCII.

This means the code can be read by anyone anywhere in the world on any operating system and that string payloads can similarly be read by anyone anywhere in the world.

ptx · on Jan 28, 2019

Since keywords and APIs are usually in English, continuing to follow that convention in your variable names is often the most natural option.

But in cases where the program will be dealing with some concept that doesn't exist in English, being able to refer to things by their actual name, in the native language (assuming that's also the native language of the customer and development team), is much better than inventing a confusing and unnatural English translation.

lifthrasiir · on Jan 28, 2019

> This means the code can be read by anyone anywhere in the world on any operating system and that string payloads can similarly be read by anyone anywhere in the world.

Uh, no? You are not supposed to be able to read this valid Unicode string literal in Korean: `"그뤼고 이 문좌열은 일부려 기ㅖ버역을 어럽게하러고 오타비문이 산개해 있구먼요."`

Also a significant portion (and possibly the majority) of code will only ever be read and written by a small group of people, often sharing a common language other than English, so non-English code is just fine for them. If you are saying that a public library should be written in English, I almost agree---there would be some exceptions though.

yarosv · on Jan 28, 2019

Do you mean you want something like C#?

    public class 그뤼고
    {
      private 이 이 {get;set;}

      public 그뤼고()
      {
        var 문좌열은 = 이.문좌열은;
        var 기ㅖ버역을 = Enumerable.Range(0, 10);
        var 오타비문이 = 기ㅖ버역을.Select(요 => 요);
      }
    }

    public class 이
    {
      public int 문좌열은 {get;set;}
    }

lifthrasiir · on Jan 28, 2019

Sometimes, though my example was intentionally obscured to prevent machine translation and you were not aware of Korean conjugations (usually omitted in identifiers) ;-)

I have seen numerous instances of pseudo-English when it comes to naming. It is hard to name things in non-native tongues. When reasonable, reducing that overhead can be indeed beneficial.

yarosv · on Feb 6, 2019

Oops (yes, I don't know what it meant). I mean I speak Ukrainian and Russian. If I saw the code like that in my native tongue, I would be upset.

Plus it is a pain to alt+shift between languages all the time.

coldtea · on Jan 28, 2019

That's the main benefit.

A programming language's identifiers are not the place to express one's national identity. They should be utilitarian, and easily understood by programmers across countries.

Since you're already supposed to understand the syntax of every major programming language (which is based on english) you can make do with english keywords too. Nothing worse than opening some code to find bizarro foreign language identifiers.

(And I'm no native english speaker, so I'm not speaking as someone who's ok with this because english is their language or ASCII fits their default keyboard layout: it just makes sense).

scrollaway · on Jan 28, 2019

I see what you're saying, but then again, this is exactly how many chinese programmers feel all the time.

> Nothing worse than opening some code to find bizarro foreign language identifiers.

lultimouomo · on Jan 28, 2019

So we should live in a world where Chinese programmers share knowledge only with Chinese programmers, Americans share among them and with Brits, Aussies and Canadians, and Mexicans share with Spaniards?

I'm not a native English speaker, and I have only to lose in this scenario.

scrollaway · on Jan 28, 2019

Look, I worked for years in localization and I'm actually a proponent of English as a worldwide-spoken language. But I'm pointing out the irony in saying "Nothing worse than opening some code to find bizarro foreign language identifiers." when that is exactly how CJKHT(...) programmers, many of whom do not speak a word of any western language, feel. Closer to the west, even people in countries that use cyrillic or greek alphabets are not necessarily familiar with latin transliteration and forcing it on them is dubious.

I mean, yeah, this can be made a requirement of programming languages: After all, it was such a requirement for a long, long time. But it doesn't have to be anymore.

BTW, full disclosure, I'm French and I find code written in french completely fucking unreadable. And that's 100% ASCII. As I said I believe code should be written in English, but I also don't think we should have essentially-artificial barriers for people to enter something as important as programming; those barriers only end up eroding the culture in question.

krapp · on Jan 28, 2019

No one is claiming Chinese programmers should only code in Chinese, but to assert that they or anyone must code in American English for the convenience of Western programmers seems absurd. Should they also be forced to write all of their novels and perform all of their movies and music in English as well?

We live in a multicultural world, one for which English as a default doesn't make sense in every context. Yes, it may be the case that Chinese, Spanish, Greek, German, Arabic, or other non-American ideogram using programmers write code primarily meant to be used and understood within their own culture. I see nothing wrong with that.

kungtotte · on Jan 28, 2019

Chinese programmers using ASCII and English names for things would also be of great help to all other East Asian programmers that aren't Chinese.

Also what about the fact that while Mandarin is the largest language/dialect it's far from the only one in China? Using English/ASCII means all the programmers in China can understand each other's code...

detaro · on Jan 28, 2019

There are languages that allow more than ASCII, and that's not what happens in them.

yorwba · on Jan 28, 2019

For that matter, there are human languages besides English that can be written using only ASCII (sometimes by transliterating) and that's not what happens in them.

Just like people know they have to write in English on HN to communicate, they also write their code in English when they plan to open-source it and share with the rest of the world. As for closed-source projects ... if your company doesn't conduct its business in English, why force the code to be in English? The only people whom that'd benefit are never going to see it.

zozbot123 · on Jan 28, 2019

This is something that only humans care about, not computers. And humans can be accommodated by prettyprinting - or, in a pinch, by roundtrip conversion from an ASCII-only format to a "rich", Unicode-based one, and back. But let computers have their simple, ASCII-based identifier names. E.g. https://en.wikipedia.org/wiki/Punycode is a thing, and is routinely used for "native-language" domain names. But guess what, these domain names are still ASCII under the hood!

(Indeed, we should arguably move away from the notion of a single character string as the only human-facing semantics that an identifier is associated with-- there should be a higher layer, perhaps with multiple choices of e.g. native language, formatting and the like. Human facing semantics are closer to "literate" documentation than to anything that compilers should have to deal with. Yes, the "native", underlying representation should still be something that we can somehow make sense of - I'm not saying that our identifiers should be GUIDs or anything like that! But it will only be resorted to in a pinch.)
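A small sketch of that round trip using the `punycode` codec that ships with Python. The sample names are arbitrary; real domain names additionally go through IDNA processing and carry an `xn--` prefix, which this sketch leaves out.

```python
NATIVE_NAMES = ["bücher", "итератор", "迭代器"]

for native_name in NATIVE_NAMES:
    # Encode to an ASCII-only byte string, then decode it back.
    ascii_form = native_name.encode("punycode")
    restored = ascii_form.decode("punycode")
    print(f"{native_name} -> {ascii_form.decode('ascii')} -> {restored}")
```

The encoded form is pure ASCII and decodes back to the original string, which is the property the argument above relies on. Whether the ASCII form is something a human can "somehow make sense of" is a separate question.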

Sharlin · on Jan 28, 2019

That's a false dichotomy. If you were talking about restricting identifiers to codepoints in the Unicode "letter" categories, you might have a point. (Nb. there are approximately 160,000 "letter" codepoints in Unicode, the vast majority of which being CJK ideograms.)
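As one existing example of a category-based rule (Python's, not D's), `str.isidentifier()` accepts letters from any script but rejects symbols such as emoji. A quick sketch, with the candidate names picked from this thread purely for illustration:

```python
import unicodedata

CANDIDATES = ["iterator", "итератор", "迭代器", "getBetriebsertrag", "\U0001F4A9", "foo-bar"]

for name in CANDIDATES:
    # General categories starting with "L" are letters; "So" is "Symbol, other".
    categories = sorted({unicodedata.category(ch) for ch in name})
    print(f"{name!r}: identifier={name.isidentifier()} categories={categories}")
```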

WalterBright · on Jan 28, 2019

D allows the same Unicode codepoints as identifier characters as do C and C++.

coldtea · on Jan 28, 2019

No, I'm saying restricting identifiers to AZaz09_.

I don't see a dichotomy (much less a false one). What are the two options I separate artificially?

I'm saying just don't impose regional alphabets (other than AZaz that's already par for the course with the syntax of all major programming languages anyway) and regional words into source code.

Sharlin · on Jan 28, 2019

You disregarded 99.99% of Unicode as "math symbols and poop emoji". That's just a ridiculously biased Anglocentric viewpoint. It's 2019; there's zero reason to force people to stick to either a subset of their native script, or a completely foreign one, when naming identifiers in a programming language.

ketzu · on Jan 28, 2019

Valid characters are usually not the full ASCII set at all, so why not pick and choose from unicode as well?

Or why not full unicode text support? Is there any real reason besides "some people might want emoji and I don't like that"?

SiempreViernes · on Jan 28, 2019

They might use their native language too!

But no, the totality of the argument always reduces to: "I'm not used to this and would find it inconvenient"

kccqzy · on Jan 28, 2019

I haven't used D, but in Haskell we have the same module name == file name thing. The only time I don't like it is when we have nested modules, the parent and children modules are not in the same directory:

    import A      -- compiler reads A.hs
    import A.B    -- compiler reads A/B.hs

Thus, two semantically related modules are now in different directories.

Python, IMO, handles this correctly by having __init__.py support inside directories. It's theoretically less elegant because of the special name, but in practice leads to better file organization.

Same for Rust, but even better, because one can define nested modules in the same file. So you can either define a new module in the same file, put it in a different file named by the module, or put it in the file `mod.rs` inside the directory named by the module.
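To make the difference between the Haskell and Python layouts concrete, here is a simplified sketch of the name-to-path mapping. The function names are hypothetical, and it ignores search paths, source directories, and namespace packages, so it is an illustration rather than what either toolchain actually does.

```python
from pathlib import PurePosixPath


def haskell_style_path(module_name):
    """A.B -> A/B.hs: the child lives in a directory named after the parent,
    while the parent itself is a sibling file of that directory."""
    return PurePosixPath(*module_name.split(".")).with_suffix(".hs")


def python_style_candidates(module_name):
    """a.b -> a/b.py or a/b/__init__.py: a package's own code may sit
    inside the directory that also holds its children."""
    base = PurePosixPath(*module_name.split("."))
    return [base.with_suffix(".py"), base / "__init__.py"]


for name in ["A", "A.B"]:
    print(name, "->", haskell_style_path(name))

for name in ["a", "a.b"]:
    print(name, "->", [str(path) for path in python_style_candidates(name)])
```

With the first mapping, `A.hs` and `A/B.hs` end up in different directories; with the second, `a/__init__.py` and `a/b.py` can share one.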

ben-schaaf · on Jan 28, 2019

D also has package.d for the same effect as python. It works quite well as package is a keyword and thus can't be used as a regular module anyway.

p0nce · on Jan 28, 2019

But package.d could lead you to overdepend on too many modules too :) it's mostly great for end applications rather than internal to libraries.

ben-schaaf · on Jan 28, 2019

Sure, but that's not specific to package.d. See the earlier convention of all.d or d.d. That's the trade-off with having public imports. I don't see how this is related to my comment though?

JoshTriplett · on Jan 28, 2019

> The only time I don't like it is when we have nested modules, the parent and children modules are not in the same directory:

> import A -- compiler reads A.hs

> import A.B -- compiler reads A/B.hs

That has to happen at some level, assuming subdirectories; otherwise, what would "import A.B.C" refer to?