
> That's how it should be

Why? Windows is also not case-sensitive, so it's not like there's a near-universal convention that S3 is ignoring.

Case sensitivity in file names is surprising even to non-technical people. If someone says they sent you "Book Draft 1.docx" and you check your email to find "Book draft 1.docx," you don't say, "Hey! I think you sent me the wrong file!"

Casing is usually not meaningful even in written language. "Hi, how are you?" means the same thing as "hi, how are you?" Uppercase changes meaning only when distinguishing between proper and common nouns, which is rarely a concern we have with file names anyway.



> If someone says they sent you "Book Draft 1.docx" and you check your email to find "Book draft 1.docx," you don't say, "Hey! I think you sent me the wrong file!"

But you also wouldn't say that if they sent "Book - Draft 1.docx", "Book Draft I.docx", "BookDraft1.docx", "Book_Draft_1.docx", or "Book Draft 1.doc", and surely you wouldn't want a filesystem to treat all of them as the same.


This is a personal reason, but the reason I prefer case sensitive directory names is I can make "logical groupings" for things. So, my python git directory might have "Projects/" and "Packages/," and the capitalization not only makes them stand out as a sort of "root path" for whatever's underneath, but the capitalization makes me conscious of the commands I'm typing with that path. I can't just autopilot a path name, I have to consciously hit shift when tab completion stops working.

That might sound like a dumb reason, but it's kept me from moving things into the wrong directory, or accidentally removing a directory multiple times in the past.

I also use Windows regularly and it really isn't a hindrance, so maybe I wouldn't actually be bothered if everything was case sensitive.


TBF, you don't need case sensitive FS for that, just case retaining is enough. And then have the option on how to sort it.


Don't you need case sensitivity for this part?

> I can't just autopilot a path name, I have to consciously hit shift when tab completion stops working.

On a system that's case retaining but not case sensitive, wouldn't "pr" autocomplete to "Projects"?


No, MacOS doesn’t do that. `cat Foo` and `cat foo` will both work, but only the first one will tab complete if the file is called `Foo`.


zsh tab-completed both just fine, preserving the case in both. I’d have preferred it corrected the case, but meh.


I like it! That's a great idea.

To me, this sounds like a great practice for terminal environments but may be less intuitive when using file system apps. I could easily overlook a single letter capitalization in a GUI view of many directories. Maybe it's because at a terminal the "view" into the file system is narrow?

Now I'm wondering how I can use this in my docker images. I mean that might irritate devops. Well, maybe they'll like it too. Man, thanks for posting this.


You have to draw the line somewhere, but I do appreciate when the UI sorts "Book draft 2" before "Book draft 11". That requires nontrivial tokenization logic and inference, but simple heuristics can be right often enough to be useful.

On that note, ASCIIbetical sort is never the right answer. There is a special place in hell for any human-facing UI that sorts "Zook draft 1" between "Book draft 1" and "book draft 1".
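
A minimal sketch of the "simple heuristic" idea above, in Python: split each name into digit and non-digit runs, compare the digit runs as integers, and case-fold the rest. `natural_key` is a hypothetical helper name for illustration, not the algorithm of any particular file manager, and it ignores locale entirely.

```python
import re


def natural_key(name):
    """Sort key: digit runs compare as integers, other runs case-insensitively."""
    # re.split with a capturing group alternates text, digits, text, ...
    # so odd indexes are always the digit runs.
    parts = re.split(r"(\d+)", name)
    return [int(p) if i % 2 else p.casefold() for i, p in enumerate(parts)]


names = ["Book draft 11", "Zook draft 1", "book draft 1", "Book draft 2", "Book draft 1"]

asciibetical = sorted(names)               # plain code point order
natural = sorted(names, key=natural_key)   # numbers by value, case ignored
```

With these names, plain `sorted()` orders by code point, so "Book draft 11" lands before "Book draft 2" and "book draft 1" ends up after "Zook draft 1". The keyed sort keeps the "Book" names together and puts 2 before 11.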


And that line, at least for sorting, belongs firmly outside the filesystem.

Sorting is locale-dependent. Whether a letter-with-dots sorts next to letter-without-dots or somewhere completely different has no correct global answer.


I think there's a pretty big difference between how the UI orders things and how the filesystem treats things as equivalent. A filesystem treating names case sensitively doesn't prevent the UI from tokenizing the names in any other arbitrary way


Capitalization isn't part of grammar. Those examples are different strings of characters altogether.


The classic, if crude, counterexample: "I helped my uncle Jack off a horse."

(The uncapitalized version doesn't just have different semantics; it has a completely different parse-tree!)


I'll augment your statement by noting that punctuation is also not part of grammar.


Another classic counterexample: "This book is dedicated to my parents, Ayn Rand, and God." "This book is dedicated to my parents, Ayn Rand and God."


you called it - those are different situations all right


These are just not the same characters. A filesystem should not have an opinion on what strings of characters _mean_ the same. It is the wrong level of abstraction.

Filenames might not even be words at all, and are surely not limited to English. We shouldn't implement rules and conventions from spoken English at a filesystem level, certainly not S3.

MacOS and Windows are just wrong about this.


Windows doesn’t have it at the file system layer, NTFS is case sensitive. Windows has it at the Win32 subsystem layer, see replies and comments here:

https://superuser.com/questions/364057/


That's way worse than just putting it on the file system.

Now you have hidden information, that you can't ever change, and may or may not impact whatever you are doing.


What hidden information that you can't ever change?


I think what they mean is if you somehow had two files with the same name but different cases (as NTFS supports this) it would be impossible to fix with win32 calls
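
For spotting that situation ahead of time, here is a rough Python sketch that groups names differing only by case. It uses `str.casefold()` as a stand-in for the comparison, which is an assumption: Windows applies its own case-mapping rules, and they need not agree with Python's. `find_case_collisions` is a hypothetical helper name.

```python
from collections import defaultdict


def find_case_collisions(names):
    """Return groups of names that are equal once case is ignored."""
    groups = defaultdict(list)
    for name in names:
        # casefold() is Python's caseless-matching form; assumed here as an
        # approximation of what a case-insensitive layer would compare.
        groups[name.casefold()].append(name)
    return [sorted(group) for group in groups.values() if len(group) > 1]


collisions = find_case_collisions(["bar.txt", "Bar.txt", "readme.md"])
```

Feeding it the output of `os.listdir()` for a directory would list the name groups that a case-insensitive API could not tell apart.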


> Windows doesn’t have it at the file system layer, NTFS is case sensitive.

I think the common phrasing is "case-aware, not case-sensitive".


No, NTFS has always been at least optionally case sensitive; current Windows versions even allow case-sensitivity to be controlled on a per-directory basis[1], which even works for (some) Win32 programs:

  Microsoft Windows [Version 10.0.22631.3593]
  (c) Microsoft Corporation. All rights reserved.
  
  C:\Users\jtm>mkdir foo
  
  C:\Users\jtm>fsutil file setCaseSensitiveInfo foo
  Case sensitive attribute on directory C:\Users\jtm\foo is enabled.
  
  C:\Users\jtm>echo bar > foo\bar.txt
  
  C:\Users\jtm>echo Bar > foo\Bar.txt
  
  C:\Users\jtm>dir foo
   Volume in drive C is Aristotle-Win
   Volume Serial Number is E4AE-428B
  
   Directory of C:\Users\jtm\foo
  
  2024-05-31  17:55    <DIR>          .
  2024-05-31  17:55    <DIR>          ..
  2024-05-31  17:55                 6 Bar.txt
  2024-05-31  17:55                 6 bar.txt
                 2 File(s)             12 bytes
                 2 Dir(s)  41,524,133,888 bytes free
  
  C:\Users\jtm>type foo\bar.txt
  bar
  
  C:\Users\jtm>type foo\Bar.txt
  Bar
[1] https://learn.microsoft.com/en-us/windows/wsl/case-sensitivi...


And so should we be able to have “é.txt” and “é.txt” in the same directory (with a different UTF-8 normalization?) What encoding should we use BTW?

I’m not advocating for case-insensitive fs (literally the first thing I do when I get a Mac is reformat it to be on a case-sensitive fs), but things are not that simple either.
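
To make the normalization point concrete, a small Python sketch using the standard `unicodedata` module. The two names render identically but are different code point sequences, so a byte-comparing filesystem sees two files. `same_after_normalization` is a hypothetical helper for illustration.

```python
import unicodedata

composed = "\u00e9.txt"     # "é" as a single code point (NFC form)
decomposed = "e\u0301.txt"  # "e" followed by a combining acute accent (NFD form)


def same_after_normalization(a, b, form="NFC"):
    """Compare two names after applying the same Unicode normalization form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


raw_equal = composed == decomposed
normalized_equal = same_after_normalization(composed, decomposed)
```

Here `raw_equal` is false and `normalized_equal` is true, which is the whole question: whether the filesystem or a layer above it should be the one doing the normalizing.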


> And so should we be able to have “é.txt” and “é.txt” in the same directory

That's what Linux does.

It does create some problems that seem to never happen in practice, while it avoids some problems that seem to happen once in a while. So yeah, I'd say it's a good idea.


You're looking at this from a technical perspective. From the average person's perspective, even files are too much of a technicality to deal with.

As a user I want my work to be preserved, I want to view my photos, and I want the system to know where that funny photo of my dog I took last Christmas is.

As a developer I need an identifier for a resource, and I am not going to let the user decide on the ID of the resource. I store files in the system under a GUID and keep whatever name the user prefers as metadata.

Exposing average people to the filesystem is the wrong level of abstraction. That is why iOS and Android apps are going that way. As I myself am used to dealing with files, it annoys me that I cannot have that level of control, but I accept that I am quite technical.
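
A sketch of the "GUID as identifier, user's name as metadata" approach, in Python. Everything here (`BlobStore`, its methods, the in-memory dicts) is hypothetical and only illustrates the shape of the idea, not any real storage service.

```python
import uuid


class BlobStore:
    """Toy store: content is keyed by a generated ID, names are metadata only."""

    def __init__(self):
        self._blobs = {}   # id -> bytes
        self._names = {}   # id -> display name chosen by the user

    def put(self, display_name, data):
        key = str(uuid.uuid4())  # the user never picks the identifier
        self._blobs[key] = data
        self._names[key] = display_name
        return key

    def get(self, key):
        return self._blobs[key]

    def find_by_name(self, query):
        """Case-insensitive lookup over the metadata, not over the keys."""
        wanted = query.casefold()
        return [k for k, name in self._names.items() if name.casefold() == wanted]


store = BlobStore()
first = store.put("Book Draft 1.docx", b"v1")
second = store.put("Book draft 1.docx", b"v2")
```

Both uploads survive under distinct keys, and how forgiving the name search is becomes a UI decision rather than a property of the storage layer.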


Dealing with files used to be something everyone interacting with computers had to do. It is something average people can do.

I think too much abstraction is a mistake and adds a lot of unneeded complexity.

People should learn something about the technology they use. If you want to drive, you need to understand how steering wheels work; if you want to drive a manual car (usual where I live and have lived), then you need to know how to work a gear stick and the effect of changing gear.


> used to be something everyone interacting with computers had to do

There were far fewer people 'interacting with computers' at that level years ago.


Everyone with an office job was still a lot of people though.


I'm not even sure 'everyone with an office job' had a computer. It certainly wasn't true 35 years ago. An office might have a computer or two, but not everyone had one, nor was everyone expected to use it.


Case insensitive matching is a surprisingly complicated, locale-dependent affair. Should I.txt and i.txt match? (Note that the first file is not named I.txt).

Case insensitive filesystems make about as much sense as ASCII-only filenames.


How would locale matter?


Off the top of my head, in turkish, `i` doesn't become `I`, it becomes `İ`. And `ı` is the lower case version of `I`
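
A quick Python illustration of why this bites. Python's `str.upper()` and `str.lower()` apply the locale-independent Unicode mappings, so they get Turkish wrong. The `turkish_upper`, `turkish_lower` and `turkish_equal` helpers are hypothetical and handle only the two i/I pairs, nothing else about Turkish.

```python
DOTTED_CAPITAL_I = "\u0130"   # İ
DOTLESS_SMALL_I = "\u0131"    # ı


def turkish_upper(text):
    """Uppercase with the Turkish i -> İ and ı -> I pairs applied first."""
    return text.replace("i", DOTTED_CAPITAL_I).replace(DOTLESS_SMALL_I, "I").upper()


def turkish_lower(text):
    """Lowercase with the Turkish İ -> i and I -> ı pairs applied first."""
    return text.replace(DOTTED_CAPITAL_I, "i").replace("I", DOTLESS_SMALL_I).lower()


def turkish_equal(a, b):
    """Case-insensitive comparison under the Turkish rules sketched above."""
    return turkish_lower(a) == turkish_lower(b)


default_upper = "izmir".upper()          # locale-unaware: plain ASCII I
turkish_result = turkish_upper("izmir")  # dotted capital I in both places
```

Under the default mapping `izmir` and `IZMIR` match; under the Turkish rules they do not, and the same pair of byte strings gives two different answers depending on who is asking.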


You don't need to decide how to upper or lower case a character to be insensitive to case, though. Treating them all as matching isn't a terrible option.


For example, it depends on the locale if the capitalized form of ß is ß or SS.
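
For what it's worth, here is how Python's built-in, locale-independent mappings treat ß. This only shows one implementation's choice, not what any filesystem does; `lower_equal` and `folded_equal` are hypothetical helper names.

```python
def lower_equal(a, b):
    """Naive case-insensitive comparison via lower()."""
    return a.lower() == b.lower()


def folded_equal(a, b):
    """Comparison via casefold(), Python's caseless-matching form."""
    return a.casefold() == b.casefold()


upper_of_sharp_s = "ß".upper()          # expands to two letters
lower_of_capital = "\u1e9e".lower()     # capital sharp s (ẞ) maps back to ß
naive = lower_equal("STRASSE", "straße")
folded = folded_equal("STRASSE", "straße")
```

The naive comparison says the two spellings differ and the case-folded one says they match, so even "ignore case" needs a decision about which mapping is meant.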


And yet case insensitive file name matching / string matching is one of my favourite windows features. It’s enormously convenient. An order of magnitude more convenient than the edge cases it causes me.

People aren’t ASCII or UTF-8 machines; “e” and “E” are the same character, that they are different ASCII codes is a behind the scenes implementation detail.

(That said, S3 isn’t a filesystem, it’s more like a web hashtable key-to-blob storage)


> People aren’t ASCII or UTF-8 machines; “e” and “E” are the same character

They are the same character to you, a native speaker of a Western language written in a latin script. They are the same to you because you are, in fact, an ASCII machine. Many many people in the world are not.


They are the same to me, they are different in ASCII, therefore I am not an ASCII machine. To me, the person using the computer to do work. Not the person wanting to do extra work to support the computer's internal leaky abstractions of data storage.

Your position, the position of too many people, is that I, a native speaker of English etc., should not be allowed to have a computer working how English works because somewhere, someone else is different. This is like saying I shouldn't be allowed an English spell checker because there are other people who speak other languages.


> “e” and “E” are the same character

They don't look like the same character to me. A character is a written symbol. These are different symbols.

What definition of "character" are you using where they're the same character?

I haven't ruled out that I am wrong, this is a naive comment.


Are the words hello and HELLO spelled differently? I am pretty squarely in the camp that filesystems should be case sensitive (perhaps with an insensitive shell on top), but I would not consider those two words as having a different spelling. To me that means they are the same sequence of characters.


You are confusing characters with glyphs. A glyph is a written symbol.


And you seem to be conflating characters and letters. There are fewer letters in the standard alphabet than we have characters for the same, largely because we do distinguish between some letter forms.

I suppose you could imagine a world where we don't, in fact, do this with just the character code. Seems fairly different from where we are, though?


I thought that if they're different glyphs they're different characters.

Surely the fact that they're represented differently in ASCII means ASCII regards them as different characters?

Whether they're different glyphs or not depends on the font.


When you press the "E" key on a US keyboard and "e" comes out, do you return the keyboard because it's broken? If not, then you know what definition I'm using even if I misnamed it.


> It’s enormously convenient. An order of magnitude more convenient than the edge cases it causes me.

Can you elaborate on this?


Every single time I type a path or filename (or server name) in the shell, or in Windows explorer, or in a file -> open or save dialog, I don't trip over capitalization. If I want to glob files with an 'ecks' in the name I can write *x* and not have to do it twice for *x* and *X*.

When I look at a directory listing and it has "XF86Config", I read it in my head as "ecks eff eight six config" not "caps X caps F num eight num six initial cap Config" and I can type what I read and don't have to double-check if it's config or Config.

Tab completion works if I type x<tab> instead of blanking on me and making me double check and type X<tab>.

Case sensitivity is like walking down a corridor and someone hitting you to a stop every few steps and saying "you're walking Left Right Left Right but you should be walking Right Left Right Left".

Case insensitivity is like walking down a corridor.

In PowerShell, some cmdlets are named like Add-VpnConnection where the initialism drops to lowercase after the first letter, others like Get-VMCheckpoint where the initialism stays capitalised, others mixed like Add-NetIPHttpsCertBinding where IP is caps but HTTPS isn't - any capitalisation works for running them or searching them with get-command or tab-completing them. I don't have to care. I don't have to memorise it, type it, pay attention to it, trip over it, I don't have to care!

"A programming language is low level when its programs require attention to the irrelevant." - Alan Perlis.

DNS names - ping GOOGLE.COM works, HTTPS://NEWS.YCOMBINATOR.COM works in a browser, MAC addresses are rendered with caps or lowercase hex on different devices, so are IPv6 addresses in hex format, email addresses - firstname.lastname or Firstname.Lastname is likely to work. File and directory access behaving the same means it's less bother. In Vim I :set ignorecase.

In PowerShell even string equality check is case insensitive by default, string match and split too. When I'm doing something like searching a log I want to see the english word 'error' if it's 'error' or 'ERROR' or 'Error' and I don't know what it is.

If I say the name of a document to a person I don't spell out the capitalisation. I don't want to have to do that to the computer, especially because there is almost no reason to have "Internal site 2 Network Diagram" and "INTERNAL site 2 network diagram" and "internal site 2 NETWORK DIAGRAM" in the same folder (and if there were, I couldn't easily keep them apart in my head).

All the time in command prompt shell, I press shift less often, type less, change directories and work with files more smoothly with less tripping over hurdles and being forced to stop and doublecheck what I'm tripping over when I read "word" and typed "word" and it didn't work.

On the other hand, the edge cases it causes me are ... well, I can't think of any because I don't want to put many files differing only by case in one directory. Maybe uncompressing an archive which has two files which clash? I can't remember that happening. Maybe moving a script to a case sensitive system? I don't do that often. In PowerShell, method calls are case insensitive. C# has "string".StartsWith() and JavaScript has .startsWith() and PowerShell will take .startswith() or .StartsWith or .Startswith or anything else. That occasionally clashes if there's a class with the same name in different case but that's rare, even.

In short, the computer pays attention to trivia so I don't have to. That's the right way round. It's about the best/simplest implementation of Do What I Mean (DWIM) that's almost always correct and almost never wrong.


> If I want to glob files with an 'ecks' in the name I can write *x* and not have to do it twice for *x* and *X*.

Adding

  shopt -s nocaseglob
to ~/.bashrc makes globbing case-insensitive in bash[1].

> Tab completion works if I type x<tab> instead of blanking on me and making me double check and type X<tab>.

Adding

  set completion-ignore-case on
to ~/.inputrc makes completion case-insensitive in bash (and other programs that use libreadline)[2].

Both options are independent of file system case-sensitivity.

[1] https://www.gnu.org/software/bash/manual/html_node/The-Shopt...

[2] https://tiswww.cwru.edu/php/chet/readline/readline.html#inde...


> Both options are independent of file system case-sensitivity.

In Windows world it works everywhere, in any win32 program - file open dialogs, et al. Here you have to have it built in to every tool. (and windows doesn't do it at the filesystem layer)


None of these are the filesystem though, they are all abstractions over the file system that could easily implement case insensitivity, and as a sibling comment pointed out, actually do in many cases. I'm perfectly fine with the idea of interacting with files using a case insensitive interface. I just don't feel like it should be the job of the filesystem to enforce case insensitivity.
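
A minimal sketch of what such a layer could look like, in Python: the store stays case sensitive, and a lookup function on top prefers an exact match, falls back to a unique case-insensitive match, and refuses to guess when several names qualify. `resolve` is a hypothetical helper, and `str.casefold()` is an assumed stand-in for whatever matching rule the interface picks.

```python
def resolve(requested, existing):
    """Map a user-typed name onto one of the exact names in `existing`."""
    if requested in existing:
        return requested  # exact match always wins
    wanted = requested.casefold()
    matches = [name for name in existing if name.casefold() == wanted]
    if len(matches) == 1:
        return matches[0]
    if not matches:
        raise FileNotFoundError(requested)
    # Several names differ only by case: the interface has to ask the user.
    raise LookupError(f"{requested!r} is ambiguous: {sorted(matches)}")


listing = ["XF86Config", "Projects", "bar.txt", "Bar.txt"]
found = resolve("xf86config", listing)
```

Typing `xf86config` resolves to `XF86Config`, while `BAR.TXT` raises because the directory really does hold two candidates. The filesystem never has to know any of this happened.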


Complicated for whom? As a user, I have little pity for the ease of life of developers and kernels.


> Casing is usually not meaningful even in written language. "Hi, how are you?"

How about: “pay bill” vs “pay Bill”?

“Usually” in the context of automated systems design is a recipe for disaster.

Computers store bytes, not characters that may just happen to mean similar things. Shall we merge ümlauts? How to handle ß?


Case preserving and case sensitive are two subtly different things. Most case-insensitive file systems are case preserving, and they also do whatever the UTF-8 equivalent is; I forget the name.


But the gps point is that assuming you know the semantic meaning of the case and if retention is enough is silly.

Assuming case insensitivity is bizarre.


heh, I especially enjoy that in a huge thread about how capitalization does and doesn't matter, "gps point" was not, in fact, concerning some coordinates of the global positioning system but rather "GP's point". I first chalked it up to some autocomplete artifact but then realized what was actually happening


Perfect is the enemy of good. It is quite acceptable to streamline the easy cases now and the hard cases later or never.


> Casing is usually not meaningful even in written language. "Hi, how are you?" means the same thing as "hi, how are you?" Uppercase changes meaning only when distinguishing between proper and common nouns, which is rarely a concern we have with file names anyway.

The number of spaces is usually not meaningful in written language. "Hi, how are you?" means the same thing as "Hi, how are you ?". I don't think it's a good reason to make file system ignore space characters.


No offense, but I think that's a very western-centric view. Your example only makes sense when the user is familiar with English (or other western languages, I guess). To me personally, I find it strange that "D.txt" and "d.txt" means the same file, since they are two very different characters. Likewise, I think you would also go crazy if I told you "ア.txt" and "あ.txt" mean the same file (they are katakana and hiragana for A respectively, which in a sense is equivalent to uppercase and lowercase in Japanese), or that "一.txt" and "壹.txt" mean the same file (both mean the number 1 in Chinese; we literally call the latter one the "uppercase number").


Agreed, and you could even take this into "1.txt" being the same as "One.txt". Which, I mean, fair that I would expect a speech system to find either if I speak "One dot t x t". But, it would also find "Won.txt" and trying to bridge the phonetic to the symbolic is going to be obviously fraught with trouble.


> To me personally, I find it strange that "D.txt" and "d.txt" means the same file, since they are two very different characters.

As a native English speaker, I agree with this.


Those are all the same, I don’t see an issue


What if Unicode updates some capitalization rules in the next version, and after an OS update some filenames now collide and one of them is inaccessible?


If someone says they sent you "Book Draft 1.docx" and you check your email to find "Ⓑⓞⓞⓚ Ⓓⓡⓐⓕⓣ ①.ⓓⓞⓒⓧ", "฿ØØ₭ ĐⱤ₳₣₮ 1.ĐØ₵Ӿ" - these are different files.


I have a feeling you enjoyed that character set lookup. I know I did seeing it.


Ages ago on Flowdock at work (a chat webapp kind of like Slack that no longer exists), I used the circle ones for a short time as my nickname, and no one could @ me.


File systems are not user interfaces. They are interfaces between programs and storage. Case insensitive is much better for programs.

The user shell can choose however it wants to handle file names, a case sensitive file system does not prevent the shell from handling file names case insensitively.


> case insensitive is much better for programs

Can’t edit my comment. I mean case sensitive is better for programs, of course.


> Why? Windows is also not case-sensitive, so it's not like there's a near-universal convention that S3 is ignoring.

Not sure why what Windows does is relevant to this, honestly. Personally, I strongly prefer case sensitivity with filenames, but the lack of it isn't a dealbreaker or anything.


What are some of the advantages of case sensitivity? Are you saying you actually want to save "Book draft 1.docx" and "Book Draft 1.docx" as two separate files? That just sounds like asking for trouble.


The advantages that I value are that case sensitivity means I can use shorter filenames, it makes it easier to generate programmatic filenames, and I can use case to help in organizing my files.

> Are you saying you actually want to save "Book draft 1.docx" and "Book Draft 1.docx" as two separate files?

That's a situation where sensitivity can cause difficulty, yes, but for me personally, that's a minor confusion that is easy to avoid or correct. Everything is a tradeoff, and for me, putting up with that annoyance is well worth the benefits of case sensitivity.

I do totally understand that others will have different tradeoffs that fit them better. I'm not taking away from that at all. But saying "case sensitivity is undesirable" in a broad sense is no more accurate than saying "case sensitivity is desirable" in a broad sense.

Personally, I think the ideal tradeoff is for the filesystem to be case sensitive, but have the user interfaces to that file system be able to make everything behave as case-insensitive if that's what the user prefers.


Even with only one case, just four characters is enough for a million files. How much benefit are you really getting from case sensitivity?
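
A rough check of that arithmetic in Python. The alphabets are assumptions, since the comment doesn't say which characters it has in mind; `names_available` is a hypothetical helper.

```python
import string


def names_available(alphabet, length):
    """Number of distinct names of exactly `length` characters."""
    return len(set(alphabet)) ** length


one_case_letters = names_available(string.ascii_lowercase, 4)
one_case_alnum = names_available(string.ascii_lowercase + string.digits, 4)
two_case_alnum = names_available(string.ascii_letters + string.digits, 4)
```

That works out to 456,976 names for lowercase letters alone, 1,679,616 once digits are allowed, and 14,776,336 with both cases plus digits. So the "million" figure holds for one case if digits count, and case sensitivity buys a bit under a ninefold increase at this length.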


Unicode case folding is a complicated algorithm, and its definition is subject to change with updated Unicode versions. It's nice not to have to worry about that.


Okay, but I don't think this has anything to do with the use case JohnFen mentioned or my questions about it.

If your goal is super easy filename generation then you're probably not going to leave ASCII.

And if you do go beyond ASCII for filename packing/generating, then you should instead use many thousands of CJK characters that don't have any concept of case at all. Bypass the question of case sensitivity entirely.


Enough that I prefer it. If that were the only advantage, I'd only slightly prefer it. But being able to use case as a differentiator in filenames intended for me to read is something I find even more valuable.

A filesystem not being case sensitive isn't a dealbreaker or anything. I just prefer case sensitivity because it increases flexibility and readability for me, and has no downsides that I consider significant.


Also note that 'are these 2 words case insensitively equal' is impossible to answer without knowing what locale rules to apply. And given that people's personal names tend to have the property that any locale rules that must be applied are _the locale that their name originates from_, and that no repository of names I am aware of stores locale along with the name, that means what you want is impossible.

In line with case insensitivity, do you think `müller` and `muller` should boil down to for example the same username for login purposes?

That's... tricky. In german, the standard way to transliterate names to strict ASCII would be to turn `müller` into `mueller`. In swiss german that is in fact mandatory. Nobody in switzerland is named `müller` but you'll find loads of `mueller`s. Except.. there _are_ `müller` in switzerland - probably german citizens living there.

So, just normalize `ü` to `ue`, easy, right? Except that one doesn't reverse all that well, but that's probably all right. But - no. In other locales, the asciification of `ü` is not `ue`. For example, `Sjögren` is swedish and that transliterates to `sjogren`, not `sjoegren`.

Bringing it back to casing: Given the string `IJSSELMEER`, if I want to title case that, the correct output is presumably `IJsselmeer`. Yes, that's an intentional capital I capital J. Because it's a dutch word and that's how it goes. In an optimal world, there is a separate unicode glyph for the dutch IJ as a single letter so we can stick with the simple rule of 'to title case a string, upper case the first glyph and lowercase all others, until you see a space glyph, in which case, uppercase the next'. But the dutch were using computers fairly early on and went with using the I and the J (plain ascii) for this stuff.

And then we get into well trodden ground: In turkish, there is both a dotted and a dotless i. For... reasons they use plain jane ascii `i` for lowercase dotted i and plain jane ascii `I` for uppercase dotless I. But they have fancy non-ascii unicode glyphs for 'dotted capital I' and 'dotless lowercase i'.

So, __in turkish__, `IZMIR` is not case-insensitive equal to `izmir`. Instead, `İZMİR` and `izmir` are equal.

I don't know how to solve this without either bringing in hard AI (as in, a system that recognizes 'müller' as a common german surname and treats it as equal to 'mueller', but it would not treat `xyzmü` equal to `xyzmue` - and treats IZMIR as not equal to izmir, because it recognizes it as the name of a major turkish city and thus applies turkish locale rules), or decreeing to the internet: "get lost with your fancypants non-US/UKian weird word stuff. Fix your language or something" - which, well, most cultures aren't going to like.

'files are case sensitive' sidesteps alllllll of this.
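
The IJsselmeer case is easy to reproduce in Python, whose string methods know nothing about Dutch. `naive_title` and `dutch_title` are hypothetical helpers; the Dutch one only special-cases a word-initial IJ and would be wrong for non-Dutch words that happen to start with those letters, which is rather the point.

```python
def naive_title(word):
    """Uppercase the first character, lowercase the rest."""
    return word[:1].upper() + word[1:].lower()


def dutch_title(word):
    """Like naive_title, but keeps a leading IJ digraph fully capitalized."""
    if word[:2].upper() == "IJ":
        return "IJ" + word[2:].lower()
    return naive_title(word)


builtin_result = "IJSSELMEER".title()
naive_result = naive_title("IJSSELMEER")
dutch_result = dutch_title("IJSSELMEER")
```

Both the built-in `title()` and the naive rule give "Ijsselmeer"; only the version that has been told the word is Dutch gives "IJsselmeer".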


> you don't say, "Hey! I think you sent me the wrong file!"

You do! Why not?

It's a big trap. A lot of counterfeiting, spam, phishing, etc. works this way. You end up buying a fake brand or getting tricked.


> Why?

Because it introduces extra complexity.

Now, "Cache" and "cache" are the same, but also...different because you'd care if Cache suddenly became cache.


> Why? Windows is also not case-sensitive, so it's not like there's a near-universal convention that S3 is ignoring.

You can enable case sensitivity for directories or disks, but this is usually done for special cases, like git repos


Yeah, but that little bit of user friendliness ruins the file system for file system things. Now you need “registries” and other, secondary file systems to do file system things because you can’t even use base64 in file names. Make your file browsing app case insensitive, if that’s what you want. Don’t build inferiority down to the core.


Then why don’t you just always write in lower case?



