Why? Windows is also not case-sensitive, so it's not like there's a near-universal convention that S3 is ignoring.
Case sensitivity in file names is surprising even to non-technical people. If someone says they sent you "Book Draft 1.docx" and you check your email to find "Book draft 1.docx," you don't say, "Hey! I think you sent me the wrong file!"
Casing is usually not meaningful even in written language. "Hi, how are you?" means the same thing as "hi, how are you?" Uppercase changes meaning only when distinguishing between proper and common nouns, which is rarely a concern we have with file names anyway.
> If someone says they sent you "Book Draft 1.docx" and you check your email to find "Book draft 1.docx," you don't say, "Hey! I think you sent me the wrong file!"
But you also wouldn't say that if they sent "Book - Draft 1.docx", "Book Draft I.docx", "BookDraft1.docx", "Book_Draft_1.docx", or "Book Draft 1.doc", and surely you wouldn't want a filesystem to treat all of them as the same.
This is a personal reason, but the reason I prefer case sensitive directory names is I can make "logical groupings" for things. So, my python git directory might have "Projects/" and "Packages/," and the capitalization not only makes them stand out as a sort of "root path" for whatever's underneath, but the capitalization makes me conscious of the commands I'm typing with that path. I can't just autopilot a path name, I have to consciously hit shift when tab completion stops working.
That might sound like a dumb reason, but it's kept me from moving things into the wrong directory, or accidentally removing a directory multiple times in the past.
I also use Windows regularly and it really isn't a hindrance, so maybe I wouldn't actually be bothered if everything was case sensitive.
To me, this sounds like a great practice for terminal environments but may be less intuitive when using file system apps. I could easily overlook a single letter capitalization in a GUI view of many directories. Maybe it's because at a terminal the "view" into the file system is narrow?
Now I'm wondering how I can use this in my docker images. I mean that might irritate devops. Well, maybe they'll like it too. Man, thanks for posting this.
You have to draw the line somewhere, but I do appreciate when the UI sorts "Book draft 2" before "Book draft 11". That requires nontrivial tokenization logic and inference, but simple heuristics can be right often enough to be useful.
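For what it's worth, the "simple heuristic" version of this is only a few lines. A minimal sketch in Python, assuming the only goal is to compare digit runs as numbers (`natural_key` is a made-up helper name, not a standard function):

```python
import re

def natural_key(name: str):
    """Split a name into text and digit chunks so digit runs compare as integers."""
    parts = re.split(r"(\d+)", name)
    # With a capturing group, odd indices are always the digit runs,
    # so each position compares str with str or int with int.
    return [int(p) if i % 2 else p.casefold() for i, p in enumerate(parts)]

names = ["Book draft 11", "Book draft 2", "Book draft 1"]
plain_order = sorted(names)                      # character-by-character comparison
natural_order = sorted(names, key=natural_key)   # digit runs compared numerically
print(plain_order)
print(natural_order)
```

This ignores locale, decimal points, version strings and so on, which is exactly where the "have to draw the line somewhere" part comes in.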
On that note, ASCIIbetical sort is never the right answer. There is a special place in hell for any human-facing UI that sorts "Zook draft 1" between "Book draft 1" and "book draft 1".
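To make the complaint concrete, here is a small Python sketch comparing code point order with a case-folded sort key. The folded key is only a rough stand-in for real collation, which is locale-dependent:

```python
names = ["book draft 1", "Zook draft 1", "Book draft 1"]

# Default string comparison goes by code point: 'B' < 'Z' < 'b'
codepoint_order = sorted(names)

# Fold case for the primary comparison, use the original name as a tie-breaker
folded_order = sorted(names, key=lambda n: (n.casefold(), n))

print(codepoint_order)
print(folded_order)
```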
And that line, at least for sorting, belongs firmly outside the filesystem.
Sorting is locale-dependent. Whether a letter-with-dots sorts next to letter-without-dots or somewhere completely different has no correct global answer.
I think there's a pretty big difference between how the UI orders things and how the filesystem treats things as equivalent. A filesystem treating names case sensitively doesn't prevent the UI from tokenizing the names in any other arbitrary way
They are just not the same characters. A filesystem should not have an opinion on what strings of characters _mean_ the same. It is the wrong level of abstraction.
Filenames might not even be words at all, and are surely not limited to English. We shouldn't implement rules and conventions from spoken English at a filesystem level, certainly not S3.
I think what they mean is if you somehow had two files with the same name but different cases (as NTFS supports this) it would be impossible to fix with win32 calls
No, NTFS has always been at least optionally case sensitive; current Windows versions even allow case-sensitivity to be controlled on a per-directory basis[1], which even works for (some) Win32 programs:
Microsoft Windows [Version 10.0.22631.3593]
(c) Microsoft Corporation. All rights reserved.
C:\Users\jtm>mkdir foo
C:\Users\jtm>fsutil file setCaseSensitiveInfo foo
Case sensitive attribute on directory C:\Users\jtm\foo is enabled.
C:\Users\jtm>echo bar > foo\bar.txt
C:\Users\jtm>echo Bar > foo\Bar.txt
C:\Users\jtm>dir foo
Volume in drive C is Aristotle-Win
Volume Serial Number is E4AE-428B
Directory of C:\Users\jtm\foo
2024-05-31 17:55 <DIR> .
2024-05-31 17:55 <DIR> ..
2024-05-31 17:55 6 Bar.txt
2024-05-31 17:55 6 bar.txt
2 File(s) 12 bytes
2 Dir(s) 41,524,133,888 bytes free
C:\Users\jtm>type foo\bar.txt
bar
C:\Users\jtm>type foo\Bar.txt
Bar
And so should we be able to have “é.txt” and “é.txt” in the same directory (with a different UTF-8 normalization?)
What encoding should we use BTW?
I’m not advocating for case-insensitive fs (literally the first thing I do when I get a Mac is reformat it to be on a case-sensitive fs), but things are not that simple either.
> And so should we be able to have “é.txt” and “é.txt” in the same directory
That's what Linux does.
It creates some problems that never seem to happen in practice, while avoiding some problems that do seem to happen once in a while. So yeah, I'd say it's a good idea.
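For anyone who hasn't run into this: the two "é.txt" names can look identical on screen and still be different strings. A quick Python sketch using the standard `unicodedata` module (the file name is just the example from this thread):

```python
import unicodedata

composed = unicodedata.normalize("NFC", "\u00e9.txt")    # é as one code point, U+00E9
decomposed = unicodedata.normalize("NFD", "\u00e9.txt")  # 'e' followed by U+0301 combining acute

print(composed == decomposed)             # the strings differ
print(len(composed), len(decomposed))     # different numbers of code points
print(composed.encode("utf-8"))
print(decomposed.encode("utf-8"))

# A layer that wants to treat them as the same name has to normalize first
same_after_nfc = unicodedata.normalize("NFC", decomposed) == composed
print(same_after_nfc)
```

A filesystem that compares names byte for byte will happily store both.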
You're looking at this from a technical perspective. From the average person's perspective, even files are too much of a technicality to deal with.
As a user I want my work to be preserved, I want to view my photos, and I want the system to know where to find that funny photo of my dog I took last Christmas.
As a developer I need an identifier for a resource, and I am not going to let the user decide the ID of that resource. I store files in the system under a GUID and keep whatever name the user likes as metadata.
Exposing average people to the filesystem is the wrong level of abstraction. That is why iOS and Android apps are going that way. Since I'm used to dealing with files myself, it annoys me that I cannot have that level of control, but I accept that I am quite technical.
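The "GUID as the key, user's name as metadata" approach could look roughly like this. It is a toy in-memory sketch: `BlobStore` and its methods are hypothetical names, not any real storage API:

```python
import uuid

class BlobStore:
    """Toy store: the system picks the key, the user's name is only metadata."""

    def __init__(self):
        self._blobs = {}   # key -> bytes
        self._names = {}   # key -> display name chosen by the user

    def put(self, display_name: str, data: bytes) -> str:
        key = str(uuid.uuid4())          # identifier the user never has to see
        self._blobs[key] = data
        self._names[key] = display_name
        return key

    def find_by_name(self, query: str) -> list:
        # Name lookup is a search over metadata, so it can be as forgiving as we like
        wanted = query.casefold()
        return [k for k, n in self._names.items() if n.casefold() == wanted]

store = BlobStore()
first = store.put("Dog photo.jpg", b"...")
second = store.put("dog PHOTO.jpg", b"...")
matches = store.find_by_name("DOG photo.JPG")
print(matches)
```

Both uploads survive under different keys, and the question of whether the names "collide" becomes a search problem instead of a storage problem.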
Dealing with files used to be something everyone interacting with computers had to do. It is something average people can do.
I think too much abstraction is a mistake and adds a lot of unneeded complexity.
People should learn something about the technology they use. If you want to drive, you need to understand how a steering wheel works; if you want to drive a manual car (the norm where I live and have lived), you need to know how to work a gear stick and the effect of changing gear.
I'm not even sure 'everyone with an office job' had a computer. It certainly wasn't true 35 years ago. An office might have a computer or two, but not everyone had one, nor was everyone expected to use it.
Case insensitive matching is a surprisingly complicated, locale-dependent affair.
Should I.txt and i.txt match? (Note that the first file is not named I.txt).
Case insensitive filesystems make about as much sense as ASCII-only filenames.
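A few of the complications show up even with Python's built-in, locale-unaware string methods. This is only a sketch of the general problem, not of what any particular filesystem does:

```python
# German sharp s: lower() keeps it, casefold() expands it to "ss"
lower_equal = "Straße".lower() == "STRASSE".lower()
folded_equal = "Straße".casefold() == "STRASSE".casefold()
print(lower_equal, folded_equal)

# Capital I with dot above (U+0130) lowercases to 'i' plus a combining dot,
# so a case mapping can change the length of the string
dotted_capital_i = "\u0130"
lowered = dotted_capital_i.lower()
print(len(dotted_capital_i), len(lowered))

# Dotless lowercase i (U+0131) uppercases to plain ASCII 'I'
dotless_upper = "\u0131".upper()
print(dotless_upper == "I")
```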
You don't need to decide how to upper or lower case a character to be insensitive to case, though. Treating them all as matching isn't a terrible option.
And yet case insensitive file name matching / string matching is one of my favourite windows features. It’s enormously convenient. An order of magnitude more convenient than the edge cases it causes me.
People aren’t ASCII or UTF-8 machines; “e” and “E” are the same character, that they are different ASCII codes is a behind the scenes implementation detail.
(That said, S3 isn’t a filesystem, it’s more like a web hashtable key-to-blob storage)
> People aren’t ASCII or UTF-8 machines; “e” and “E” are the same character
They are the same character to you, a native speaker of a Western language written in a Latin script. They are the same to you because you are, in fact, an ASCII machine. Many, many people in the world are not.
They are the same to me, they are different in ASCII, therefore I am not an ASCII machine. To me, the person using the computer to do work. Not the person wanting to do extra work to support the computer's internal leaky abstractions of data storage.
Your position, the position of too many people, is that I, a native speaker of English etc., should not be allowed to have a computer working how English works because somewhere, someone else is different. This is like saying I shouldn't be allowed an English spell checker because there are other people who speak other languages.
Are the words hello and HELLO spelled differently? I am pretty squarely in the camp that filesystems should be case sensitive (perhaps with an insensitive shell on top), but I would not consider those two words as having a different spelling. To me that means they are the same sequence of characters.
And you seem to be conflating characters and letters. There are fewer letters in the standard alphabet than we have characters for the same, largely because we do distinguish between some letter forms.
I suppose you could imagine a world where we don't, in fact, do this with just the character code. Seems fairly different from where we are, though?
When you press the "E" key on a US keyboard and "e" comes out, do you return the keyboard because it's broken? If not, then you know what definition I'm using even if I misnamed it.
Every single time I type a path or filename (or server name) in the shell, or in Windows explorer, or in a file -> open or save dialog, I don't trip over capitalization. If I want to glob files with an 'ecks' in the name I can write *x* and not have to do it twice for *x* and *X*.
When I look at a directory listing and it has "XF86Config", I read it in my head as "ecks eff eight six config" not "caps X caps F num eight num six initial cap Config" and I can type what I read and don't have to double-check if it's config or Config.
Tab completion works if I type x<tab> instead of blanking on me and making me double check and type X<tab>.
Case sensitivity is like walking down a corridor and someone hitting you to a stop every few steps and saying "you're walking Left Right Left Right but you should be walking Right Left Right Left".
Case insensitivity is like walking down a corridor.
In PowerShell, some cmdlets are named like Add-VpnConnection where the initialism drops to lowercase after the first letter, others like Get-VMCheckpoint where the initialism stays capitalised, others mixed like Add-NetIPHttpsCertBinding where IP is caps but HTTPS isn't - any capitalisation works for running them or searching them with get-command or tab-completing them. I don't have to care. I don't have to memorise it, type it, pay attention to it, trip over it, I don't have to care!
"A programming language is low level when its programs require attention to the irrelevant." - Alan Perlis.
DNS names - ping GOOGLE.COM works, HTTPS://NEWS.YCOMBINATOR.COM works in a browser, MAC addresses are rendered with caps or lowercase hex on different devices, so are IPv6 addresses in hex format, email addresses - firstname.lastname or Firstname.Lastname is likely to work. File and directory access behaving the same means it's less bother. In Vim I :set ignorecase.
In PowerShell even string equality check is case insensitive by default, string match and split too. When I'm doing something like searching a log I want to see the english word 'error' if it's 'error' or 'ERROR' or 'Error' and I don't know what it is.
If I say the name of a document to a person I don't spell out the capitalisation. I don't want to have to do that to the computer, especially because there is almost no reason to have "Internal site 2 Network Diagram" and "INTERNAL site 2 network diagram" and "internal site 2 NETWORK DIAGRAM" in the same folder (and if there were, I couldn't easily keep them apart in my head).
All the time in command prompt shell, I press shift less often, type less, change directories and work with files more smoothly with less tripping over hurdles and being forced to stop and doublecheck what I'm tripping over when I read "word" and typed "word" and it didn't work.
On the other hand, the edge cases it causes me are ... well, I can't think of any because I don't want to put many files differing only by case in one directory. Maybe uncompressing an archive which has two files which clash? I can't remember that happening. Maybe moving a script to a case sensitive system? I don't do that often. In PowerShell, method calls are case insensitive. C# has "string".StartsWith() and JavaScript has .startsWith() and PowerShell will take .startswith() or .StartsWith or .Startswith or anything else. That occasionally clashes if there's a class with the same name in different case but that's rare, even.
In short, the computer pays attention to trivia so I don't have to. That's the right way round. It's about the best/simplest implementation of Do What I Mean (DWIM) that's almost always correct and almost never wrong.
> Both options are independent of file system case-sensitivity.
In Windows world it works everywhere, in any win32 program - file open dialogs, et al. Here you have to have it built in to every tool. (and windows doesn't do it at the filesystem layer)
None of these are the filesystem though, they are all abstractions over the file system that could easily implement case insensitivity, and as a sibling comment pointed out, actually do in many cases. I'm perfectly fine with the idea of interacting with files using a case insensitive interface. I just don't feel like it should be the job of the filesystem to enforce case insensitivity.
Case preserving and case sensitive are two subtly different things. Most case-insensitive file systems are case preserving, and preserving of whatever the UTF-8 equivalent is (I forget the name).
heh, I especially enjoy that in a huge thread about how capitalization does and doesn't matter, "gps point" was not, in fact, concerning some coordinates of the global positioning system but rather "GP's point". I first chalked it up to some autocomplete artifact but then realized what was actually happening
> Casing is usually not meaningful even in written language. "Hi, how are you?" means the same thing as "hi, how are you?" Uppercase changes meaning only when distinguishing between proper and common nouns, which is rarely a concern we have with file names anyway.
The number of spaces is usually not meaningful in written language. "Hi, how are you?" means the same thing as "Hi, how are you ?". I don't think it's a good reason to make file system ignore space characters.
No offense, but I think that's a very western-centric view. Your example only makes sense when the user is familiar with English (or other western languages, I guess). Personally, I find it strange that "D.txt" and "d.txt" mean the same file, since they are two very different characters. Likewise, I think you would go crazy if I told you "ア.txt" and "あ.txt" mean the same file (they are katakana and hiragana for A respectively, which in a sense is equivalent to uppercase and lowercase in Japanese), or that "一.txt" and "壹.txt" mean the same file (both mean the number 1 in Chinese; we literally call the latter one the "uppercase number").
Agreed, and you could even take this into "1.txt" being the same as "One.txt". Which, I mean, fair that I would expect a speech system to find either if I speak "One dot t x t". But, it would also find "Won.txt" and trying to bridge the phonetic to the symbolic is going to be obviously fraught with trouble.
What if Unicode updates some capitalization rules in the next version, and after an OS update some filenames now collide and one of them is inaccessible?
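One way to see the risk: whether two names collide depends entirely on the folding function, and that function is tied to a Unicode version. A small Python sketch (`find_collisions` is a hypothetical helper; `str.casefold` stands in for whatever folding a filesystem might use):

```python
import unicodedata
from collections import defaultdict

def find_collisions(names, fold=str.casefold):
    """Group names by their folded form and keep groups with more than one member."""
    groups = defaultdict(list)
    for name in names:
        groups[fold(name)].append(name)
    return {key: group for key, group in groups.items() if len(group) > 1}

# The Unicode database version this Python build was compiled against
print(unicodedata.unidata_version)

collisions = find_collisions(["Bar.txt", "bar.txt", "baz.txt"])
print(collisions)
```

Swap in a `fold` built from a newer set of rules and the set of collisions can change for names that are already on disk.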
If someone says they sent you "Book Draft 1.docx" and you check your email to find "Ⓑⓞⓞⓚ Ⓓⓡⓐⓕⓣ ①.ⓓⓞⓒⓧ", "฿ØØ₭ ĐⱤ₳₣₮ 1.ĐØ₵Ӿ" - these are different files.
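Interestingly, Unicode compatibility normalization does map the circled version back to plain letters, while the currency-symbol version stays different. A quick Python check with the standard `unicodedata` module, using the names from the comment above:

```python
import unicodedata

plain = "Book Draft 1.docx"
circled = "Ⓑⓞⓞⓚ Ⓓⓡⓐⓕⓣ ①.ⓓⓞⓒⓧ"
currency = "฿ØØ₭ ĐⱤ₳₣₮ 1.ĐØ₵Ӿ"

# NFKC replaces compatibility characters (such as circled letters) with their base forms
circled_folded = unicodedata.normalize("NFKC", circled)
currency_folded = unicodedata.normalize("NFKC", currency)

print(circled == plain)           # False: different code points
print(circled_folded == plain)    # True after compatibility normalization
print(currency_folded.casefold() == plain.casefold())
```

So "which names are the same" depends heavily on which normalization you pick, and none of them matches human judgement about lookalikes.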
Ages ago on Flowdock at work (a chat webapp kind of like Slack that no longer exists), I used the circle ones for a short time as my nickname, and no one could @ me.
File systems are not user interfaces. They are interfaces between programs and storage. Case sensitive is much better for programs.
The user shell can choose however it wants to handle file names, a case sensitive file system does not prevent the shell from handling file names case insensitively.
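As a rough sketch of that split in Python: the listing below stands in for a case-sensitive directory, and `resolve` is a hypothetical shell-side helper that prefers an exact match and otherwise falls back to case-insensitive matching:

```python
def resolve(query: str, directory_listing: list) -> list:
    """Return the exact match if there is one, else every case-insensitive match."""
    if query in directory_listing:
        return [query]
    wanted = query.casefold()
    return [name for name in directory_listing if name.casefold() == wanted]

listing = ["XF86Config", "Bar.txt", "bar.txt"]

print(resolve("xf86config", listing))  # one candidate, the shell can just use it
print(resolve("bar.txt", listing))     # exact match wins
print(resolve("BAR.TXT", listing))     # ambiguous, the shell has to ask or refuse
```

The storage layer stays exact; only the interface decides how forgiving to be, and it is also the interface that deals with the ambiguous case.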
> Why? Windows is also not case-sensitive, so it's not like there's a near-universal convention that S3 is ignoring.
Not sure why what Windows does is relevant to this, honestly. Personally, I strongly prefer case sensitivity with filenames, but the lack of it isn't a dealbreaker or anything.
What are some of the advantages of case sensitivity? Are you saying you actually want to save "Book draft 1.docx" and "Book Draft 1.docx" as two separate files? That just sounds like asking for trouble.
The advantages that I value are that case sensitivity means I can use shorter filenames, it makes it easier to generate programmatic filenames, and I can use case to help in organizing my files.
> Are you saying you actually want to save "Book draft 1.docx" and "Book Draft 1.docx" as two separate files?
That's a situation where sensitivity can cause difficulty, yes, but for me personally, that's a minor confusion that is easy to avoid or correct. Everything is a tradeoff, and for me, putting up with that annoyance is well worth the benefits of case sensitivity.
I do totally understand that others will have different tradeoffs that fit them better. I'm not taking away from that at all. But saying "case sensitivity is undesirable" in a broad sense is no more accurate than saying "case sensitivity is desirable" in a broad sense.
Personally, I think the ideal tradeoff is for the filesystem to be case sensitive, but have the user interfaces to that file system be able to make everything behave as case-insensitive if that's what the user prefers.
Unicode case folding is a complicated algorithm, and its definition is subject to change with updated Unicode versions. It's nice not to have to worry about that.
Okay, but I don't think this has anything to do with the use case JohnFen mentioned or my questions about it.
If your goal is super easy filename generation then you're probably not going to leave ASCII.
And if you do go beyond ASCII for filename packing/generating, then you should instead use many thousands of CJK characters that don't have any concept of case at all. Bypass the question of case sensitivity entirely.
Enough that I prefer it. If that were the only advantage, I'd only slightly prefer it. But being able to use case as a differentiator in filenames intended for me to read is something I find even more valuable.
A filesystem not being case sensitive isn't a dealbreaker or anything. I just prefer case sensitivity because it increases flexibility and readability for me, and has no downsides that I consider significant.
Also note that 'are these 2 words case insensitively equal' is impossible to answer without knowing what locale rules to apply. And given that people's personal names tend to have the property that any locale rules that must be applied are _the locale that their name originates from_, and that no repository of names I am aware of stores locale along with the name, that means what you want is impossible.
In line with case insensitivity, do you think `müller` and `muller` should boil down to for example the same username for login purposes?
That's... tricky. In German, the standard way to transliterate names to strict ASCII would be to turn `müller` into `mueller`. In Swiss German that is in fact mandatory. Nobody in Switzerland is named `müller` but you'll find loads of `mueller`s. Except... there _are_ `müller`s in Switzerland - probably German citizens living there.
So, just normalize `ü` to `ue`, easy, right? Except that one doesn't reverse all that well, but that's probably all right. But - no. In other locales, the asciification of `ü` is not `ue`. For example, `Sjögren` is Swedish and that transliterates to `sjogren`, not `sjoegren`.
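A tiny Python sketch of why a single rule doesn't work. The tables here are illustrative assumptions that only cover the examples above; they are not an official transliteration standard, and `asciify` is a made-up helper:

```python
import unicodedata

# Hypothetical per-locale tables, just enough for the examples in this comment
TRANSLIT = {
    "de": {"\u00e4": "ae", "\u00f6": "oe", "\u00fc": "ue", "\u00df": "ss"},  # ä ö ü ß
    "sv": {"\u00e5": "a", "\u00e4": "a", "\u00f6": "o"},                     # å ä ö
}

def asciify(name: str, locale: str) -> str:
    """Lowercase a name and replace letters using the chosen locale's table."""
    table = TRANSLIT[locale]
    normalized = unicodedata.normalize("NFC", name).lower()
    return "".join(table.get(ch, ch) for ch in normalized)

print(asciify("Müller", "de"))
print(asciify("Sjögren", "sv"))
print(asciify("Sjögren", "de"))  # wrong table, wrong answer
```

The function cannot pick the table on its own; the locale has to come from somewhere, and usually nobody stored it.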
Bringing it back to casing: Given the string `IJSSELMEER`, if I want to title case that, the correct output is presumably `IJsselmeer`. Yes, that's an intentional capital I capital J. Because it's a Dutch word and that's how it goes. In an optimal world, there is a separate Unicode glyph for the Dutch IJ as a single letter so we can stick with the simple rule of 'to title case a string, upper case the first glyph and lowercase all others, until you see a space glyph, in which case, uppercase the next'. But the Dutch were using computers fairly early on and went with using the I and the J (plain ASCII) for this stuff.
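Python's built-in `str.title()` applies exactly that simple rule, so it gets this one wrong. A minimal sketch, where `dutch_title` is a hypothetical helper that only handles a single word and only the leading IJ case:

```python
def dutch_title(word: str) -> str:
    """Title-case one word, keeping a leading Dutch 'ij' digraph fully capitalized."""
    lowered = word.lower()
    if lowered.startswith("ij"):
        return "IJ" + lowered[2:]
    return lowered[:1].upper() + lowered[1:]

generic = "IJSSELMEER".title()
dutch = dutch_title("IJSSELMEER")
print(generic)
print(dutch)
```

And of course you only want `dutch_title` when you already know the word is Dutch.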
And then we get into well-trodden ground: In Turkish, there is both a dotted and a dotless i. For... reasons they use plain jane ASCII `i` for lowercase dotted i and plain jane ASCII `I` for uppercase dotless I. But they have fancy non-ASCII Unicode glyphs for 'dotted capital I' and 'dotless lowercase i'.
So, __in Turkish__, `IZMIR` is not case-insensitive equal to `izmir`. Instead, `İZMİR` and `izmir` are equal.
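Python's `str.lower()` has no locale parameter, so it always applies the default rule. A small sketch of the difference, where `turkish_lower` is a hypothetical helper that hard-codes the two Turkish exceptions before falling back to the default mapping:

```python
# Map the two Turkish special cases first: I -> dotless ı, İ -> plain i
_TR_LOWER = str.maketrans({"I": "\u0131", "\u0130": "i"})

def turkish_lower(text: str) -> str:
    """Lowercase using Turkish rules for I and İ, default rules for everything else."""
    return text.translate(_TR_LOWER).lower()

default_result = "IZMIR".lower()
turkish_result = turkish_lower("IZMIR")
dotted_result = turkish_lower("\u0130ZM\u0130R")   # İZMİR with dotted capitals

print(default_result == "izmir")    # default rule: I lowercases to i
print(turkish_result == "izmir")    # Turkish rule: I lowercases to dotless ı
print(dotted_result == "izmir")     # Turkish rule: İ lowercases to i
```

Same input bytes, two different "correct" answers, and nothing in the string tells you which one applies.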
I don't know how to solve this without either bringing in hard AI (as in, a system that recognizes 'müller' as a common German surname and treats it as equal to 'mueller', but would not treat `xyzmü` as equal to `xyzmue` - and treats IZMIR as not equal to izmir, because it recognizes it as the name of a major Turkish city and thus applies Turkish locale rules), or decreeing to the internet: "get lost with your fancypants non-US/UKian weird word stuff. Fix your language or something" - which, well, most cultures aren't going to like.
'files are case sensitive' sidesteps alllllll of this.
Yeah, but that little bit of user friendliness ruins the file system for file system things. Now you need “registries” and other, secondary file systems to do file system things because you can’t even use base64 in file names. Make your file browsing app case insensitive, if that’s what you want. Don’t build inferiority down to the core.
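The base64 point is easy to demonstrate. In this Python sketch, two different payloads get names that differ only by case, so a case-insensitive filesystem would see them as the same file:

```python
import base64

payload_a = b"\x00\x00\x00"
payload_b = b"\x69\xa6\x9a"

# URL-safe base64 is commonly used for generated names; case carries information
name_a = base64.urlsafe_b64encode(payload_a).decode("ascii")
name_b = base64.urlsafe_b64encode(payload_b).decode("ascii")

print(name_a, name_b)
print(name_a == name_b)                        # distinct names
print(name_a.casefold() == name_b.casefold())  # identical once case is ignored
```

Hex or base32 avoid the problem, at the cost of longer names.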