The article linked from this plus page (http://exple.tive.org/blarg/2013/10/22/citation-needed/) poses the question "why do programmers start counting at zero?" and then shows contempt towards anyone who thinks they know the answer (while dismissing Dijkstra as "incoherent", without argument).
To the author, the correct way to answer this question is by investigating the history of how things happened to end up the way they are, which (apparently) no one besides him has ever done.
While history is interesting, I think the much more important question is: why ought we to count from zero (or not)? And the answer to that question has nothing to do with history.
In addition to Dijkstra's argument, mentioned elsewhere in this thread, 0-based addressing is more efficient (since it can be used directly as an offset, without having to bias it first).
As much as I like Lua, its 1-based indexing is not my favorite part of that language. In particular, the numeric for loop is confusing, because the loop:
for i=x,y do
-- stuff
end
...will execute (y-x+1) times, not (y-x) as it would in most languages, like Python:
for i in range(x,y):
# stuff
Off-by-one errors are some of the most annoying bugs. 0-based indexing and half-open ranges help avoid them in my experience.
There's another reason that I haven't see anyone bring up. If your native unsigned integer type is the same size as a word (which is common), and a word defines the limits of your address space, and if you have a byte pointer initialized to 0, then 0-based indexing allows you to index into every byte in your address space, whereas 1-based indexing prevents you from indexing the final byte in your address space.
Additionally, assuming an unsigned integer type is used for indexing, 0-based makes every index valid and meaningful, whereas 1-based leaves one index value (0) as invalid.
Erk, no it isn't. If your semantics require an optional value, then encode it that way explicitly. Don't inflict special cases and bugs on everyone else.
Ruby has "nil", and 0. They are not the same, and should not be considered the same.
Would you say that accessing coordinate 0,0 is a "logically invalid operation"? What is an array but a 1 dimensional coordinate system?
Are you suggesting that accessing an array via an uninitialized parameter will show up as a runtime error because of 1-based indexing? That only happens if your language initializes values to 0.
Ruby initializes everything to nil, not 0, and yes, indexing by nil will fail.
Perhaps I'm not seeing your problem or solution because Ruby doesn't even have the kind of problem that is solved by a 1-based array index. See: the Sapir-Whorf Hypothesis http://en.wikipedia.org/wiki/Linguistic_relativity , your thinking may be suffering due to the language you are typically working in.
>Would you say that accessing coordinate 0,0 is a "logically invalid operation"?
If you have a matrix that starts at cell 1,1 then trying to access 0,0 is absolutely a logically invalid operation. I'm not saying that 0-indexing is illogical, I'm saying that dereferencing null/nil is illogical.
As for the rest of your post, the scenario eridius came up with is pretty weird, at this point I'm just going to shrug and give up on analysis.
Would you rather I write "index that is 0 in a world where indexes start at 1" over and over? I am willing to do that if it helps make the conversation smoother.
>much less an operation to "access a null index".
foo_array[0]; //operation to do an access with a null index
>Also, believe it or not, on various machines and OS's it's legal to map address 0.
Which is a bad thing. You get a couple extra variable slots in exchange for thousands of privilege-escalation bugs that could instead be crash bugs.
How in the hell does "accidentally" accessing the first element of an array (by "accidentally" passing 0 to an array in a language that has 0-based indexing) result in "privilege escalation errors"? That's just "shitty programming".
For that matter, how does a 1-based indexing language NOT result in far more "off by one" errors which would include the kind of errors commonly known as "buffer overflow" errors?
You specifically said the native word size and address space. That's going to be a pointer, where a value that causes the CPU to trap is quite handy.
And there's already plenty of invalid index values. -1 for example. There's no way to make an allocation the size of your entire ram, so 0xff...f is going to be invalid.
> That's going to be a pointer, where a value that causes the CPU to trap is quite handy.
You're confusing an invalid pointer with an invalid index. There is absolutely no reason for any index to ever be invalid.
> And there's already plenty of invalid index values. -1 for example.
-1 is not representable with unsigned integers. There are no invalid index values in a 0-based indexing system (using unsigned integers, as I initially stated).
> There's no way to make an allocation the size of your entire ram, so 0xff...f is going to be invalid.
This is so completely off the mark I'm surprised you even came up with it. There are several things wrong with this (off the top of my head):
1. I don't need an allocation the size of my address space in order to want to address both the first and last bytes in the address space.
2. You don't know what machine I'm running on. This could be an embedded 16-bit machine without multiprocessing with 64KB of RAM. Every single addressable byte is valid and corresponds to RAM storage.
3. It doesn't even matter whether or not a given address can be loaded. Indexing a pointer does not necessarily require loading it. As a trivial example, I could take the address of the resulting indexed value.
>You're confusing an invalid pointer with an invalid index. There is absolutely no reason for any index to ever be invalid.
I'm not confusing them. You are very clearly describing a pointer.
>-1 is not representable with unsigned integers.
When I wrote -1 I meant "the value you get when you type -1 in your source code".
>This is so completely off the mark I'm surprised you even came up with it. There are several things wrong with this (off the top of my head):
>1. I don't need an allocation the size of my address space in order to want to address both the first and last bytes in the address space.
That's a pointer issue. If I allocate 20 bytes near the end of ram, my last index is only going to be 19 or 20, so it doesn't matter if I can't go to UINT_MAX.
> 2. You don't know what machine I'm running on. This could be an embedded 16-bit machine without multiprocessing with 64KB of RAM. Every single addressable byte is valid and corresponds to RAM storage.
Again, that's a pointer.
> 3. It doesn't even matter whether or not a given address can be loaded. Indexing a pointer does not necessarily require loading it. As a trivial example, I could take the address of the resulting indexed value.
The article you link to is interesting with a lot of research, but it is rather hostile and reaches a strange conclusion. The article quotes the creator of BCPL (precursor language to C): "if p is a pointer [...] Naturally p+0 has the same value as p. [... discussion of the indirection operator ...] I can see no sensible reason why the first element of a BCPL array should have subscript one." So the explanation from the creator of BCPL is basically what you'd expect, that 0-based is the natural thing to do if you're working with machine-level pointers.
But then the article goes off into compiler optimization, a total misunderstanding of time-sharing, and a misunderstanding of how indexing works and concludes that in BCPL arrays started at 0 to minimize compile time to avoid job preemption, and that's why many languages today use 0-based arrays. The article is still worth reading, though.
VBScript is 0 based for some stuff, but 1-based for slicing. Drove me nuts for a day until I figured it out inspite of the criptic error message. Hate VBScript for this reason alone.
One thing to remember is that Lua arrays are 1-based by convention, not requirement, since they are really just tables indexed by integers. So you can have zero-based arrays in Lua, but you'll be making life difficult for yourself as the convention, the libraries, and array literals will all want to give you 1-based arrays.
...and if you're using LuaJIT, 0-based arrays will even be optimized appropriately (i.e., array[0] will be stored internally in the array rather than in the hash).
But you'll still be fighting with the libraries and array literals, as you say.
To the author, the correct way to answer this question is by investigating the history of how things happened to end up the way they are, which (apparently) no one besides him has ever done.
While history is interesting, I think the much more important question is: why ought we to count from zero (or not)? And the answer to that question has nothing to do with history.
In addition to Dijkstra's argument, mentioned elsewhere in this thread, 0-based addressing is more efficient (since it can be used directly as an offset, without having to bias it first).
As much as I like Lua, its 1-based indexing is not my favorite part of that language. In particular, the numeric for loop is confusing, because the loop:
...will execute (y-x+1) times, not (y-x) as it would in most languages, like Python: Off-by-one errors are some of the most annoying bugs. 0-based indexing and half-open ranges help avoid them in my experience.