Hacker News new | past | comments | ask | show | jobs | submit login

The origins of our current commonly accepted indexing semantics are one thing. The question of which scheme is better is another thing entirely, and it's been settled quite definitely.

There are myriad reasons why zero-based indexing is the correct approach. Djikstra's commentary on the subject gets to the heart of the issue, but let's take the roundabout route.

Consider languages with pointers and pointer arithmetic. In these languages, anything outside of zero-based indexing is confusing and error-prone. The equivalence:

  p[i] is *(p + i)
Only holds with zero based indexing, and it is a crucial property to maintain. The crucial understanding of |i| in this context is that it's not simply a number, but that it is a _vector_ from one location to another. The nullary vector (0) |p| takes you to |p|, or |p == &p[0]|. To get the vector between two pointers, we simply have to subtract the origin from the destination:

  v(p, q) = q - p
So:

  v(p, &p[i]) = &p[i] - p = (p + i) - p == i
These vectors fall into a natural algebraic relationship. Consider two indexes |i| and |j|, and the vectors between |p|, |&p[i]| and |&p[j]|. To help with readability, I'll alias |pi = &p[i]| and |pj = &p[j]|.

The vector from |p| to |pi| is |i|. The vector from |p| to |pj| is |j|. The vector from |pi| to |pj| is:

  v(pi, pj) = pj - pi = (p + j) - (p + i) = j - i
Plug this vector back in as an index from |pi|, and we get back to |pj| naturally:

  pi[v(pi, pj)] = (pi + v(pi, pj)) = (p + i) + (j - i) = p + j = pj

With one-based indexing, this algrebraic consistency breaks down. I'll use caps to distinguish the one-based indexing formulas from the zero-based indexing formulas. In the one-based scheme, |&P[I] == (P + (I - 1))|. Following from that:

  V(PI, PJ) = PJ - PI = (P + (J - 1)) - (P + (I - 1)) = (J - 1) - (I - 1) = J - I
So far so good. The vector calculation yields the same number! Great! Now let's plug it back in:

  PI[V(PI, PJ)] = (PI + (V(PI, PJ) - 1)) = (P + (I - 1)) + ((J - I) - 1) = P + J - 2 = PJ - 1
Nope! Indexing from |PI| with the vector from |PI| to |PJ| no longer takes you to |PJ|. Rather, it takes you to one element _before_ |PJ|.

The consistent, clear relationship between pointers and the vectors between them no longer holds. Vectors no longer compose. Composing two vectors no longer works, as the |-1| term compounds over each composition.

This is no small thing to lose. Algebras are extremely powerful ways of coherently describing the behaviour of systems. Losing these algebras means you lose their powerful regularity. Adding two vectors no longer yields the true composite vector. Multiplying a vector by a scalar no longer yields a scaled vector. At every vector addition or scaling, the programmer will be forced to adjust their result vectors to obtain the true composite vector. This is an enormous burden on the programmer.

Any programming language that imposes this burden on the programmer is simply wrong. We make languages so that it's easier for us, people, to reason about the behaviour of our programs. Destroying the algebraic relationship between pointers and the vectors between them is inexcusable. We may have experimented it during the early years of language development, but we know better now.

The above is just _one_ reason why zero-based indexing is, without a doubt, the correct approach. The other major reason has to do with the superiority of open intervals over that of closed intervals, and of the algebras over those intervals. I might write another comment on that if I get time, but the gist of it is the following (this is the thrust of Djikstra's argument in favour of zero-based indexing):

Zero-based indexing supports a notion of integer intervals that fall into a coherent algebraic framework of open intervals.

Adding to Dijkstra's argument, this notion of an algebra over open intervals is powerful because not only dose it capture integer intervals (array and pointer arithmetic in C and C++), but also can be extended to ranges over a variety of ordered containers (e.g. collections of arbitrary values), and of ordered infinite sets (e.g. the abstract set of all strings, or all JSON values, etc.). The most prominent implementation of this is C++'s STL ranges (there's a very good reason |container.end()| in C++ returns an iterator "past" the last item in the collection). Suffice it to say that for algebras over intervals, open intervals win over closed intervals.

However, I don't have time to write that novel right now ;)




Consider applying for YC's W25 batch! Applications are open till Nov 12.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: