Friday 27 July 2012

Why do CFML arrays start at 1 instead of 0?

A few weeks ago whilst investigating something completely unrelated, I came across an interesting thread on StackOverflow, in which someone had asked why ColdFusion arrays start at index 1 rather than index 0 like other languages do.

This was an old thread, but my thoughts on it were a bit different from the others thus far offered, so I wrote up my own point of view. Whilst writing it I started to think "I wish I had a blog for this sort of thing, rather than posting it on an old thread on StackOverflow"... and it was that thought that culminated in me finally starting this blog.

Anyway, I realise it's a bit of a duplication, but I'm "reclaiming" the post for myself. Here it is.

As a different spin on it, let's ask why, in some languages, the array index starts at zero.  For counting discrete objects (like array elements), this makes little sense and is not natural from a human perspective.

This originally seemed to stem from languages like C (although I'm not suggesting it first arose in C: I don't know, and it doesn't matter for the purposes of this) in which the language and its programming are rather closely coupled to memory management (malloc, etc).  Some C language conceits map rather closely to what's going on in memory under the hood.  Variables are an example of this: as well as variable names, we're always busying ourselves with the memory address the variable is at (or starts at) with pointers and the like.

So we come to arrays in C, and those are indexed in such a way that there's a series of elements which reside in memory, starting at the base memory location of the array variable, and each element is offset by the size of the data type (eg: a char is one byte, etc). So to find each element in the array in memory, we do this:
arrayBaseAddress + (whichElementItIsInTheArray * sizeOfDataType)

And one really does actually find oneself thinking like this when doing stuff in C, because it maps rather closely to what the computer has to do under the hood to find the value the code wants.

So the whichElementItIsInTheArray is used to offset the memory address (in units of sizeOfDataType).
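To make that arithmetic concrete, here is a minimal sketch in C. The function name element_address is my own invention for illustration; it just spells out by hand the calculation the compiler does for you when you write a[i].

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: computes where element `index` of an array lives,
 * given the array's base address and the size of one element.
 * This is arrayBaseAddress + (whichElementItIsInTheArray * sizeOfDataType).
 */
const void *element_address(const void *base, size_t index, size_t element_size) {
    assert(base != NULL);
    /* Step through memory in bytes, so cast to a byte-sized pointer first. */
    return (const char *)base + (index * element_size);
}
```

With an index of 0 the offset is zero bytes, so the first element sits exactly at the base address; element_address(a, 2, sizeof a[0]) lands on the same address as &a[2].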

Obviously if one starts the array index at 1, it would be offset in memory by one `sizeOfDataType`, for all intents and purposes wasting a sizeOfDataType amount of memory between the arrayBaseAddress and where the first element actually resides.

One might think that this hardly matters, but in days of yore when all this was being implemented, memory was like gold: it could not be wasted like that.  So one might think "OK, well just offset whichElementItIsInTheArray by -1 under the hood, and be done with it".  However, like memory, clock cycles were gold, so instead of wasting processing, the idea was the programmer would just need to get used to an unnatural way of counting.
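The two costs described above can be sketched in C like this. Both function names are hypothetical and exist only to illustrate the trade-off; neither is how any particular language actually implements one-based arrays.

```c
#include <assert.h>
#include <stddef.h>

/* Option 1: keep the memory compact, and pay for a subtraction on every access. */
int get_one_based(const int *items, size_t position) {
    assert(position >= 1); /* there is no position 0 when counting from one */
    return items[position - 1];
}

/* Option 2: skip the subtraction, and pay for one unused slot at the front. */
size_t slots_needed_with_unused_first(size_t element_count) {
    return element_count + 1;
}
```

Either way something is spent: a few clock cycles per access in the first case, one element's worth of memory per array in the second.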

So there was a legitimate reason to start arrays at index zero in these situations.

It seems to me (and this is getting into editorial slant now) when subsequent "curly braces" languages came out (like Java) they simply followed suit whether it was really relevant still or not, because "that's the way it's done".  Rather than "that way makes sense".

On the other hand, with more modern languages, and ones further removed from the inner workings of the computer, someone stopped to think "why are we doing this?", and "in the context of this language and its intended uses, does this make sense?".  I agree here that the answer is - firmly - "no".  The resource wastage to offset the array index by -1, or simply just ignore the zeroth element's memory, is no longer a relevant consideration in a lot of circumstances.  So why make the language and the programmer have to offset the way they naturally count things by one, for a purely legacy reason?  There is no legitimate reason to do so.

In C, there is an element of an array a[0].  This is the first element of the array (not the "zeroth" element; there is no such thing as a "zeroth" element), and if that's the full extent of the array, its length is one.  So the idiosyncratic behaviour here is on the part of the programming language, not on the part of the way things are counted / enumerated "in real life" (which is where most of us reside).  So why persist with it?
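A small C sketch of that point, assuming a one-element array. ARRAY_LENGTH and first_element are names I've made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Number of elements in a true array (this does not work on a plain pointer). */
#define ARRAY_LENGTH(a) (sizeof(a) / sizeof((a)[0]))

/* The first element is the one at index 0, so the array must not be empty. */
int first_element(const int *items, size_t length) {
    assert(length >= 1);
    return items[0];
}
```

For an array declared as int only[1], ARRAY_LENGTH(only) is one and the sole valid index is zero: the count and the index disagree by one, and that gap is the language's doing.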

Some people here have countered this "where to start the index" argument with "well when we're born, we're not ONE, we're ZERO".  This is true, but that's measuring a continuous thing, and is not the same, so it is irrelevant to the conversation.  An array is a collection of discrete items, and when measuring the quantity of discrete items (ie: counting them), we start at one.

How does this add to the conversation?  Well it doesn't much, but it's a different way of looking at the same thing. And I suppose it's a bit of a rationalisation / reaction to this notion some people have that starting array indexes at 1 is somehow "wrong".  It's not wrong: from a human perspective it's more right than starting them at zero.  So let the human write the code like a human would, and get the machine to make sense of it as needs must.  Basically it's only for legacy technological limitations that we ever started counting them from zero in the first place, and there's no need to perpetuate that practice if we no longer need to.

One more topical thing to mention here... Is this the sort of "cross-compatibility" thing that annoys people about CFML? People seem to get tripped up by other "similar but slightly different" differences between CFML and other languages they switch between, so this might be another one of those.

For my part I think it's fine, and more sensible the way it is anyhow.

The Olympics kick off today down the road a bit from home.  I've survived my first day's commute across East London, through the Olympic Park (South Woodford to Chancery Lane on the Central Line, should you care), and it wasn't too bad, indeed given I got a seat today, it was actually better than it is 90% of the time.  According to "The Coffee Lady" at South Woodford Station, the place was crazy busy @ 5:30am though, for the first train.  I'm not looking forward to my commutes for the next three weeks or however long it goes on for.  But here I am at work, so time to press "send"...