C# For Loops and Iterators Uncovered

int[] arr = { 1, 2, 3, 4 };

for (int i = 0; i < arr.Length; i++)
   Console.WriteLine(arr[i]);
    
foreach (int i in arr)
    Console.WriteLine(i);

for (int i = 0; i < arr.Count(); i++)
    Console.WriteLine(arr.ElementAt(i));

What are the differences between these loops? They look the same and have the same result. The truth is that they are extremely different and serve different purposes. Iterating through a collection with a for loop implicitly implies that the collection has all of its elements sequentially laid out in memory. While it is technically possible to iterate over an IEnumerable with a for loop using Count and ElementAt, it is not recommended. If the underlying collection doesn’t implement ICollection then Count is O(n) and if IList isn’t implemented, ElementAt is O(n). This means that just iterating through the all the elements would be O(n²), and any algorithm that is traditionally O(n²) like BubbleSort would be O(n⁴)! Not to mention, one could be iterating on undefined behavior. For instance, what does it mean if one wants the second index of a Hashset, Dictionary, Circular Linked List, etc? With this in mind, discussion will be restricted to collections that implement IList when talking about for loop iteration.

There are many advantages to this restriction, as the IList interface provides maximum functionality. To see this functionality, the foreach loop must be examined. Whenever the compiler encounters a foreach (V value in x) it is expanded to the equivalent of:

E enumerator = ((C)(x)).GetEnumerator();
try {
    V value;
    while (enumerator.MoveNext()) {
        value = (V)(T)enumerator.Current;
        embedded-statement
    }
}
finally {
    // Dispose enumerator
}

Keys

E
Known as the enumerator type. Normally IEnumerator or its generic variant.
C
Known as the collection type. Normally IEnumerable or its generic variant.
T
Known as the element type. This is the type of the elements in the collection.
x
Known as the collection variable. It’s the collection being iterated over.
V
Known as the local-variable-type. Has to be explicitly convertible from T. For instance, collection (x) of a derived type (T) iterated over using the base class (V).

There was a lot of explicit casting in the previous code example, and those familiar with LINQ know of the extension methods OfType and Cast. Though similar, none of the following statements are equivalent:

foreach (V value in x) { }
foreach (V value in x.OfType<U>()) { }
foreach (V value in x.Cast<U>()) { }
// for demonstration purposes U = V

The main difference is that OfType and Cast work with the nongeneric IEnumerable, whereas regular foreach loops utilize whatever is returned from x.GetEnumerator() as was shown in the previous code sample. This difference only becomes apparent when V is a value type and the underlying collection is composed of a value type that is implicitly convertible to V (eg. int[] and long). For the following examples imagine iterating through an int array and working with a long.

The first foreach will run successfully, transferring the int’s value to the long and iterating through all the elements, OfType won’t iterate any of the elements, and Cast will throw an InvalidCastException. What’s going on is that when OfType and Cast are being executed, they are being asked to create an IEnumerable<long> out of an IEnumerable, which means that the underlying ints are implicitly boxed into objects and then unboxed out as longs. However, [a]n unboxing operation consists of first checking that the object instance is a boxed value of the given value-type, and then copying the value out of the instance. Thus since an int is not a long, a boxed int cannot be unboxed into a long and so the exception is thrown.

Back on track, it may have not been immediately apparent but the foreach iteration variable is read-only. Before I explain why it’s read-only let’s recap and do a quick comparison.

For loop advantages:
Change element values via indexer
Access elements in any order (backwards traversal is possible)
Can cope with underlying collection changes (see below for more)
Foreach loop advantages:
Ability to iterate over all collections
Considered more ‘readable’ according to Chris Sells (The C# programming language, page 424).
For loop disadvantages:
Can only be feasibly used when elements are laid out sequentially in memory
Foreach loop disadvantages:
Read-only element access

Why are elements in a foreach loop read-only? It’s simple. Imagine a spectrum of most functionality to most flexible. IList would be at the most functional end and IEnumerable would be at the most flexible with ICollection somewhere in between. Since foreach is such a high level statement, it can’t and won’t assume anything about the container it’s iterating over. What if it was iterating over a binary tree and every element iterated over was randomly assigned values? If foreach didn’t force read-only access then the binary tree would degenerate into a tree. The entire data structure would be in disarray. Therefore it is easiest to define an enumerator as a forward-only read-only iterator.

What happens to the iteration if the underlying collections change? In many cases the for loop is able to cope with whatever changes as it is relies only on the indices of the list. As long as the list exists, the current index is in range, and there isn’t a threading issue, then there isn’t a problem. The foreach loop is the problem child here. There are two schools of thought of what should happen to an iterator if the collection it is iterating over changes. The first school of thought is that if the underlying collection could have been modified, invalidate all associated iterators. The other philosophy is that an iterator should always be valid and have a defined response for actions like elements being added or removed. The Gang of Four highlight the design decision:

It can be dangerous to modify an aggregate while you’re traversing it. If elements are added or deleted from the aggregate, you might end up accessing an element twice or missing it completely. A simple solution is to copy the aggregate and traverse the copy, but that’s too expensive to do in general. (pp. 261)

So what school of thought is applied to the Base Class Library (BCL)? Well it depends on what collection you’re talking about. Every collection except the C# 4.0 concurrent collections invalidates the enumerator if the collection is changed*. I agree that the iterators should be invalidated because if they weren’t, then one could get mix of old and new material (or skip some entirely) depending on the container and how it is implemented. What’s interesting about .NET enumerators is the first call to MoveNext ‘initializes the parameters (including this) of the iterator block to the argument values and instance value saved when the enumerator object was initialized’ (The C# programming language, page 594). So if an enumerator is constructed, it’s not usable until MoveNext is called, which is why enumerators are always used in conjunction with while and not do-while loops. Despite this pseudo-lazy initialization if the underlying collection is changed before the first call to MoveNext, the enumerator is still invalidated as shown in this stackoverflow question I haven’t found a good reason why the BCL team chose to do this. They could have easily of spun off a new iterator to replace the old and all functionality would still be intact. They probably decided to adhere to GoF in the strictest sense in that if the collection has changed, any and all iterators, whether initialized or not, are invalid. Is this then a design flaw in IEnumerator philosophy? Jon Skeet seems to think so (from the C# programming language book, page 593):

The fact that none of the code within an iterator block runs until the first call to MoveNext() is irritating. It means that if you need to check any parameter values, you should really use two methods: a normal method that performs the appropriate validation and calls the second method, which is then implemented with an iterator block. Ideally, some construct would be available to indicate a section of the iterator block that should be executed immediately, before constructing the state machine

Why aren’t concurrent collections’ enumerators invalidated when the underlying collection changes? Because I haven’t used concurrent collections extensively, my only example is a message queue. While processing a message, more messages could be added to the queue. If the enumerator was invalid, one could only add messages once the current message finished.

Don’t worry if you scratched your head in confusion, I know I did.

Methods that invalidate a List’s iterators

Comments

If you'd like to leave a comment, please email [email protected]