C# and Threading: Explicit Synchronization Isn't Always Needed
What if you had a single-threaded writer updating a variable while many concurrent threads read it? What kind of threading issues would you run into?
The answer is none, as long as you don't mind the reader threads returning millisecond-stale data.
Scenario: When a file changes, a class reads the contents of the new file and updates a reference; meanwhile, many concurrent readers are accessing that reference. The file changes extremely infrequently compared to how often the data is needed from it. How can we ensure that the reader threads have the fastest possible access to the data? It is ok if some of the readers return potentially millisecond-stale data (think web server).
The answer: no synchronization. Don't believe me? Read on; the following snippet of code is what I'll be talking about.
using System;
using System.Threading;
using System.Threading.Tasks;

class Program
{
    static void Main(string[] args)
    {
        // The object that encapsulates caching
        // a complex object
        Bar granola = new Bar();
        var actArr = new Action[16];
        for (int i = 0; i < actArr.Length; i++)
        {
            actArr[i] = () =>
            {
                while (true)
                {
                    Console.WriteLine("-{0}: {1}",
                        Thread.CurrentThread.ManagedThreadId,
                        granola.Render());
                }
            };
        }
        // Run the sixteen reader threads and the one
        // writer thread in parallel forever
        Parallel.Invoke(actArr);
        granola.Infinite();
    }
}
class Bar
{
    // The complex object
    private Foo boo;
    private readonly Thread updater;
    private int id = 0;
    public Bar()
    {
        // Initialize the complex object to a
        // starting value
        boo = new Foo(id.ToString());
        // We're going to use a single thread to update the
        // complex object.
        updater = new Thread(() =>
        {
            while (true)
            {
                var val = (id++).ToString();
                boo = new Foo(val);
                // Sleep for a second to represent an eternity
                // for the computer
                Thread.Sleep(TimeSpan.FromSeconds(1));
            }
        });
        updater.Start();
    }
    public void Infinite()
    {
        updater.Join();
    }
    public string Render()
    {
        // Many threads are here
        return boo.GetCompile();
    }
}
class Foo
{
    private readonly string comp;
    public Foo(string compile)
    {
        // A "complex" operation that makes
        // caching this object worthwhile
        comp = compile;
    }
    public string GetCompile() { return comp; }
}
The output of this program will vary each time it is run, but here is a typical output:
-1: 0
-1: 0
-1: 0
-2: 0
-2: 0
[...]
-2: 1
-2: 1
-1: 0
-1: 1
Notice that in the output, thread 2 started outputting “1”, yet when thread 1 resumed, it initially printed “0”. Since threading is notoriously difficult to get right, let's dive into the potential problems; maybe I just got lucky with this run, and subsequent runs will cause the program to crash.
So let's focus on the writer. Since the writer is single-threaded, the only potential issue arises when the shared variable is accessed by multiple threads: the writer is assigning the variable while the readers are reading from it. Would this ever result in a reader accessing corrupted data from a partially assigned variable? Thankfully, no. From the C# standard (emphasis mine):
Reads and writes of the following are atomic: bool, char, byte, sbyte, short, ushort, uint, int, float, and reference types (pp. 163)
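To make the quoted guarantee concrete, here is a small illustrative sketch. It is not part of the original program; the TearingDemo class name and the alternating bit patterns are my own invention. A long field can, in principle, be observed half-written on a 32-bit process, while a reference field never can:
// Hypothetical demo (not from the article): a writer flips a long field
// between two bit patterns while a reader looks for "torn" values. The
// standard guarantees atomicity for reference assignments but not for
// long/double, so the check below can fire on a 32-bit process; reading
// the reference field can never yield a partially written value.
using System;
using System.Threading;

class TearingDemo
{
    private static long number;                      // atomicity NOT guaranteed
    private static object reference = new object();  // atomicity guaranteed

    static void Main()
    {
        new Thread(() =>
        {
            while (true)
            {
                number = 0L;              // all bits clear
                number = -1L;             // all bits set
                reference = new object(); // always a complete, valid reference
            }
        }) { IsBackground = true }.Start();

        while (true)
        {
            long observed = number;
            if (observed != 0L && observed != -1L)
            {
                // Only possible if the read tore, e.g. the two halves of the
                // long were written separately on a 32-bit process.
                Console.WriteLine("Torn read: {0:X16}", observed);
            }

            // The reference read is atomic regardless of process bitness.
            object snapshot = reference;
            GC.KeepAlive(snapshot);
        }
    }
}
Whether the torn-read message ever appears depends on the runtime and the process bitness; the point is only that the standard makes the reference assignment in Bar safe without any locks.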
We now know that the complex object, Foo, owned by Bar, will never be in a corrupted state. There is another concern, however: what if the reader threads never see an updated value? In our example, what if the reader threads always see “0” as the result?
Decompiling the program using Ildasm, we see that Bar::Render is:
IL_0000: ldarg.0
IL_0001: ldfld class threadtest.Foo threadtest.Bar::boo
IL_0006: callvirt instance string threadtest.Foo::GetCompile()
IL_000b: ret
Of particular interest are the middle two lines. The ldfld line tells us that boo is a member of class Bar and is of type Foo; its value is a reference to a Foo object, and ldfld copies that reference from memory onto the evaluation stack. The next line, callvirt, simply dereferences that reference to call the specified method. These two lines make a powerful statement. Since the .NET framework has a shared-memory model in which every thread shares the same heap, we know that when the Foo reference is updated, all threads have “their” Foo updated. “Their” is in quotation marks because they are all reading the same reference: the boo field of the one Bar instance, which lives on the shared heap.
The one nuance (sketched below) is that a thread could read the reference and store it on its local stack, but before it can act on it, the operating system switches threads, the writer updates the reference, and another thread picks up the new reference and prints the new value. The thread holding the old reference will then print the old value, because it is pointing to a different memory location than the newer threads. There is nothing inherently wrong with this situation; the thread is simply out of sync for a moment. You also don't have to worry about the old thread holding corrupted data, because .NET guarantees that any object still referenced by anything won't be garbage collected. There are, of course, many situations where a stale read would be a real problem, and a lot has been written on the topic of synchronizing threads.
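To make that nuance concrete, here is an equivalent, spelled-out version of Bar.Render. This is a sketch, not the article's actual code; it simply names the local copy that the ldfld/callvirt pair performs implicitly:
// Hypothetical rewrite of Bar.Render, equivalent to the IL shown above:
// the shared field is read once (atomically) into a local, and everything
// after that point works against the reader's own snapshot.
public string Render()
{
    Foo snapshot = boo;           // ldfld: one atomic read of the shared field
    // If the OS switches threads here and the writer swaps in a new Foo,
    // this thread still holds a valid, uncorrupted (if slightly stale)
    // object, and the GC will not collect it while the local exists.
    return snapshot.GetCompile(); // callvirt on the captured snapshot
}
The original one-liner behaves the same way; the reference is copied onto the evaluation stack instead of into a named local, which is why a momentarily stale value is the only observable effect.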
Hopefully, you now know that threads can implicitly keep themselves in sync, as well as how and why that works.
Comments
If you'd like to leave a comment, please email [email protected]