C# and Threading: Explicit Synchronization Isn't Always Needed
What if you had a single-threaded writer updating a variable while many concurrent threads read it? What kind of threading issues would you run into?
The answer is none, as long as you don't mind the reader threads returning millisecond-stale data.
Scenario: When a file changes, a class reads the contents of the new file and updates a reference; meanwhile, many concurrent readers are accessing that reference. The file changes extremely infrequently compared to how often the data is needed from it. How can we ensure that the reader threads have the fastest possible access to the data? It is ok if some of the readers return potentially millisecond-stale data (think web server).
The answer: no synchronization. Don't believe me? Read on; the following snippet of code is what I'll be talking about.
using System;
using System.Threading;
using System.Threading.Tasks;

class Program
{
    static void Main(string[] args)
    {
        // The object that encapsulates caching
        // a complex object
        Bar granola = new Bar();
        var actArr = new Action[16];
        for (int i = 0; i < actArr.Length; i++)
        {
            actArr[i] = () =>
            {
                while (true)
                {
                    Console.WriteLine("-{0}: {1}",
                        Thread.CurrentThread.ManagedThreadId,
                        granola.Render());
                }
            };
        }
        // Run the sixteen reader threads and the one
        // writer thread in parallel forever
        Parallel.Invoke(actArr);
        granola.Infinite();
    }
}
class Bar
{
    // The complex object
    private Foo boo;
    private readonly Thread updater;
    private int id = 0;
    public Bar()
    {
        // Initialize the complex object to a
        // starting value
        boo = new Foo(id.ToString());
        // We're going to use a single thread to update the
        // complex object.
        updater = new Thread(() =>
        {
            while (true)
            {
                var val = (id++).ToString();
                boo = new Foo(val);
                // Sleep for a second to represent an eternity
                // for the computer
                Thread.Sleep(TimeSpan.FromSeconds(1));
            }
        });
        updater.Start();
    }
    public void Infinite()
    {
        updater.Join();
    }
    public string Render()
    {
        // Many threads are here
        return boo.GetCompile();
    }
}
class Foo
{
    private readonly string comp;
    public Foo(string compile)
    {
        // A "complex" operation that makes
        // caching this object worthwhile
        comp = compile;
    }
    public string GetCompile() { return comp; }
}
The output of this program will vary each time it is run, but here is a typical output:
-1: 0
-1: 0
-1: 0
-2: 0
-2: 0
[...]
-2: 1
-2: 1
-1: 0
-1: 1
Notice that in the output, thread 2 started outputting “1”, yet when thread 1 resumed, it initially printed “0”. Since threading is notoriously difficult to get right, let's dive into the potential problems; maybe I just got lucky with this run, and subsequent runs will cause the program to crash.
So let's focus on the writer. Since the writer is single-threaded, the only potential issue arises when the shared variable is accessed by multiple threads: the writer is assigning the variable while the readers are reading from it. Would this ever result in a reader accessing corrupted data from a partially assigned variable? Thankfully, no. From the C# standard (emphasis mine):
Reads and writes of the following are atomic: bool, char, byte, sbyte, short, ushort, uint, int, float, and reference types (pp. 163)
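To make the quoted guarantee concrete, here is a small illustrative sketch. It is not part of the original program; the TearingDemo class name and the alternating bit patterns are my own invention. A long field can, in principle, be observed half-written on a 32-bit process, while a reference field never can:
// Hypothetical demo (not from the article): a writer flips a long field
// between two bit patterns while a reader looks for "torn" values. The
// standard guarantees atomicity for reference assignments but not for
// long/double, so the check below can fire on a 32-bit process; reading
// the reference field can never yield a partially written value.
using System;
using System.Threading;

class TearingDemo
{
    private static long number;                      // atomicity NOT guaranteed
    private static object reference = new object();  // atomicity guaranteed

    static void Main()
    {
        new Thread(() =>
        {
            while (true)
            {
                number = 0L;              // all bits clear
                number = -1L;             // all bits set
                reference = new object(); // always a complete, valid reference
            }
        }) { IsBackground = true }.Start();

        while (true)
        {
            long observed = number;
            if (observed != 0L && observed != -1L)
            {
                // Only possible if the read tore, e.g. the two halves of the
                // long were written separately on a 32-bit process.
                Console.WriteLine("Torn read: {0:X16}", observed);
            }

            // The reference read is atomic regardless of process bitness.
            object snapshot = reference;
            GC.KeepAlive(snapshot);
        }
    }
}
Whether the torn-read message ever appears depends on the runtime and the process bitness; the point is only that the standard makes the reference assignment in Bar safe without any locks.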
We now know that the complex object, Foo, owned by Bar, will never be in a corrupted state. There is another concern, however: what if the reader threads never see an updated value? In our example, what if the reader threads always see “0” as the result?
Decompiling the program using Ildasm, we see that Bar::Render is:
IL_0000: ldarg.0
IL_0001: ldfld class threadtest.Foo threadtest.Bar::boo
IL_0006: callvirt instance string threadtest.Foo::GetCompile()
IL_000b: ret
Of particular interest are the middle two lines. The ldfld line tells us that boo is a member of class Bar and is of type Foo; its value is a reference to a Foo object, and ldfld copies that reference from memory onto the evaluation stack. The next line, callvirt, simply dereferences that reference to call the specified method. These two lines make a powerful statement. Since the .NET framework has a shared-memory model in which every thread shares the same heap, we know that when the Foo reference is updated, all threads have “their” Foo updated. “Their” is in quotation marks because they are all reading the same reference: the boo field of the one Bar instance, which lives on the shared heap.
The one nuance (sketched below) is that a thread could read the reference and store it on its local stack, but before it can act on it, the operating system switches threads, the writer updates the reference, and another thread picks up the new reference and prints the new value. The thread holding the old reference will then print the old value, because it is pointing to a different memory location than the newer threads. There is nothing inherently wrong with this situation; the thread is simply out of sync for a moment. You also don't have to worry about the old thread holding corrupted data, because .NET guarantees that any object still referenced by anything won't be garbage collected. There are, of course, many situations where a stale read would be a real problem, and a lot has been written on the topic of synchronizing threads.
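To make that nuance concrete, here is an equivalent, spelled-out version of Bar.Render. This is a sketch, not the article's actual code; it simply names the local copy that the ldfld/callvirt pair performs implicitly:
// Hypothetical rewrite of Bar.Render, equivalent to the IL shown above:
// the shared field is read once (atomically) into a local, and everything
// after that point works against the reader's own snapshot.
public string Render()
{
    Foo snapshot = boo;           // ldfld: one atomic read of the shared field
    // If the OS switches threads here and the writer swaps in a new Foo,
    // this thread still holds a valid, uncorrupted (if slightly stale)
    // object, and the GC will not collect it while the local exists.
    return snapshot.GetCompile(); // callvirt on the captured snapshot
}
The original one-liner behaves the same way; the reference is copied onto the evaluation stack instead of into a named local, which is why a momentarily stale value is the only observable effect.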
Hopefully, you now know that threads can implicitly keep themselves in sync, as well as how and why that works.
Comments
If you'd like to leave a comment, please email [email protected]