Await Async in C# and Why Synchrony is Here to Stay
Await and async are the hottest thing in C# (natively supported in .NET 4.5 and
available in .NET 4.0 through the Microsoft Async package). They make
asynchronous programming simple while promising great performance
improvements. No more blocking a single-threaded program waiting for a download
or dealing with BeginXXX and EndXXX functions. Asynchrony is a step towards
concurrency and responsiveness, and now that await and async are in a
.NET developer's arsenal, it can be almost too easy to use these mechanisms
without thought to the negative consequences. This is a story, tested through
profiling, of how async/await isn't always beneficial.
Pdoxcl2Sharp is a project of mine that is essentially a cross between a
scanner and a parser, with an API similar to that of BinaryReader. Since the
core class uses a Stream in construction and parsing, I could now use the
newfangled ReadAsync ability of streams. My scanner reads a character at a
time from a 64KB buffer, and when the buffer is exhausted, ReadAsync refills
it. The following code snippet shows the synchronous way of reading a character
at a time from a buffer.
/// <summary>
/// Retrieves the next char in the buffer, reading from the stream if necessary.
/// If the end of the stream was reached, the flag denoting it will be set.
/// </summary>
/// <returns>
/// The next character in the buffer or '\0' if the end of the stream was
/// reached
/// </returns>
private char ReadNext()
{
    // Check if we have exhausted the current buffer
    if (currentPosition == bufferSize)
    {
        if (!eof)
            bufferSize = reader.Read(buffer, 0, BufferSize);
        currentPosition = 0;
        if (bufferSize == 0)
        {
            // Nothing left in the buffer so return a null terminator
            eof = true;
            return '\0';
        }
    }
    return buffer[currentPosition++];
}
I thought I was smart. I thought that while one buffer was being processed, I
could have another buffer reading in the next chunk of data from the stream,
and simply swap pointers when one buffer was exhausted. The easiest way was to
not expose this to the end user and simply wait for the next buffer if needed.
In my mind, this made perfect sense, but like everyone, I'm flawed. I would be
claiming that this method is synchronous, but in reality, it isn't. There are
some dangers with this in the general case. If there were an underlying await
statement being waited on and the code were executed on the UI thread, then
the await would never be able to pick up where it left off, because the Wait
call is blocking the message pump. For more information, this video with tips
on working with async has a more in-depth explanation. The following code
doesn't showcase this pathological case, as there is no await.
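To make the danger concrete, here is a minimal sketch of that pathological case. Nothing here is from Pdoxcl2Sharp; the class and method names are mine, and it assumes a WinForms/WPF-style app where a single-threaded SynchronizationContext is installed.

```csharp
using System.IO;
using System.Threading.Tasks;

class DeadlockSketch
{
    private readonly Stream reader;
    private readonly byte[] buffer = new byte[0x10000];

    public DeadlockSketch(Stream reader) { this.reader = reader; }

    private async Task<int> FillBufferAsync()
    {
        // Without ConfigureAwait(false), the continuation after this await
        // is posted back to the captured (UI) context to resume.
        return await reader.ReadAsync(buffer, 0, buffer.Length);
    }

    public int Fill()
    {
        Task<int> pending = FillBufferAsync();
        // If this runs on the UI thread, Wait() blocks the very thread the
        // continuation needs, so the task can never complete: deadlock.
        pending.Wait();
        return pending.Result;
    }
}
```

On a thread-pool thread, where there is no SynchronizationContext, the same code completes fine, which is exactly why this trap is easy to miss in console tests.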
Despite being safe, the performance of this method wasn't worth it. I parsed
through several hundred-megabyte files. One set of tests involved the parser
not really doing any work, while the other variation had the parser perform
more complex work. In both sets, the version utilizing ReadAsync showed little
to no improvement over the synchronous version.
// A function that lies. It looks synchronous, but actually takes advantage of
// asynchrony under the hood -- for no benefit. Do not copy this style.
private char ReadNext()
{
    // Check if we have exhausted the current buffer
    if (currentPosition == bufferSize)
    {
        if (!eof)
        {
            // Wait for the next chunk of data to come in from the stream
            nextBufferSize.Wait();
            bufferSize = nextBufferSize.Result;

            // Swap buffers
            char[] temp = buffer;
            buffer = nextBuffer;
            nextBuffer = temp;
        }
        currentPosition = 0;
        if (bufferSize == 0)
        {
            // Nothing left in the buffer so return a null terminator
            eof = true;
            return '\0';
        }
        else
        {
            // Launch a task to read the next chunk from the stream
            nextBufferSize = reader.ReadAsync(nextBuffer, 0, BufferSize);
        }
    }
    return buffer[currentPosition++];
}
The reason is that any time savings ReadAsync provided were consumed by the
overhead of tasks and asynchrony. The only plausible use case for a
methodology like this would be a very slow network-based stream paired with an
incredibly complex parser.
Thus, the only way for me to utilize ReadAsync would be to propagate the
async calls all the way to the client, so whenever the client wanted to read a
string or an int, they would have to await the result. This meant that async
and Task<> populated nearly every method signature. This should immediately
set off alarms for anyone concerned with performance. The code would go from
dealing with primitives in a synchronous fashion to references in an
asynchronous fashion. In theory, there is nothing wrong with this, but when
you're parsing a file that has millions of lines with tens of millions of
tokens, having all these methods with asynchrony baked in takes its toll. This
is because asynchrony isn't free.
// This function is truly async, but this is a terrible implementation. All the
// heap allocations caused by returning a task make this method infeasible, as
// it is called millions of times.
private async Task<char> ReadNext()
{
    // Check if we have exhausted the current buffer
    if (currentPosition == bufferSize)
    {
        if (!eof)
        {
            // Wait for the next chunk of data to come in from the stream
            bufferSize = await nextBufferSize;

            // Swap buffers
            char[] temp = buffer;
            buffer = nextBuffer;
            nextBuffer = temp;
        }
        currentPosition = 0;
        if (bufferSize == 0)
        {
            // Nothing left in the buffer so return a null terminator
            eof = true;
            return '\0';
        }
        else
        {
            // Launch a task to read the next chunk from the stream
            nextBufferSize = reader.ReadAsync(nextBuffer, 0, BufferSize);
        }
    }
    return buffer[currentPosition++];
}
Transcribing a bit from this video, the downsides of await in terms of performance are threefold.
- State machine allocated to hold local variables
- The delegate to be executed when the task completes
- The returned Task object
For me, since the buffer is 64KB, (64 * 1024 - 1) ReadNext invocations do not
incur the cost of the first two allocations, because there is no await in
their code path. However, avoiding two out of three allocations is still bad
in this case, as a Task is heap allocated every time ReadNext executes, and it
will execute millions of times. Just the thought of all those allocated
objects needing to be garbage collected makes me cringe. Caching all possible
tasks is a thought I entertained.
It would almost be acceptable to cache all the task results if the encoding
were ASCII, but since the parser accepts any character in the Windows code
page, a character can take up to two bytes (as defined by
Encoding.GetMaxByteCount(1)). The resulting cache, if implemented as a
contiguous array using the character as an index, would need to contain 256^2
tasks. Not to mention that since tasks and asynchrony would be propagated
throughout the library, caches for all the ReadXXX methods would need to be
set up as well to avoid the third allocation. The sheer amount of memory this
would consume makes it impractical.
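For scale, here is a sketch of what such a cache might have looked like (the class and method names are illustrative, not part of the library). Task.FromResult returns an already-completed task, so a lazily filled array of them would at least reuse the same Task<char> per character -- at the cost of up to 65,536 (256^2) live tasks for this one method alone.

```csharp
using System.Threading.Tasks;

static class TaskCacheSketch
{
    // One slot per possible char value: char.MaxValue + 1 = 65,536 = 256^2.
    private static readonly Task<char>[] cache =
        new Task<char>[char.MaxValue + 1];

    public static Task<char> FromChar(char c)
    {
        // Task.FromResult yields an already-completed task; awaiting it never
        // suspends, and the cache hands back the same object every time.
        return cache[c] ?? (cache[c] = Task.FromResult(c));
    }
}
```

Even filled lazily, parsing arbitrary text would populate thousands of entries, and every ReadXXX method would need its own cache of the same shape.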
I mentioned it earlier, but my project is analogous to BinaryReader, and there
are no ReadXXXAsync methods in BinaryReader's API. The reason for this is
explained by a Microsoft employee who commented on one of the "Parallel
Programming with .NET" team's blog posts:
The reason that the BinaryReader/Writer do not have XxxAsync methods is that the methods on those types typically read/write only very few bytes from an underlying stream that has been previously opened. In practice, the data is frequently cached and the time required to fetch the data from the underlying source is typically so small that it is not worth it doing it asynchronously.
Notably, there are some methods on these types that in some circumstances may transfer larger amounts of data (e.g. ReadString). Further down the line, Async versions for those methods may or may not be added, but it is unlikely it will happen in the immediate future.
In general, you should only consider Async IO methods if the amount of data you are reading is significant (at least several hundreds or thousands of bytes), or if you are accessing a resource for the first time (e.g. a first read from a file may require to spin up the disk even if you are reading one byte).
The staunchest proponents of async-based programming should have relented by now, but they may argue that I should provide a second set of asynchronous APIs to complement the synchronous ones, so that the user can decide which to use. I did briefly consider this option, but decided against it because it would have presented the user with too many choices, causing indecision and anxiety. Only supporting a synchronous workflow makes my life and the client's life easier.
Solution
The solution is to embrace synchrony. While a library should make asynchrony
available, it should not do so at great cost. The answer is to relegate
asynchrony back onto the client. For instance, have the client await a
download into a MemoryStream and pass that to the parser. This way, the
parser becomes CPU bound instead of IO bound, which makes it great for
parallelism.
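As a sketch of what that looks like from the client's side: ParseFileAsync is an illustrative name of mine; ParadoxParser.Parse and Province come from the library, and Stream.CopyToAsync is the standard .NET 4.5 way to buffer a stream asynchronously.

```csharp
using System.IO;
using System.Threading.Tasks;

// The client owns the asynchrony: await the IO up front, then parse
// synchronously from memory.
public static async Task<Province> ParseFileAsync(string path)
{
    using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read,
        FileShare.ReadWrite, 0x10000, useAsync: true))
    using (var ms = new MemoryStream())
    {
        // The only await: pull the whole file into memory asynchronously.
        await fs.CopyToAsync(ms);
        ms.Position = 0;

        // From here on the parser is purely CPU bound -- no await in sight.
        return ParadoxParser.Parse(ms, new Province());
    }
}
```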
Strangely enough, profiling revealed that setting up a pipeline using TPL
Dataflow that asynchronously read the files into memory streams was slower
than flat-out using all cores in a Parallel.ForEach. Below is the code that
proved to be the fastest at reading and parsing a directory filled with tens
of thousands of small files. For the duration of the program, disk access was
at 100%, so whatever it is doing under the hood, it is doing it right. The
more time I spend writing this post, the more I think ReadAsync is useless for
files. Networking is where I think it would be useful.
var provs = new ConcurrentBag<Province>();
Parallel.ForEach(Directory.EnumerateFiles(dir, "*",
    SearchOption.AllDirectories), file =>
{
    using (var fs = new FileStream(file, FileMode.Open,
        FileAccess.Read, FileShare.ReadWrite, MaxByteBuffer))
    {
        provs.Add(ParadoxParser.Parse(fs, new Province()));
    }
});
The one potential problem with this is that the parallel loop believes it has
infinite resources. By setting MaxDegreeOfParallelism, we cap the number of
tasks that can execute concurrently. So, if we know the parser uses an
internal buffer of 64KB and MaxDegreeOfParallelism is 20, we are guaranteed
that the buffers for the whole operation don't consume more than
(64 * 1024 * 20 bytes = 1.25MB).
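Applied to the loop above, the cap is one extra argument, using the overload of Parallel.ForEach that takes a ParallelOptions (ParadoxParser, Province, and MaxByteBuffer are as in the earlier snippet):

```csharp
using System.Collections.Concurrent;
using System.IO;
using System.Threading.Tasks;

// At most 20 files are parsed at once, so at most 20 of the parser's
// internal 64KB buffers are live at any moment (~1.25MB total).
var options = new ParallelOptions { MaxDegreeOfParallelism = 20 };
var provs = new ConcurrentBag<Province>();
Parallel.ForEach(
    Directory.EnumerateFiles(dir, "*", SearchOption.AllDirectories),
    options,
    file =>
    {
        using (var fs = new FileStream(file, FileMode.Open,
            FileAccess.Read, FileShare.ReadWrite, MaxByteBuffer))
        {
            provs.Add(ParadoxParser.Parse(fs, new Province()));
        }
    });
```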
In conclusion, there is a very real and very tangible tradeoff between
async/sync and CPU/RAM. Async may be what's hot on the block right now, but it
is not always the right decision. If ever you doubt this statement, remember
that BinaryReader doesn't have ReadXXXAsync.
Comments
If you'd like to leave a comment, please email [email protected]