Thanks!
This post was heavily inspired by infographics by Bartosz Adamczewski @badamczewski01 and wouldn’t be possible without sharplab.io by Andrey Shchekin @ashmind and BenchmarkDotNet by contributors.
Context: TimSort for .NET
So I wanted to (re)implemented TimSort for .NET. My previous attempt was relatively successful (well, it worked) but it could be better. Closure of CodePlex (that’s where it was hosted before) gave me perfect opportunity - I need to move it, but I can also rewrite it.
One of the problems with original implementation was the fact that I wanted to have all types of array-like objects (Array
, List
, and IList
). The problem is the choice was (at least that’s what I thought at the time) implementing it 3 times, or implementing it for IList
only and take the performance hit (as accessing Array
though IList
interface comes at the price).
In the end, it was implemented 3 times and do avoid maintaining 3 code bases the actual algorithm was in T4
template (.tt
file) so 3 implementations were generated from it.
Time has shown that I knew very little then…
Insertion sort
This post is not about TimSort though (as at the time of writing it is not finished yet). This post is about injecting behaviour into algorithms while retaining original performance.
So, insertion sort is simple O(n^2) sorting algorithm, which actually has some real-life usage. It is actually irrelevant, which one I use for this post, but if I need to use any let’s use the one which is actually has some use cases: it is very fast (O(n)) for data which is nearly sorted and is used as a part of IntroSort combo (quicksort + heapsort + insertion sort). For example, IntroSort is used by .NET.
Let’s start:
1 | public static void Sort(int[] array, int start, int length) |
…and measure baseline implementation:
1 | BenchmarkDotNet=v0.13.1, OS=Windows 10.0.19043.1165 (21H1/May2021Update) |
Will it work for any type?
That’s the problem. It was been tailored for array of ints (int[]
). There are two issues to address: accessing generic indexable data (let’s say IList<...>
) and comparing items (let’s say IComparer<...>
).
That’s super simple! Let’s do it!
1 | public static void Sort<T>( |
That wasn’t hard, was it? Fantastic, problem solved new implementation works for int[]
as well but is completely generic. PROFIT! Let’s just measure performance and go home…
1 | BenchmarkDotNet=v0.13.1, OS=Windows 10.0.19043.1165 (21H1/May2021Update) |
…or not. It is 8 times slower now.
I heard it is about virtual calls to interfaces?
Yes, it is. Simple comparison (most likely one assembler instruction) became a call to virtual method which is (relatively) very slow. What can we do though? Well, not pass interfaces but pass classes, yes? Well, it still would need to be virtual call to make behaviour polymorphic.
How to make a static call to something with polymorphic behavior? This sounds stu… Generics!
1 | public static void Sort<T, TIndexer, TComparer>( |
Now all calls are done to actual class not to interface so when we call:
1 | Sort<int, int[], Comparer<int>>(_data, 0, _data.Length, Comparer<int>.Default); |
it hopefully will get expanded to call actual int[]
not IList<int>
… Let’s measure it:
1 | BenchmarkDotNet=v0.13.1, OS=Windows 10.0.19043.1165 (21H1/May2021Update) |
Oh my, it is not working! It is actually completely identical and it still calls interfaces.
Does it mean that I really need to implement it multiple times, for Array
, List
, IList
and… Span
? Is there any way to implement it just once but use in different contexts?
Value types for the win!
Wait a seconds… When .NET expands generic class it does it for each value type separately, but not for reference types. That’s why even if int[]
was passed as generic argument it was not generating “dedicated” implementation for int[]
it was still using IList<int>
interface. Let’s try to force it by using value types.
NOTE: I don’t want to use implement whole IList<T>
in my struct so let’s simplify it a little bit as the only thing we actually use from IList<T>
is indexed access:
1 | public interface IIndexer<T> { T this[int index] { get; set; } } |
so now we can implement struct which implements this interface for array:
1 | public readonly struct ArrayIndexer<T>: IIndexer<T> |
We can (re)implement insertion sort using this interface:
1 | public static void Sort<T, TIndexer, TComparer>( |
(NOTE: nothing changed in implementation, just method signature is slightly different)
Seems that BenchmarkDotNet agrees with me as we definitely have improvement:
1 | BenchmarkDotNet=v0.13.1, OS=Windows 10.0.19043.1165 (21H1/May2021Update) |
Let’s do it for comparer as well. The only thing which we actually want from Comparer
in this algorithm is <=
(less or equal), so, let’s make a dedicated interface and dedicated implementation for int
(because that’s what we are measuring):
1 | public interface ILessOrEqual<T> { bool LeEq(in T a, in T b); } |
Now we can modify implementation of algorithm to take this ILessOrEqual<T>
interface:
1 | public static void Sort<T, TIndexer, TComparer>( |
So that would be it, let’s measure:
1 | BenchmarkDotNet=v0.13.1, OS=Windows 10.0.19043.1165 (21H1/May2021Update) |
Yay! We are back to original performance, but with generic algorithm!
Definitely calling this method is little bit cumbersome right now:
1 | Sort<int, ArrayIndexer<int>, IntComparer>( |
but this can be hidden behind some nice overloads and you will have few of them (for T[]
, List<T>
, IList<T>
, or even IDictionary<int, T>
- why not). The story behind Span<T>
is slightly different. Possible, but a little bit more complicated (pinning is required).
Conclusion
The point is it can be almost as fast as dedicated version but also as generic as fully generic version.
Please note, it will be expanded for each indexer/comparer combination so your code will take more memory when JITted, but I don’t think this might be a problem.
I understand NetFabric.Hyperlinq uses same approach with Value delegates to make LINQ a little bit more performant without losing flexibility.