I'm publicly announcing my move off of the Signal chat platform controlled by Whisper Systems. I'll still be on there occasionally for now, but I'm almost always available on Matrix at @mbrewer:matrix.org.
I've been considering various options. I'm unwilling to use anything that doesn't support end-to-end encryption, or anything with more central control than Signal. But these days there are actually options in this space! Signal is not the most FOSS/libre player anymore, and their insistence on only supporting the Android/iPhone ecosystems is a major downer. The other two options I found are Jami and Matrix. Here's a "feature" matrix for Matrix, Signal, and Jami.
| System | Functional | e2ee | Open source | FOSS/libre | Centralized | Clients available | Required OS |
|--------|------------|------|-------------|------------|-------------|-------------------|-------------|
| Signal | yes | yes | mostly | no | yes | Whisper Systems' provided | Android or iPhone |
| Matrix | yes | yes | yes | yes | federated | dozens available | Android, iPhone, Linux, OSX, Windows, ARM and x86; easily ported to other OSes/architectures |
| Jami | no | yes | yes | yes | fully decentralized | one FOSS client | Android, iPhone, Linux, OSX, Windows |
For me Matrix seems like the clear winner. Jami *sounds* good, but when I tried it I couldn't actually get it to work. So Matrix it is. More details below.
Problems with Signal
Whisper Systems has talked about a first-class desktop client for years, but nothing of the sort has materialized. They have a desktop client, but for mere mortals it can only be used as a secondary device alongside a phone. A few people have managed to run it as a primary client, but you have to do a lot of hacking around the system design. As a result, to really use Signal you must own an iPhone or an Android phone with a valid phone number and be willing to register that number with Signal. For me this is a practical matter: I don't own an Android phone and don't intend to any time soon. Many folks have also pointed out that in many countries you can't get a phone without essentially registering your identity with the government as attached to that phone. That raises privacy-related safety issues for those dissenting from their government.
While the Signal protocol is public, and some code is open source, it is neither FOSS/libre nor an open system. Whisper Systems has singular control of the server side of the system. My understanding, at least, is that the server code they publish is out of date. But most importantly, they change the protocol whenever they feel like it. This means that in practical terms only they can provide consistently functional clients. This has been demonstrated by attempts like Axolotl to fill the application gap for first-class desktop IM clients. Axolotl does work for some people some of the time, but it's flaky at best. In reading about Axolotl it appears this is due to the constantly moving target of the protocol, as represented by the server-side software run by Whisper Systems. When I tried Axolotl I got stuck in a perpetual auth loop.
Let me be clear. I laud Whisper Systems and the developers there for their efforts. They have changed the entire conversation around e2e encryption and chat protocols, making e2ee available to normal users, and doing it well. For a centralized chat ecosystem they are doing a better job than anyone else out there, and I'm NOT condemning that effort. They take security seriously, they open source the important parts of their software so anyone can audit it. From a pure security perspective they are doing everything right, and that is not only rare but basically unheard of in the software industry.
What I am saying is that I demand more. There is no reason we should be locked into a single provider for our chat services. While they provide a valuable service, they do so on their terms, with heavy centralized control of the overall ecosystem. Sure, you can run a third-party client, but they don't announce ahead of time that server support is changing, so while they *allow* it, they don't *support* it. I'm not actually a FOSS/libre purist. I just want to be able to use my software in certain ways, and control my own systems, and this is just one more case where those desires are in practice incompatible with more closed software ecosystems.
Problems with Jami
I attempted to use Jami for some time. Jami sounds ideal in that it basically drops the server side entirely, using only a very thin server for aiding in NAT traversal. The problem? It doesn't work (yet, at least). While I did manage to chat with my wife occasionally with this software, it was so flaky as to be useless. I couldn't predict when it would or wouldn't work, and I had to restart clients frequently. Whether this is the protocol or the client I'm not positive, but whatever the reason it's not currently a usable option. After a couple of months of experimentation I gave up. Beyond these problems, Jami is harder to use for normal users. For example: with no server there is no such thing as account recovery. The model is confusing, and the chance of lockout for normal users is high.
Is Matrix really better than these?
While I think Jami's vision is absolutely the ideal, in practice Matrix seems like a better balance between usability and encryption, striking a similar balance to Signal; most importantly, Matrix actually works. Matrix is a fully open federated protocol that also supports e2ee and has dozens of fully FOSS/libre clients, and multiple FOSS/libre server-side implementations. Matrix.org provides an easy default server for the normal user, much like Signal, but unlike Signal, if someone is dissatisfied with that server they can go off and run their own and still chat with folks on other servers. And because the protocol spec is a well-defined standard, there are many clients available for basically anything, many of which work consistently. And if you have an extremely weird use-case (you're running OpenBSD on a VAX), you can always port a client yourself.
I've run into one significant flaw in Matrix: e2ee support in the client is *optional*. This is sensible, as Matrix can also be used as an IRC replacement (open chatrooms), where encryption is likely not that valuable. As a result, though, many clients don't support e2ee yet. I recommend "fluffychat", as I've had the most luck with that client. My wife runs it on Android; I run it on an ARM64 Linux phone and an x86_64 Linux tablet. There was an issue with it getting stuck in right-to-left language mode on Linux, caused by an underlying library, but it's been resolved. Element would be my next recommendation. It worked well for a while, then broke on me last week. To be fair, though, I am using Debian unstable, so it may not be the application's fault.
As someone who has opted out of the iPhone/Android duopoly, Signal is basically not an option anymore. So I'm using Matrix as my primary chat platform. I hope I can drag a few other folks along with me.
I got a Librem 5 about a year ago now. It was quite useless until recently, when I realized that to get newer software I had to flash it myself once; THEN I would get updates. I've been playing with it since, and it's somewhat usable. I wouldn't say it's ready for primetime, but while the stability is so-so, I can actually do useful things with it now. Given that, I wanted to document what I've done to set it up.
First, I reflashed with byzantium. Then during setup I connected it to the Nextcloud instance I'm already running... easy enough. As a tip, I hate running sudo over and over; "sudo su" doesn't work on PureOS, but "sudo /bin/bash" does.
Now for software
- Maps: The most interesting piece of software (in my opinion) for the Librem 5 at this time is offline maps via PureMaps. The map app itself can be installed via flatpak: "flatpak install puremaps". Next you want the scout server as well: "flatpak install scoutserver". Now, via the GUI, open up the scout server and download the maps you want. Then start PureMaps and set it to "offline" mode. And now you have offline maps! Make sure the GPS is enabled; it'll take a while to get a lock the first time, but after that it seems to work okay.
- TOTP: I use TOTP as a second factor for a lot of authentication needs. For this I installed "gnome-authenticator". The configuration is in a DB not a text file, so it's a bit hard if you want to synchronize with other devices, but other than that it works fine.
- Email: The default email client "geary" works fine for me for email
- Calendar: The default calendar client works fine for accessing my nextcloud calendar
- Weather: The default weather app doesn't seem to support arbitrary locations, which is a bit annoying. The closest location I could configure it for is about 30 minutes from here. Not ideal, but better than nothing.
- PDF Reader: Evince works well for viewing PDF documents.
- Contacts: automatically syncs with nextcloud once you configure the account.
- Todos: gnome-todo, installed via commandline apt, is a bit funky on a small screen, but usable at 150% scale (as configured via the settings app). It picked up the nextcloud account on its own and syncs automatically. My wife and I use the nextcloud-synced todos for shopping lists and the like.
- Timers/Alarms: The default "Clocks" has a usable timer and alarm functionality
- Spreadsheets: I use spreadsheets for tracking my budget and such, so I installed "gnumeric". It's not perfect, but usable at 150% scale setting (as configured via the settings app). You can install this via command-line apt.
- RSS reader: gnome-feeds is perfectly usable.
- Ereader: I'm using "foliate", installed via flatpak. It seems to work well. I mostly read non-DRMed azw3 files.
- Password Database: I use keepassxc on my computers, so I needed something compatible. I store my database in my personal Nextcloud; to synchronize it I installed "nextcloud-desktop" via commandline apt. Then I installed "password safe", available in the PureOS store. I use a private key to lock the database as well as a password, so to copy that over I installed "ssh-client". Note that you can also enable ssh in the other direction via settings->sharing->remote login (I suggest you disable this again once you have things set up how you like).
- Browser: The default one is usually fine, but I've found it useful on Android in the past to have a more standard, fully featured browser available in case I need it, so I installed "firefox-esr" via commandline apt.
- Text editing: Because I'm a linux dweeb I also installed vim and gvim. I'm still deciding if gvim is useful. I have an 80%-sized bluetooth keyboard, so when messing with the phone it's nice to have a real vim available. Note, BTW, that the on-screen keyboard is better than one might expect. Under the "world" key you'll find the ability to switch to a "terminal" keyboard. Then, on the other side of the spacebar, the ">_" key brings up arrow keys and, most importantly, a Tab key! Still, a real keyboard is nice to have.
- VPN: Right now my home network is on HughesNet (ouch). As a result the only way to reach it is via IPv6 (even that was surprisingly hard to get working). This means that when I'm on an IPv4-only network I need an IPv6-capable VPN to be able to reach my home server to get email, sync todos, etc. To do this, install "network-manager-openvpn-gnome" via commandline apt. Then go to the "advanced networking" app (this is secretly just network-manager configuration), add a VPN, and use the import feature.
- Camera: byzantium comes with a "Camera" app. It doesn't autotune, so it's a bit of work to use, but it is *possible* to take photos, which is a huge step forward.
- Signal messenger: I tried "axolotl", installed via flatpak, but so far have been unable to get past the recaptcha. This seems to be a common problem, so I'm hoping it gets fixed soon. For background, Axolotl is a third-party app (not from the Signal developers) that, unlike signal-desktop, is fully featured and can act as a "primary" device. But since it's not from the Signal developers, changes in the Signal protocol can cause it problems. The Signal devs seem to be *slowly* coming around to the idea that maybe not everyone using Signal has to have an Android or an iPhone. We can cross our fingers. If more folks are using Linux phones (be it Librem, PinePhone, or Ubuntu) that'll help.
Lastly I removed all the apps I don't actually use (e.g. text editor). Unfortunately I didn't record the names of all of the apps, but you can always use "apt list --installed" and grep to work it out.
Finding software: I believe the Librem 5 is basically mobian/phosh. If you take a dpkg for mobian/phosh and try to install it, it won't just work (I tried it), because the library versions aren't identical. But if you're searching for software that will work well on the Librem 5, software that works on mobian/phosh probably will, and it's most likely already available in the PureOS repos via commandline apt. The PureOS store is incredibly limited and I found it largely useless; I'm sure it'll become more useful in the future. A surprising amount of software is available via flatpak as well, though I'm not a fan of it.
Overall I think PureOS is getting there in terms of functionality, and I'm excited to really use my device now. Battery life is just about long enough to last a work day. What it needs most now is stability. For example, the bluetooth functionality is pretty sketchy. I'm unsurprised by this, as the gnome bluetooth stack was no *more* functional on laptops the last time I tried it. I gave up and use a commandline tool for bluetooth on my Microsoft Surface Go linux tablet (my other, and my primary, machine). Unfortunately this has been common for Gnome. Many of my friends still call "NetworkManager" "NetworkMangler" due to its long history of stability problems. I love the idea of linux phones, but the gnome development philosophy has, sadly, historically failed to deliver reliable and functional tooling. It's quite possible that Librem and PinePhone and others will actually help to fix those longstanding problems. I'm crossing my fingers.
The best thing about the Librem, though, is that if they fail, that's not the end of the story. Folks over in the PinePhone world are playing with several other, even more linuxy/gnu'ish, phone GUIs. Running that stuff on the 100% open and free software Librem is comparatively easy. I'm hoping Purism can really deliver themselves, but it's nice to have a backup plan :). This is a HUGE difference from the days of Nokia and maemo. The software was open then too, but the ecosystem wasn't there; it's finally here.
One last note... my device has a problem with the SD card at least, and likely a problem with the SIM slot as well. I identified this problem the day the device arrived, but Purism dropped the ball on customer support and never replied after their first response. I've emailed them back now and hoping they'll get me a fully functioning device. I gather online that this isn't a terribly unusual story. They are trying hard but seem to be perpetually overloaded. In a way it's a good thing, as it means people want what they are selling, but it means the customer service experience might not be the best.
I got my phone back from purism with a working microSD slot... WOOT! I have a 1TB card arriving tomorrow so I can sync everything from nextcloud and have more than enough spare space for storing maps of the U.S. and Canada for puremaps, a copy of wikipedia, and whatever else I feel like.
- Phone/SIP: It turns out the default phone app has entirely functional SIP support.
- Chat: Due to Whisper Systems never getting around to building a fully featured desktop client (they've been promising one for years now), I'm giving up and moving over to Matrix. I found "element" works well on my tablet, and there's a flatpak port of "fluffychat" (an Android Matrix app) that works great on the Librem. These both have full e2e encryption support. Eventually I hope to set up my own Matrix server with bridging to Signal, but that's going to take some effort. I thought chatty (the default chat app) supported it, but apparently Matrix support hasn't been added yet.
- Conversions: I find having a unit conversion app on my phone really helpful, so I installed "convertall" which is an absolutely amazing linux app that's been around forever that lets you do arbitrary unit conversions, including some rather crazy things. gallons per fortnight to cubic light-nano-seconds per day is no problem for it.
- Encyclopedia: kiwix, because why wouldn't you want all of wikipedia with you at all times?
- VLC: so I can watch movies, the UI is actually quite usable
I need to fix up some issues with my nextcloud server (I just moved from HughesNet to Starlink and it's way better, but I lost my wacky IPv6 hacks), but once that's properly online again most of my use-cases are met by this suite of apps. Some could use some cleanup, but Purism is getting there.
Although I like my device, I wouldn't encourage others to get it. The price-tag is just too high for a device with poor battery life and still-unreliable software. I *really* like the idea of running Linux on a phone, but I think the PinePhone project is a more economical way to get there. Sure, the performance is a bit lower, but the devices are cheaper and the ecosystem is far better. In the end that's going to make for a more usable device. Ecosystems matter; that's why Linux won and BSD is now a barely used OS.
You'll recall from my recent post about Fast Sort that Radix Sort is significantly faster at sorting 32-bit integers than normal comparison-based sorts, particularly as the array gets large. That's a result every computer scientist knows theoretically, yet with median-find we saw how theoretical bounds don't always translate into practice.
Well, a friend asked me: what about sorting variable-length strings? Does radix sort still beat comparison-based sorts? It's a fair question, so I decided to take a shot at it. Here are my results:
In my mind that graph is pretty clear: above 10 elements radix sort is clearly winning. Note that this is currently a 128-bucket radix sort that only handles ASCII... though I'm actually only feeding it uppercase strings currently. So, let's talk about how this algorithm works, because it's not an entirely trivial conversion of radix sort.
String Radix Sort Algorithm
This is a little bit interesting. You may recall that there are two types of radix sort: Least Significant Digit first and Most Significant Digit first, referred to as LSD and MSD. My binary radix sort from earlier benchmarks was an example of an MSD sort, and the one I just referred to as "radix sort" is an LSD sort. LSD sorts are generally preferred because they are stable, simpler to implement, require less temp space, AND are generally more performant.
There's just one problem: with variable-length strings, LSD sorts don't work very well. We'd have to spend a lot of time scanning over the array just looking for the longest string so we could compute what counts as the least significant digit. Remember that in lexicographic ordering it's like all the strings are left-justified. The leftmost character in each string is equivalent in precedence, not the rightmost.
MSD sorts must be recursive in nature. That is, they need to work on only the sublist we've sorted into a certain bucket so far. I'm going to call this sublist a "slice". To keep our temporary space vaguely in order I'm using a total of 5 lists:
- The input list of strings (call this string list A)
- Temporary list of strings (call this string list B) (length of A)
- Temporary list of indices into string list A (call this slice list A) (length of A)
- Temporary list of indices into string list B (call this slice list B) (length of A)
- A list of 129 buckets (one per ASCII value, plus one for strings that have run out of characters)
Here's the algorithm. Start by looking at the first bytes of the strings. Look in slice list A and get the next slice. Bucket everything in this slice. Each of these buckets (if non-empty) becomes a new slice, so write the strings back out to string list B, and write the index of the end of each slice into slice list B. Swap the A and B lists, move to the next byte, and do it again. We terminate when every slice is either of length 1 or we run out of bytes. To see the full implementation take a look at string_sort.h in my github repo.
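The five-list version described above is what's in the repo; as a simpler illustration of the same bucketing idea, here's a minimal recursive MSD sketch (my own simplified version, not the post's implementation):

```cpp
#include <string>
#include <vector>

// A simplified recursive MSD radix sort sketch for ASCII strings (an
// illustration of the bucketing idea, not the post's 5-list variant):
// each call buckets its slice by the byte at `depth`, with bucket 0
// reserved for strings that have already ended, then recurses into each
// non-empty bucket. Shorter strings therefore sort before their extensions.
void msd_sort(std::vector<std::string>& slice, size_t depth) {
    if (slice.size() <= 1) return;
    std::vector<std::vector<std::string>> buckets(129);
    for (auto& s : slice) {
        size_t b = (depth < s.size()) ? (unsigned char)s[depth] + 1 : 0;
        buckets[b].push_back(std::move(s));
    }
    size_t out = 0;
    for (size_t b = 0; b < buckets.size(); ++b) {
        if (b > 0) msd_sort(buckets[b], depth + 1);  // recurse on the next byte
        for (auto& s : buckets[b]) slice[out++] = std::move(s);
    }
}
```

Because each bucket is filled and drained in order, this sketch is stable for the same reason the real algorithm is.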
Conveniently, the way my algorithm works, it is in fact stable. We walk the items in order, bin them in order, then put them in the new list still in order. If two items are equal there is no point where they'd get swapped.
It's a LOT of temporary space, which is kind of ugly, but it's pretty performant, as you saw above. Another optimization I haven't tried is short-circuiting slices of length 1. We should be able to trivially copy these over and skip all the bucketing work. Testing would be required to see if the extra conditional was worth it... but it seems likely.
Data tested on
To test this I'm generating random strings. It's a simple algorithm where, with a probability of 9/10, I add another random uppercase letter, always stopping at 50 characters. I'm mentioning this because obviously the distribution of the data could impact the overall performance of a given algorithm. Note that this means functionally we're only actually using 26 of our 128 buckets. On the other hand, real strings are usually NOT evenly distributed, since languages carry heavy biases towards certain letters. This means my test is not exactly representative, but I haven't given my algorithm a clear advantage either.
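The generator described above can be sketched like this (my reconstruction of the described procedure, not the exact benchmark code):

```cpp
#include <random>
#include <string>

// Generate a random test string: keep appending random uppercase letters,
// stopping with probability 1/10 after each one, hard-capped at 50 chars.
std::string random_test_string(std::mt19937& rng) {
    std::uniform_int_distribution<int> letter('A', 'Z');
    std::uniform_int_distribution<int> coin(0, 9);
    std::string s;
    while (s.size() < 50) {
        s.push_back((char)letter(rng));
        if (coin(rng) == 0) break;  // 1-in-10 chance we stop here
    }
    return s;
}
```

With this distribution the expected length is around 10 characters, so most strings are decided within the first few bytes of bucketing.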
I can't say that this is a clear win for Radix Sort for sorting mixed-length strings. The temporary space issue can be non-trivial, and certainly time isn't always worth trading for space. We're using O(3N) additional space for this sort. That said, there are some obvious ways to reduce the space overhead if you need to, e.g. radix-sort smaller chunks of the array, then merge them. Use 32 bit instead of 64 bit pointers, or come up with a cuter radix-sort.
Note that my radix sort was a morning's work: figure out the algorithm, write and validate an implementation, find a couple of optimizations, and benchmark it. I wrote this post later. Oddly, "inline" made a huge difference to gcc's runtime (it's needed due to loop unrolling for handling the A/B list cases). In any case, I've little doubt someone can beat my implementation, and maybe find something using a bit less space. I just wanted to prove it was viable, and more than competitive with comparison-based sorts.
Similar to Radix Sort, I thought it might be interesting to see how worst-case linear-time medianfind actually performs. Since the algorithm is identical to expected-case linear-time medianfind (usually called quickselect), except for the pivot selection, I elected to add a boolean to the template to switch between them (since it's in the template it'll get compiled out anyway). Before we go into the results, here's a quick refresher on these algorithms:
Imagine you have a sorted array. If you want the K'th largest element, you can simply look at the element in location K in the array. Comparison-based sorting of an array takes O(N log(N)) time, which is also the theoretical lower bound for comparison sorts. What if we want to find the K'th element without sorting the array first?
Quickselect chooses a pivot and runs through the array, throwing elements into two buckets: those smaller and those larger than the pivot. Then it looks at the number of elements in each bucket to tell which one contains the K'th element, and recurses into it. We can prove we usually choose a good pivot, so this is expected O(N) time. But its worst case is much worse.
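The bucket-and-recurse step can be sketched as follows (a generic quickselect, not the post's index-array implementation; the recursion is written as a loop):

```cpp
#include <algorithm>
#include <vector>

// Return the k'th smallest element (0-indexed). Each pass partitions the
// current slice [lo, hi) into [smaller][equal][larger] around a pivot,
// then narrows to whichever bucket contains rank k.
int quickselect(std::vector<int> v, size_t k) {
    size_t lo = 0, hi = v.size();
    while (hi - lo > 1) {
        int pivot = v[lo + (hi - lo) / 2];
        auto less_end = std::partition(v.begin() + lo, v.begin() + hi,
                                       [&](int x) { return x < pivot; });
        auto equal_end = std::partition(less_end, v.begin() + hi,
                                        [&](int x) { return x == pivot; });
        size_t nless = less_end - (v.begin() + lo);
        size_t nequal = equal_end - less_end;
        if (k < nless) {
            hi = lo + nless;          // k'th is in the "smaller" bucket
        } else if (k < nless + nequal) {
            return pivot;             // k'th is the pivot itself
        } else {
            k -= nless + nequal;      // k'th is in the "larger" bucket
            lo += nless + nequal;
        }
    }
    return v[lo];
}
```

With a middle-of-the-slice pivot this is expected O(N); an adversarial input can still force the O(N²) worst case the next section addresses.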
Worst-case-linear Quick Select
What if we always chose the right pivot? Or at least... a good *enough* pivot. This is how we build our worst-case-linear quickselect algorithm. It's a really cool trick, but it's been covered in many places, so if you want to know how it works you can check wikipedia, or this nice explanation.
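For reference, the standard "good enough" pivot is median-of-medians (the textbook construction, not necessarily the post's exact helper):

```cpp
#include <algorithm>
#include <vector>

// Median-of-medians pivot selection: take the median of each group of 5,
// then recursively take the median of those medians. Used as a quickselect
// pivot, it guarantees a constant fraction of elements lands on each side
// of the partition, which is what makes the whole thing worst-case linear.
int median_of_medians(std::vector<int> v) {
    if (v.size() <= 5) {
        std::sort(v.begin(), v.end());
        return v[v.size() / 2];
    }
    std::vector<int> medians;
    for (size_t i = 0; i < v.size(); i += 5) {
        size_t end = std::min(i + 5, v.size());
        std::sort(v.begin() + i, v.begin() + end);   // sorting 5 items is O(1)
        medians.push_back(v[i + (end - i) / 2]);     // median of this group
    }
    return median_of_medians(std::move(medians));    // recurse on ~N/5 items
}
```

The recursion shrinks by a factor of 5 each level, so the extra work sums to O(N) overall.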
Real World performance
All of that is great in theory, but what is the *actual* performance of these things? Well, in a benchmark at least, but on a real computer.
As usual I'm working on arrays of test_size elements, k/test_size times, so we work over the same number of array-cells at every point on the graph: small arrays many times on the left, and large arrays fewer times on the right.
For a while I was pretty upset about these results. The runtimes for "lineartime" quickselect look more like quicksort (the algorithm displayed as the "sort" line) than they do like basic quickselect. In short... that doesn't look linear at all. What the heck?
I must have a bug in my code, right? This algorithm was proved linear by people much smarter than me. So my first step was to double-check my code and make sure it was working (it wasn't, but the graph above is from after I fixed it). I double, triple, and quadruple checked it. I wrote an extra unittest for the helper function that finds the pivot, to make sure it was returning a good pivot. Still, as you see above, the graph looked wrong.
I finally mentioned this to some friends and showed them the graph. Someone suggested I count the "operations" to see if they looked linear. I implemented my medianfind algorithm using a separate index array. That way I could output the index of the k'th element in the *original* array. From there everything is done "in place" in that one index array. As a result, swapping two elements is my most common operation, which seemed like a pretty accurate representation of "operations". So, here's what that graph looks like.
Now THAT looks a bit more linear! It's not exactly a flat line, but it looks asymptotic to a flat line, which is what O(N) means. Cool... so why doesn't the first graph look like this?
Real machines are weird. That index array I'm storing is HUGE. In fact, it's twice the size of the original array, because the original is uint32_t's and my index array is size_t's for correctness on really large datasets. The first bump is similar in both graphs, but then a little farther to the right in the time graph we see it go crazy... that is probably the algorithm starting to thrash the cache. Presumably if I made it big enough we'd see it go linear again. That said, if we go that large we're soon running on a NUMA machine, with even more layers of slowness, or hitting swap.
So, should you *ever* use guaranteed linear-time medianfind? Probably not. If there is a case, it's vanishingly rare. It happens that the pivot selection distributes well, so maybe there's a use there? But if you just used the "fastsort" we talked about in my last post you'd get even better performance, and it's still linear, AND it distributes well too! It's not comparison-based, of course, but there are very few things that can't be radixed usefully if you're stubborn enough about it.
Big O notation didn't end up proving all that useful to us in this case, did it? Big O is usually a good indicator of which algorithm will be faster when run on large amounts of data. The problem is: what is large? Computers are finite, and sometimes "large" is so large that computers aren't that big, or their behavior changes at those sizes.
In my last post I was benchmarking various sorts against each other to test a couple of radix algorithms. https://blog.computersarehard.net/2018/12/sort-benchmarks-and-modern-processors.html . Radix Sort came out the clear winner for large data sizes (as we would expect), but that experiment left some questions.
In particular, on small arrays (less than ~100 elements in size), Radix Sort performed quite poorly. I posited a couple of theories. Maybe it's prefetching and branch prediction, or maybe it's all of the passes over the radix-array itself that cost more (as a friend of mine suggested). If the former is the main factor then we should see little performance difference as the radix-size changes. If the latter is the main factor then as the radix-size goes up our runtimes on these small arrays should shoot through the roof.
As you'll recall from my last post, every point on this graph involves sorting the same number of elements. The X axis indicates the length N of the array sorted, but each array is then sorted K times, where N*K is a constant. This way the graph shows nonlinearities in the algorithms as they change with N. The subscript for each algorithm is the number of *bits* in the radix, so the actual radix-size is 2^K (e.g. 256 for the yellow line). Note that the constant is not the same as in my last post, so please don't cross-compare the graphs.
This graph gives us a pretty clear answer. Our slow performance on small arrays is definitely the radix array. With a radix of 256 it doesn't start to look linear until around 1000 elements, and doesn't beat a radix of 2 until ~20 elements.
You're probably wondering about the weirdness out on the right. Testing on any larger arrays gets somewhat time-consuming quickly. Logarithmic graphs are annoying that way. That said, I generated several graphs like this one, and all had similar artifacts towards the right, especially for larger radix sizes. But it looks pretty noisy overall. My current guess is caching behavior. The machine I'm testing on is far from sanitary, lots of other stuff is going on in the background. It could be that as radix sort uses nearly the entire cache it becomes highly sensitive to other workloads. A larger radix might have more opportunity for chunks of the radix array to get kicked out repeatedly and thus be more sensitive.
We saw in my last post that radix sort is extremely valuable for larger data-sizes; in fact, it beat out everything else by around 40 elements. On the other hand, we also saw that it has pretty bad behavior on really small arrays. Those results were found with a radix size of 32 (or 5 bits, which would land between the blue and green lines on today's graph).
From this new experiment we see that there is a significant cost to larger radix sizes. For most uses on modern computers 256 integers is a non-issue memory-wise. But with a large radix our performance is abysmal on really small arrays. A standard solution to this type of problem is to blend algorithms (this is frequently done with quicksort, using selection sort for extremely small arrays). Heapsort and mergesort both seem like good candidates. Let's see how they compare.
The graph is a bit ugly because I didn't do a ton of points, but the results are still pretty clear. If we just take the best algorithm at every point it looks like we should use heapsort <20, then radix sort with 4 bits up to 30, then radix sort with 6 bits up to 60, and then radix sort with 8 bits. No doubt if I tested larger radix sizes this trend would continue.
Of course, if our conditional choice gets too complex, its cost becomes a problem as well. To make radix sort fast it's compiled with the radix size as a template parameter, so each switch is going to thrash the code cache, etc. If you're getting THAT picky you'll be benchmarking for your specific use-case anyway. So for general use I would propose that we keep it simple. Heapsort up to about 30 elements, and radix sort with 6 bits above that, is a nice simple algorithm that gets us pretty close to our ideal speeds. It certainly wipes the floor with any other single algorithm. We'll call our new mixed algorithm fast sort.
It's a little hard to see the green line that is fastsort, but that's because it performs exactly how we would hope, like heapsort up until 30, and then like radix sort with a 6 bit radix.
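The blend can be sketched in a few lines (my reconstruction, not the post's templated code; `radix_sort_6bit` here is a plain LSD radix sort with the width hard-coded rather than a template parameter):

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// 6-bit LSD radix sort: histogram, prefix-sum, stable scatter, per digit.
static void radix_sort_6bit(std::vector<uint32_t>& v) {
    std::vector<uint32_t> tmp(v.size());
    for (int shift = 0; shift < 32; shift += 6) {
        size_t count[65] = {0};  // 2^6 buckets, offset by one for prefix sums
        for (uint32_t x : v) count[((x >> shift) & 63) + 1]++;
        for (int i = 1; i < 65; i++) count[i] += count[i - 1];
        for (uint32_t x : v) tmp[count[(x >> shift) & 63]++] = x;
        v.swap(tmp);
    }
    // Six passes (an even number of swaps), so the result lands back in v.
}

// "Fast sort": heapsort below ~30 elements, 6-bit radix sort above.
void fast_sort(std::vector<uint32_t>& v) {
    if (v.size() < 30) {
        std::make_heap(v.begin(), v.end());  // heapsort via the standard heap ops
        std::sort_heap(v.begin(), v.end());
    } else {
        radix_sort_6bit(v);
    }
}
```

The size check is a single predictable branch, so it costs essentially nothing relative to either sort.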
Discussions of sort speeds are usually theoretical (how many comparisons are being done). In reality, as we showed with AVL trees being just as fast or faster than Red-Black trees, moves and cache behavior play at least as big a role as comparison count. Most of the literature I've seen says heapsort is not a good choice for speed alone, while here we've found it to be a very good option for smaller arrays. Our final fast sort algorithm differs from anything you are likely to find in a textbook, yet is an extremely performant, and by happenstance stable, sorting algorithm.
There are of course caveats to these conclusions. As noted in earlier posts, these are my implementations; it's always possible (even probable) that I'm farther from the ideal implementation for one algorithm than another. Additionally, our fast sort has one rather annoying property drastically limiting its utility: the radix algorithm has to work on the type being sorted. At the moment it's written for unsigned types, but it would have to be changed for signed types, strings, integral types larger than size_t, etc. With comparison-based sorts you can sort largest-to-smallest simply by swapping the comparator. With radix sort (and thus fast sort), significantly less trivial adjustments have to be made to the sort itself if you don't want to reverse it in a separate pass. In practice, these limitations are probably why radix sort gets so little attention despite its obvious strengths. The details can't be abstracted away as cleanly as with comparison-based sorts.