In defense of phablets

2015-03-29

Convergence
A while ago I wrote a "manifesto" on mobile devices. My idea at the time was that a mobile device should be a real computer; anything less and we'd never really be happy with it. Further, I theorized that as computers got faster, eventually we would each carry a device that would serve all our (local) computational needs. This "phone", for lack of a better term, would tether to your display and keyboard at your desk, to a folding keyboard elsewhere for lack of other input devices, to a heads-up display if that was desirable, and so on. This is the ultimate convergence device. One device to rule them all. It's simple, it's small, you can carry it anywhere and interact with it directly, or tether up to a screen at work to get down to business.

At the time what I didn't understand was that very few users care whether their data is local or not. To me, the ubiquity of wireless connectivity was nowhere near sufficient to warrant putting all my data in the cloud. I often couldn't get a connection at all! Some of my data, sure, but not all of it. It still isn't anywhere near that level, but what gets built is decided by Silicon Valley, not by people living deep in the Rockies or in the rural hills of the east. If data is stored "in the cloud" (ugh, I hate that phrase), then there is no reason you *need* all these devices to be the same device. And if you don't need them to be, it's both easier and more lucrative to sell you lots of devices.

All of this is to say, I got that one wrong. Then again, at the time I made this prediction desktops were still commonplace, and it is now common to use a laptop that tethers wirelessly to an external display and keyboard, so I was at least partly right. Anyway, I don't think my vision was a bad idea, it's just not where industry is going to take us if we sit back and watch.

I hate dealing with 15 devices. I don't like buying them, charging them, storing them, setting them up, etc. The last thing I want is 8 devices to do what one device can do. As a result, I've resisted first smartphones, then netbooks, then E-readers, and then tablets. A laptop can do these things, why push around another device? But I want to do mobile computing. I want to find something that fills those useful niches, but doesn't just add a device.

Mobile computing
I have been privileged enough to get the opportunity to try a lot of different mobile devices over the years. Here, I'm taking laptops for granted, and focusing on things smaller than laptops.

In high school I had a Newton 2000 with a keyboard. That was a really great device. I carried it to school and typed up homework on it. I used it a lot. Back then most of the useful work I did on a computer was word processing, and this actually filled that need quite well. I often wrote papers and printed them off without ever using another device. In this sense the Newton was a "real computer", in that it allowed full content creation without use of another device (for the needs of the time).

Later on I inherited a Jornada 720. The clamshell form-factor was wonderful, and the 0.75-pitch keyboard was a good size to still be typeable, but not good enough for general word processing, especially with the heavy screen tipping the device over all the time. Also, Windows CE was useless. I eventually got Linux running on it, but it didn't have enough RAM to ever really work well with the Linux UI at the time, even for someone like me. It was a great idea, and came very close to being really useful, but missed.

While working at Google they gave me a G1, the first Google smartphone. It was very cool, but I immediately bemoaned that it was only kind of a computer. It was a single-user system. Word processing on it was almost impossible. It was a phone, and could do maps and such, but it was massively frustrating to know I had that computing power in there but couldn't really get to it. It was neat to own, but I never would've purchased it.

Next they gave me a Nexus 1. This device had a large enough screen to conceivably do real stuff on, but had no physical keyboard. Eventually they released a version of Android that could handle an external bluetooth keyboard and I got one. This gave me a device kind of like a tiny Newton. Android still felt like it was making itself a second-class citizen, more so than the Newton did. I never did "real" work on it because the OS was too hobbled. I eventually got a real GNU userland working on it, but even then the screen was just too small. Close, but this still didn't fill the niche I'd been looking for since I was little, either in form factor or software.

Eventually they gave me a Galaxy Nexus. Now that was a real change. By this time Android had reached its tween years. It had started to realize that eventually it was going to be a real OS, but had no idea how to get there yet. While I had this device I quit my job at Google and went on the road. Living out of a car, the combination of a bluetooth keyboard and the Galaxy Nexus was incredibly powerful. Plopping down a laptop in a cafe is a statement: "I'm going to be here a while." Setting a cellphone and a little keyboard down? Not so much. I was able to write emails in internet cafes when out of cell range up near Yosemite. I didn't need an inverter to charge it. I could charge it over and over off the car battery without worrying about killing the battery, or I could use a solar charger. I wrote a lot of email on that device. It felt kind of like I had a Newton again, but with too small a screen. It had grown up enough that I could do real work on it, finally. Still though, it felt hobbled. Posting to a blog wasn't really feasible.

Around then I gave Jess an Asus Transformer. It could run Linux dual-boot. Sadly, while a friend of ours had it working, she never *quite* got it there. This device was a really interesting form-factor: a small and light tablet, and a keyboard. She used it as her primary computing device for a year. This made it painfully clear that Android was indeed not a real OS yet. For example, every browser available for it was highly unstable when you *really* used it (not just poked at it like you would on a phone). There were websites that simply didn't work. Many websites have phone apps you can use instead on Android, but these were almost always missing critical features. Gmail, for example, to this day is still missing half the features that are available on the website. For certain things she still needed a "real" computer.

Somewhere in the middle of all that I also got a Kindle 3G. This device let me carry around a ton of books with me, yet still read them in full sun, and almost never recharge. It also gave me a backup way to check my email from almost anywhere for free. I rooted it of course, like everything else, but even with a console it's just too slow and clunky to use for anything real. It's an amazing device and I love it, but fundamentally it's not in this class of devices at all, so we'll toss it aside and discuss it no further.

One other thing I discovered on the road with the Galaxy Nexus... why is it a phone? Eventually I realized how much gas I was burning trying to find wifi and purchased a Verizon hotspot to save money. This got me the coverage of Verizon at $50 a month for 5GB, far cheaper than I could get a Verizon phone plan (I needed Verizon for the coverage). I had a $2-a-day plan (only for days I used it) with T-Mobile on my Galaxy Nexus. After a while though I got data calling working on the hotspot and it was more reliable. At some point, I realized I didn't need my cellphone to be a cellphone. It was just a tiny tablet.

My latest laptop was yet another attempt to hit the middle-ground, but from the top. I got a Dell XPS 12, a flip-screen convertible laptop that turns into a tablet. It's a neat idea, but for the most part I haven't used it as a tablet. Only recently did the drivers finally work for gestures on the touchscreen under Linux. I now have a tablet but... I'm not entirely sure what to use a large heavy tablet for. A light tablet is comfortable to hold in one hand and read on, to hold over your head lying on your back, etc. With a large tablet the only use is while standing, but... unless you need a large screen, a small light tablet is still better for that use too.

Phablets
Recently, I went to purchase a smartphone. My Galaxy Nexus is bordering on ancient at this point at 4 years old. The screen is shattered, and it's just not working that well. It makes me sad that this is how technology works, but it is. So, I went looking for a new phone. I realized that what I wanted was a 6" tablet... but, those don't exist. 8" is too big, I can't carry it around in a pocket, it would have to be in a backpack. 5" is too small, I can't edit real text and use it like a real device.

So, I got a 6" phone. I actually do have the T-Mobile plan on it, for emergencies and getting paged for work. I'd prefer not to have the radio as it uses power. I'd also prefer not to have it since the radio makes the device accessible to the NSA for domestic spying (no, it's not a conspiracy theory, this one is real). But... oh well, 911 support is nice. Overall, paired with a folding keyboard, it's getting really close to what I've been looking for since I was a kid. Android has made it to its teen years now; it's growing into a real adult operating system, with multi-user support and all the basics, but it's not actually there yet. It's gotten easier to run a GNU userland as well now, though I'm running a nightly build of a bleeding-edge rooted open source image right now (CyanogenMod) so I haven't gotten it working yet.

So, I still have a Kindle, a camera (for low-light pictures), a HAM radio, a car, and a Verizon hotspot, but for my computing needs I have 2 devices: my Nexus 6 phablet, and a Dell XPS 12 (the convertible laptop).

Both the phablet and the laptop can make phone calls. Both can do text editing, run a terminal, run an SSH client and server, run a decent web browser, etc. Two devices that can basically do everything is getting pretty close. I find myself grabbing the phablet frequently to use as a reading device, even inside the house. I don't have to plug it in and can just carry it around my 750-square-foot house.

The next step is to finish making Android into a real OS. Apps either shouldn't exist, with the web versions tested in a browser instead, or they should support the full set of features. Alternatively, we could just get really good at getting GNU/Linux to run on these devices, so we can actually use the full suite of software available.

Exciting times. I have a folding keyboard coming in the mail, and I'm excited to try writing blog-posts on the Nexus as soon as it does.



Monitoring: update

2015-03-13

Prometheus appears to actually exist now, and is coming along pretty well.
https://github.com/prometheus/prometheus

Glancing through it, it appears to hit the bulk of my points. It's still lacking a permanent DB for storage, so it's probably not deployable for most folks yet, but in general it looks like they are doing things basically right.

So, there's no answer out there that I think solves the problem *yet*, but at least someone is getting close. Check out the query language, particularly operators:
http://prometheus.io/docs/querying/basics/

A friend of mine also works at Splunk, and they showed me some of Splunk's features. As a backing store it appears to fit much of what I was describing as well. It has reasonable group-by semantics, for example. Fundamentally Splunk is basically a timeseries DB; it just happens to most often be used like ES is, for processing logs.

So, with any luck my old monitoring posts will soon become defunct. Here's hoping!


Original AVL paper

2015-01-05

This came up the other day; it wasn't on my blog and I wanted to find a link to it. So here it is:

An Algorithm for the Organization of Information
http://monet.skku.ac.kr/course_materials/undergraduate/al/lecture/2006/avl.pdf

This is an amazing read because you suddenly realize just how *differently* we thought about algorithms and programming back then. This paper doesn't even touch on the API; it's all about the layout of the tree in memory. When's the last time you read an algorithm description that didn't mention the API?

Anyway, it's a neat read. Give it a try!


Monitoring, post3, tools

2014-08-22

Alright, it's time to write this. I had intended to write this much, much sooner, but then the "Group By" realization hit me, and I got stuck for a while trying to prove out some features in the system I'm actually using.

Our goal is a system that gathers whitebox and blackbox data from applications, and whitebox data from machines, and lets us easily query, view, graph, and alert on that data with the feature set from my previous post.

There are also some soft requirements in practice for really using a DB:


Nagios et al.

Nagios is the industry standard solution. Nagios is what many, many systems are trying to emulate, and as a result its flaws are endemic to the world of monitoring systems.

First, let me say that I have not used Nagios, but I have used some of its competitors.

The main problem is that fundamentally these systems mostly are not meant to do the type of things I discussed in my previous post. They're meant mostly to alert on directly monitored values (e.g. is memory usage too high?). They then give you some small finite set of "aggregations" so you can see, for example, how much memory *all* your machines are using together. This works fine until it doesn't. Values like that tend to be scale-dependent and take constant tweaking.

A popular meme right now is "automated thresholding": your system figures out what "normal" is, and alerts when a value is some statistically significant distance outside that "normal". This is an attempt to solve the problem of scale independence, but it has problems. One is that sometimes your "normal" drifts, and that's indicative of e.g. your users slowly outgrowing how far your system can scale. After a year of slow drift "normal" could be 5% of queries being dropped... that's not okay.

Note that I said 5% of queries being dropped. Ratios are a much simpler solution to this problem. In practice this does not entirely solve scale independence. If you don't already know why, Google for the "law of large numbers". That said, while it's not perfect, it gets you 90% of the way there, and I would argue that the fallibility and complexity of AI-type approaches are not worth the tiny bit of scale independence they buy you.

On top of this it's designed to be "easy to use", which means everything is based on clicky-button interfaces: no backup or versioning for your configs, no idempotent setup, etc. It's a nightmare from the perspective of a reliability engineer as soon as you view that system as something you have to manage and not just use, despite the fact that you're paying someone else a ton of money to run it for you.

Basically, don't believe a monitoring tool will ever solve a problem for you, or allow you to think less about your system. I'm just going to leave it there, and let you generalize out to all of the other systems in this same family.

Collection -> DB -> alerting solutions

Okay, so if you get frustrated with the tools in the above model and start poking around, what you quickly find is that folks in the know are using systems with 3 separate components: data collection, a timeseries database, and an alerting system that queries the DB.

This has some nice advantages. You can use lots of different collection engines for collecting different types of data (e.g. statsd for application level, various services for whitebox probing, etc.) Yet all the data ends up in the same DB, making building UIs and integrating data across those collection engines a breeze.

There is one notable downside, which is that your alerting is fundamentally polling based. It has to do expensive queries against the DB every N seconds so it can alert you if something goes bad. If you're doing polling you can usually assume you're doing something wrong. The *right* answer would let alerting trigger at the collection level. BUT we still want to incorporate old data in some cases, so it also needs to go back to the DB to get data, or keep a local store, or something.

There is one system out there that strives to do this a better way, called Prometheus. Unfortunately, it's not ready for production yet, but keep an eye on it.

Collection

This part of the equation is boring, honestly. We need something to gather stats from machines and applications and feed those stats in to the database. There are a hundred ways to do this that are fine. As long as it doesn't block the application, we can get application and machine level data, and the data gets to its destination, it'll work.

DBs

Before we dive too deep we need to pare down the field. So, looking at my earlier requirements list it's pretty clear that we need a rich query interface with some understanding of timeseries. Based on that I'm tossing out options like Postgres, MySQL, etc.

There are also time-series plugins, modes, etc. for some databases, but all of the databases like this that I found store time-series in the wrong way. To do the calculations I've discussed earlier we need to be able to query for a timeseries, and then compute on a subset of that timeseries. The "plugin" approach tends to store a *set* of timeseries as something you can query for, which really doesn't help us much.

Here's a nice list of open source timeseries databases.
https://en.wikipedia.org/wiki/Time_series_database

Here are the ones I've looked at.
Now, I'm certain that I'm going to get something wrong in this post. There's simply too many details. My goal is to let others share some of the realizations and research I've had and save a little time. So, please bear with me, and if you find mistakes drop a comment and I'll try and fix it.

Druid:
Looks like a promising system, but everyone says it's a total bear to actually run. It's based on ZooKeeper, and it uses a MySQL instance for its metadata. That's already some interesting requirements, but not terrible. Then you start looking at its pieces: it has a controller node, a broker node, and a historical node. A minimal system is just very complex... too much for me.

Graphite:
Seems to be the established "common-sense" answer among the options. It's open source. It has a relatively rich query language. As of version 0.10.0 it has "map" and "reduce" which give a generalized Group By semantic like I talked about in previous posts. It's not terrible to run yourself, though it does have some scaling limitations. There are hosted options where they've already worked out the scaling issues for the most part - sadly hostedgraphite doesn't support 0.10.0 yet, but they are working on it (I've been talking with them :D).

The biggest win of graphite though is that it's just got tons of community support. Everything integrates with it, all the collection tools, and many of the front-end alerting tools out there.

InfluxDB:
InfluxDB is the shiny new guy in town. I was really excited to read through their website. It's open source, and the developers are also doing a hosted option. The language is very rich, and unlike graphite, features like "group by" were built in from the beginning, so they're supported properly. Unfortunately it's rather new, still a bit too young for my blood. If you want the new hotness though, give it a try.

KairosDB:
From what I understand this is basically a re-write of OpenTSDB. Like OpenTSDB it runs on HBase (translation: it's impossible to administer unless you're already a Hadoop guru). Unlike OpenTSDB it can also run on Cassandra. Cassandra is the open-source Bigtable written in Java, except that unlike Bigtable it's also a storage stack. The problem is, it's not that reliable. They still haven't hammered out a lot of bugs. Is it usable? Yes. Do you want to deal with it if you don't have to? No. It also has nice query semantics supporting most of the features you might want.
This might be a decent option if you can find a good hosting provider. Non-hosted though, it's probably a no-go for a small organization. For a big organization it might be just the ticket.

OpenTSDB:
See KairosDB. OpenTSDB is a true standard, it's all open source, all that great stuff. But, like KairosDB, it's nasty to actually run. Unlike KairosDB it only runs on HBase, that is, the Hadoop stack. If you already run a Hadoop stack that's all well and good, no big deal. If you don't, it's a heck of an undertaking just for your monitoring database. That said, I understand that it scales quite well. Again it has support for rich queries and all the shinies.

Informix:
From IBM, and closed source. It's been around a pretty long time, so it's really optimized for a different set of use-cases, from what I gather. From my perspective I'm putting a lot of time and energy into a system, and closed source scares me because I can't jump ship if I want to. Systems like graphite have standard interfaces that are supported by lots of systems. Informix is the exact inverse of this, being 100% proprietary, and no one wants to go within 100 yards of something owned by IBM (for good reason).

Of these, I ended up picking graphite. I state that here to explain the next section.

Alerting

This part is harder. After scouring the field for systems that alert based on data in graphite, I found that they are all seriously flawed.

The biggest flaw is that they all use configuration backed by databases. This means that my data about what to alert on, which is fundamentally configuration, is instead live data in production. If I lose that database in production I'm going to be stuck rebuilding all of my monitoring from scratch. If I make a mistake and bump the wrong button, or someone changes something and we decide it was wrong, we have no rollback, no tracking. Any features related to versioning have to be built in. Also, I can't do code reviews on changes, branch them, and do everything else source control lets me do.

For me, this shot down every single system I could find. My conclusion? Build one myself.



Monitoring, post2, what we want

2014-07-04

In my last post I talked about what we're trying to accomplish with monitoring: http://www.blog.computersarehard.net/2014/06/monitoring.html
I think it's hard to see what we need in a tool without some hard examples of what we want to compute. So, here are some examples of the types of things we want to compute. The discussion in that previous post should be sufficient to motivate why each of these metrics would be interesting and useful.

Rates and sums

Let's start with percent errors returned to users:

This is basically error_rate / responses_rate. What we have though are discrete query response events that we're counting. Chances are we can't afford to send or record metrics for every response, so instead they are going to get bundled somehow on a periodic basis, another discrete event. This means we don't really have a continuous function that we can simply take a differential of, instead we need to take a period of time that includes several samples and compute a differential over that - so we have a few points to work with.

This means our rate isn't just a rate; it's a rate with a period parameter. In general you want to make as many decisions as possible in your monitoring system, rather than your application, so you can easily change them without re-releasing your software. So we really want to set this period parameter in our monitoring.

This means that we don't want to export a rate, instead we want to export a constantly incrementing counter. We can then compute a differential over an arbitrary period post-facto in the monitoring system, thus getting a 1 minute, 5 minute or 20 minute rate as we prefer. This period acts like a low-pass filter: the larger it is, the more it "smoothes" the jumps in your rates. For the sake of example let's say we have a pretty large system with high query rates, and we want fairly low resolution and high sensitivity to quick changes, so let's go with a 10 minute rate. So now we have:

percent_errors = rate(errors, 10m) / rate(responses, 10m)
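To make that concrete, here's a minimal sketch in Python of computing a rate from counter samples post-facto. The sample format and the `rate` helper are my own invention for illustration, not any particular system's API:

```python
def rate(samples, period_s):
    """Differential of a monotonically increasing counter.

    `samples` is a list of (timestamp_s, counter_value) pairs, oldest
    first. Returns the per-second rate over the trailing `period_s`
    seconds, using every sample that falls inside the window.
    """
    end_ts, end_val = samples[-1]
    start_ts, start_val = end_ts, end_val
    for ts, val in reversed(samples):
        if end_ts - ts > period_s:
            break  # sample is older than the window
        start_ts, start_val = ts, val
    if end_ts == start_ts:
        return 0.0
    return (end_val - start_val) / (end_ts - start_ts)

# Counters sampled every 5 minutes; a 10 minute (600s) window spans them.
errors    = [(0, 0), (300, 30),   (600, 90)]
responses = [(0, 0), (300, 3000), (600, 9000)]

percent_errors = rate(errors, 600) / rate(responses, 600)  # ~0.01, i.e. 1%
```

A larger window would simply pull in more samples and smooth the result, which is the low-pass-filter behavior described above.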

Now we want to compute this over our entire service, which looks like:

rate(sum_over_servers(errors), 10m) / rate(sum_over_servers(responses), 10m).

Dimensionality of data

We also want

percent_errors = rate(errors, 10m) / rate(responses, 10m)

for each server as well. That way once we see that the error rate shot up, we can tell if it's a particular machine causing our problems.

And we want

percent_errors = rate(specific_error, 10m) / rate(responses, 10m)

So we can break down what problem is being passed back to the user.

So, there's several interesting things going on here. We basically have 2 dimensions: servers and error types. We *could* write out every one of these equations across both dimensions, but that would be a LOT of equations, one for every error type, and one for every server... and wait, we probably want one for every error type and server combination! Even worse, if we add or remove a server our rules change. This doesn't sound at all like how a properly lazy software engineer approaches a problem.

Instead of describing every calculation we do, we want to describe each category of calculation. To compute the error rates for each, I basically want to say "do this calculation over every error type". In Haskell terms this is something like a list monad. If you're used to MATLAB it's like operating on matrices. I'm going to describe a bit of a formalism here, not because it's the only one that would work, but to try and clarify the problem. To accommodate this new "parallel computations" model we can think of every timeseries as being described by an unordered list of labels. That is, a dictionary, struct, or record, depending on your favorite terminology. So for example 404 errors on server 10 might look like this:

{response_type: error_http404, server: myhost10, property: response}

Using this model we can now drop a key... say

{server: myhost10, property: response}

to request everything that matches the two keys we do supply. This is like an array of timeseries, a 1-dimensional matrix. Thus this gets us all of the response_types for myhost10. If we drop 2 keys we'll get a 2-dimensional matrix, etc. Great, so now we can do something like

percent_errors = rate({server: myhost10, property: response}, 10m) / rate(sum({server: myhost10, property: response}), 10m)
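As a sketch of this label-matching idea (a toy model of my own, not any real system's API), each series is keyed by a set of label pairs, and a query that supplies fewer keys matches more series:

```python
def labels(**kv):
    """A label dictionary, frozen so it can key a dict."""
    return frozenset(kv.items())

# A toy store of timeseries, keyed by their label sets.
store = {
    labels(property="response", response_type="error_http404", server="myhost10"): [1, 2, 3],
    labels(property="response", response_type="error_http500", server="myhost10"): [0, 0, 1],
    labels(property="response", response_type="error_http404", server="myhost11"): [4, 4, 4],
}

def select(store, **query):
    """All series whose labels include every supplied key/value pair.

    Dropping a key from the query widens the match: fewer keys supplied,
    more dimensions in the result.
    """
    q = set(query.items())
    return {k: v for k, v in store.items() if q <= k}

# Dropping response_type gives a 1-dimensional result: one series per
# response_type on myhost10.
matches = select(store, server="myhost10", property="response")
```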


But we still have to write this for every server. If we try and write:

rate({property: response}, 10m) / rate(sum({property: response}), 10m)

It all falls apart. We end up summing over all servers and all errors for our divisor, while our numerator is calculated per server. Dividing these makes no sense at all! To solve this, we need to tell "sum" what it should sum over. As it turns out, our result is now going to be arrays on both sides. So, we also probably need to give "/" some clue how to match up those two arrays, so it knows what to do if they aren't identically sized or something.

Group By

I've actually never used SQL, but it seems this is by far the best terminology out there for what I'm describing here. I finally realized the connection a couple of days ago while talking with Jess, my girlfriend, about the problem I was trying to find a monitoring system to solve. It turns out the solution to the problem described above is something SQL folks call a "Group By" clause. The idea is to say "retain this set of keys", while you collapse over all others. So for example:

sum({property: response}) Group By server

Would calculate the sum of all response_types, but wouldn't sum across servers, and would thus return a 1-dimensional matrix (an array) of results, one for each server.
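A hypothetical sketch of that semantic in Python, representing each series' labels as a frozen set of pairs; `group_by_sum` and the sample data are illustrative, not any real system's API:

```python
def group_by_sum(series, keep):
    """sum(...) Group By `keep`: collapse every label not in `keep`,
    summing the matching series together point-by-point."""
    out = {}
    for lbls, points in series.items():
        key = frozenset((k, v) for k, v in lbls if k in keep)
        if key not in out:
            out[key] = list(points)
        else:
            out[key] = [a + b for a, b in zip(out[key], points)]
    return out

series = {
    frozenset({("server", "myhost10"), ("response_type", "ok")}): [10, 20],
    frozenset({("server", "myhost10"), ("response_type", "error")}): [1, 2],
    frozenset({("server", "myhost11"), ("response_type", "ok")}): [30, 40],
}

# sum({property: response}) Group By server: one summed series per server,
# with response_type collapsed away.
by_server = group_by_sum(series, keep={"server"})
```

Dividing two results grouped by the same key then lines up naturally: both sides are arrays indexed by server.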

Group by isn't usually used in this context, but we might as well use it in our pretend formalism since we already have it as an operator. For binary operators lets just say that it uses the key you define as the variable it matches on the two sides. So to fix our calculation above we get:

rate({property: response}, 10m) / Group By server rate(sum Group By server ({property: response}), 10m)

Obviously this is a bit messy with both infix and prefix operators, and our Group By clause as an addendum to each, but I didn't want to change from the initial syntax too much, and wanted to leverage people's understanding of the SQL concept...

So, what have we found so far?

We've noticed so far a few things that we really need for our monitoring system.

I'd like to add one more note, which is that our dictionary syntax is cute, but misses a point. What if we wrote down these tags in an ordered list:

property:response.response_type:error_http404.server:myhost10

Then, we could use regular expressions to parse out our tags. A query for all response types on server 10 would look like this:

property:response.response_type:\.*.server:myhost10

Note that this syntax is actually *more* general than our previous syntax, since we can also match on just parts of labels, so for example we could do this:

property:response.response_type:error_\.*.server:myhost10

Now we're selecting only for response_types with an error code. Before, we would've had to change the export of our variables to get this data into a separate dimension; now we can get new dimensions on the fly whenever we want them. This isn't great of course, because the syntax is obnoxious for general use (and probably for the implementation as well), but it's definitely a useful property of a system to be able to pick out pieces of a name and add new dimensions on the fly like this.
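Here's the flattened-name idea in standard Python regex syntax. I've written the wildcard as `[^.]*` rather than the `\.*` above, so a match can't accidentally cross the `.` separators; the names themselves are just the examples from this section:

```python
import re

name = "property:response.response_type:error_http404.server:myhost10"

# All response_types on myhost10: wildcard one whole label.
any_type = re.compile(r"property:response\.response_type:[^.]*\.server:myhost10")

# Matching *part* of a label carves out a new dimension on the fly:
# only error response_types, without re-exporting anything.
errors_only = re.compile(r"property:response\.response_type:error_[^.]*\.server:myhost10")

assert any_type.fullmatch(name)
assert errors_only.fullmatch(name)
```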

Histograms and percentiles

There are percentiles, and then there are percentiles.

Most systems give you the ability to compute a percentile over a set of variables, so for example:

percentile({property:response, response_type:error_http404})

That's great and all, but it's not usually what you want. This is a percentile of variables, but frequently you want a percentile of *events*. That is, something like a latency percentile. You can't represent every query as a variable or everything goes haywire and your index space explodes far too large to store. In fact, frequently you can't even afford to write data down for every query. Instead you're probably going to bucket your query latencies into a histogram. This only gives you an approximation, but it can be a good one if you're careful about your histogram selection. E.g. even-sized buckets are probably not what you want, since latency is theoretically unbounded upwards. Instead you probably want exponential bucket sizes, so your resolution is relative to magnitude.

Bucketing into a histogram gives you another advantage. Given a percentile latency for each machine, you can't aggregate these into a percentile latency for all the machines combined. Percentiles just don't work that way. To be able to do this calculation you need a lot more information about the distribution. With histograms you can sum your histograms across all your servers, then compute an approximate percentile for all your servers. Simple if histograms are reasonable to work with in your system.
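A minimal sketch of both points, with made-up exponential bucket bounds; the helpers are illustrative, not any particular library's API:

```python
import bisect

# Exponential bucket upper bounds (ms): resolution scales with magnitude.
BOUNDS = [1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024]

def observe(hist, latency_ms):
    """Count one event in the first bucket whose bound covers it."""
    i = min(bisect.bisect_left(BOUNDS, latency_ms), len(BOUNDS) - 1)
    hist[i] += 1

def merge(a, b):
    """Histograms sum across servers; percentiles don't."""
    return [x + y for x, y in zip(a, b)]

def percentile(hist, p):
    """Approximate p-th percentile: the bound of the bucket holding it."""
    target = p / 100.0 * sum(hist)
    seen = 0
    for i, count in enumerate(hist):
        seen += count
        if count and seen >= target:
            return BOUNDS[i]
    return BOUNDS[-1]

server_a = [0] * len(BOUNDS)
server_b = [0] * len(BOUNDS)
for ms in (3, 5, 7, 200):
    observe(server_a, ms)
for ms in (4, 6, 300, 400):
    observe(server_b, ms)

# Fleet-wide median from the summed histograms.
fleet = merge(server_a, server_b)
p50 = percentile(fleet, 50)
```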

There's one more trick about percentiles. This doesn't relate directly to monitoring tools, but I would be remiss not to mention it while on the topic. For a two-stage pipeline ->A->B->, you cannot use a histogram of the latency of A and a histogram of the latency of B to compute the histogram of the latency of A->B. The reason is that the latencies of these systems aren't guaranteed to be uncorrelated. In fact, since latencies frequently depend on the exact query, they are far more likely to be correlated than not. This fact itself is the type of thing you're trying to see with your monitoring. To properly measure the latency of this system, and be able to break it down and look at each piece, you need to measure the latency of A, the latency of B, and the latency of A->B separately, export *each* as a histogram, and compute your percentiles across each. It sucks, but it's mathematical reality.

Calculations on history

If I want to know how many resources my system is using, there are a lot of things I want to look at. I'm probably interested in our peak (or some percentile thereof), our average, and how much it varies. Maybe if I spent some engineering effort I could flatten out our usage. Is it worth it? How much money would that save in resources (or rather, how much growth-space would that give us without having to buy more)?

Another great example of this is computing quarterly SLO rollups. An SLO is frequently measured in "9's" of uptime. Well, systems aren't really either up or down; they can be in between. In fact, this is what the metric we were discussing earlier measures: the error rate of our system while it is serving users. Given this, we probably have an SLO that looks like "5 9's, 4 9's of the time", meaning that we should have 99.999% availability 99.99% of the time.

At the end of a quarter we want to see how well we've been doing. So we're going to take our original key metric, threshold it at 99.999%, and then ask how often it was above or below that threshold.
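A minimal sketch of that rollup, assuming we have a list of per-interval availability percentages for the quarter (the constants, names, and sample data are hypothetical):

```python
SLO_AVAILABILITY = 99.999   # "five nines" while serving
SLO_TIME_FRACTION = 0.9999  # ...met "four nines" of the time

def slo_rollup(availability_samples, threshold=SLO_AVAILABILITY):
    """Fraction of intervals whose availability met the threshold."""
    good = sum(1 for pct in availability_samples if pct >= threshold)
    return good / len(availability_samples)

# Hypothetical quarter: one bad interval out of 10,000.
quarter = [100.0] * 9_999 + [99.9]
met_slo = slo_rollup(quarter) >= SLO_TIME_FRACTION
```

The key point is that this is a computation over the full quarter of historical data, not something you can answer from the live value of the metric.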

There are almost certainly other ways to do this, but by far the most obvious is to compute on historical data. We want to be able to graph history, but we also want to be able to look at history numerically to help us pull out trends, and examine the past to help predict the future. There's a whole chunk of monitoring that fundamentally is all about modeling, and modeling is all about looking at the past.

Alerts come from the same system as graphs

This is just common sense. When I get paged, I want to go look at the data that paged me. I want to see its history and find the event that tripped the alert. Remember that by the time I'm looking, the event is likely already over, so history is all I've got. It may happen again, but I want to fix it *before* that happens; after all, that's my job.

Data that is kind of similar to the data that paged me doesn't cut it. I want exactly the data that paged me, so I can be absolutely certain of what's going on.

When debugging a system you often have hunches, and one of the major purposes of monitoring is to give hard data to either confirm or deny those hunches. Monitoring systems are complex and often need debugging in their own right. Keep it simple and easy to examine.

Configs are stored in config files

I would've thought this was obvious, but looking at the extant monitoring systems, it apparently isn't. This is a general principle, but I'm going to bring it up here anyway.

When trying to build stable systems, the simplest solution that does the job is the best. Config files are simple. If something goes wrong in production and data gets lost, I've got the config file right here in version control. If I notice wonky behavior and want to look for recent changes, again I have version control. If I want to generate a config, generating a flat file is easy and automatically idempotent.

Could you get these properties from a database? Sure you could, but now things are complex. Your synchronization could go wrong, and then your monitoring is not doing what you think it is.

Realtime alert and graph data latency

We need soft realtime constraints of, say, two minutes at worst for data to make it to alerting. Data making it to graphs can be delayed by another couple of minutes without causing issues; I'd put the maximum acceptable delay for hitting graphs at around five minutes. Smaller would clearly be nicer.

Conclusion

Let's review the properties we've said we need:

- Percentiles computed from histograms, so they can be aggregated across machines
- Calculations on historical data
- Alerts that come from the same system as graphs
- Configs stored in config files
- Soft realtime latency for alert and graph data

I thought I would get to the tools this time... but describing what we're looking for took an entire post. So in the third post in this series I'll cover tools.

I'm still trying to understand some of the computational models (specifically Graphite's) sufficiently to write that post, but hopefully I'll lock it down relatively soon.