In all of the recent political turmoil in the U.S. it's easy to get a bit down and depressed about the future. For me, a pick-me-up came from a rather surprising source... a programming language.
From the rust website https://www.rust-lang.org/en-US/
"Rust is a systems programming language that runs blazingly fast, prevents segfaults, and guarantees thread safety."
The end result is the first language I've ever seen that doesn't suck. As I read through the Rust book I kept being struck by how intuitive the language is. Now, I should mention that I am a little bit biased, a few of my friends, mostly with similar backgrounds, were fairly deeply involved in it's development. This means the designers have similar biases to mine.
Rust has got to be the most complex language I've ever learned... but then again, I didn't just pick up and start trying to code in C++14 without knowing C and older C++ standards first. The difference is that C++14 and similar languages don't just require learning all the keywords and what they mean, they require learning which code is defined and undefined. Ever try to actually write code that is fully defined? Sequence points are just the start of it. Just check out out the differences between char and int8_t... char (called that because it's frequently used for characters... though does completely the wrong thing with utf8 without serious effort) is assumed to alias something else, and int8_t does not. If any part of that sounded like babble... congratulations, you don't really know C++.
The reality? no-one really knows C++. It's simply too complicated a language with too many corner cases. Corollary? There is no real world software written in C++ that actually conforms to the standard. Conclusion: No real world software written in C++ is even well defined, much less *correct* by any reasonable definition besides "eh... seems to work... today... on this computer and compiler".
With rust on the other hand, while the pointer types might be a little confusing at first, the keyword definitions are all there is to learn. If your code compiles (without unsafe), the behavior is defined, and that's the end of it. No aliasing rules, no sequence points, etc. Almost all of your code can be written like this. For those rare little corners where you really need to punch through that safety, unsafe is there for you. C's semantics got screwed up by optimizing compilers, the problem was that it's definitions are a little *too* low-level (original defined by direct translation to Vax assembly instructions), so optimizing required violating the original rules and we got the crazy dance we have today. Something like SML is so divorced from the system that punching down to understand the machine-level is almost nonsense. Rust is right in between where optomizers can optomize, but the machine layout is defined enough that when you use unsafe, it just works.
It's strange to say, but as I watched the news scroll past and read the Rust book... I felt flushed with hope. Not only can software theoretically not suck, but people actually put together a tool to help us do it. A tool that itself is software that doesn't suck. Maybe, just maybe, humans can actually do this technology thing and make it all work.
This guide will hopefully help you out:
This project was started by a friend of mine. I thought it was a really good idea and started working on it.
The goal here is a complete guide for user-side computer security and privacy. The "Basic" section outlines security for most users. Ideally this should be exactly the thing that technical folks (readers of this blog) would want to hand to their friends and family. It outlines things like password databases, pins on cellphones, etc. If technical folks don't immediately feel the urge to share it with non-technical friends and family upon reading it please let me know. Just that would be very useful feedback. Ways to make the document more approachable, sharable, etc. would be even better.
The "Advanced Topics" section outlines concerns and solutions related to nation state actors. This isn't useful for most people, but the hope is that collecting it all in *one* document will make it a lot easier to pick and choose what any given user does need, and help disseminate this hard to find information more widely.
Note, this isn't a "howto". An intelligent computer user, even one who's not that technical, is entirely capable of Googling howto guides, and things like the settings menu on iphone change to fast to keep up with. Reading this should give a reader an understanding of *what* they need to do, and the technical terminology to look up how.
Any feedback is valuable, as it notes in the document, I would love corrections, improvements, etc.
Note: my work (the link at the top) is a fork of https://github.com/bluehat/privacy_and_freedom/blob/master/digital_freedom.markdown with some significant changes in direction. I filed a merge request today.
I've been writing basically every major datastructure, one at a time.
I wrote up heaps a little while ago: http://www.blog.computersarehard.net/2017/02/a-better-heap.html
I've now finished writing and benchmarking all the common dictionary datastructures.
Note that at every point in this graph the same amount of work is being done. At each point we put "test_size" random elements in to the datastructure, and then remove them. We do this 134217728/test_size times, and time the *total*. Thus we're always putting in and taking out 134217728 elements.
As a result, this graph is showing is how the size of a datastructure impacts it's performance. Note that the graph is logarithmic on the X axis, so it's not completely dominated by the larger tests.
First, lets talk about what each of these algorithms *is*. As a note all of these algorithms resize automatically, both up and down.
- AVL: This is a relatively standard AVL tree with external allocation. I was unable to find an implementation that was correct and balanced, so I rederived some details myself.
- O(log(n) all operations
- RedBlack: This is a highly optimized red-black tree implementation (much more time was spent on it than AVL, due to AVL outperforming it in my testing perviously, I assumed I must have made a mistake).
- O(log(n)) all operations
- BTree: This is just a standard btree implementation, tuned with a fairly efficient arity
- O(log(n)) all operations
- Hashtable: This is a standard open chaining hashtable, using dynamically sized (doubling) arrays for the chaining
- O(N) worst case, O(log(N)) amortized
- OCHashTable: This is a standard open chaining hashtable, using an externally allocated linked-list for the chaining
- O(N) worst case, O(log(N)) amortized
- AVLHashTable: This is a standard open chaining hashtable, using an externally allocated AVL tree for open chaining
- O(N) worst case, O(log(N)) amortized
- BoundedHashTable: This is a giant pile of tricks. It uses an array like datastructure that can be zeroed in constant time. An AVL tree for open chaining. Rehashing on resize is distributed across operations so it rehashes a couple of elements for every op, using a link-list to find each element.
- O(Log(N)) all operations
Algorithms left out
- BtreeHashTable: This performed so poorly I pulled it out of the testing. The reason is obvious, while btree is very fast, it's not good for small datasets, and the chains in a hashtable are always small
You may notice that NONE of these algorithms are even *close* to linear in practice. If every operation is amortized to constant time, as in our hashtable algorithms, the line should be completely *flat*. No more work should be done, just because the datastructure contains more data. This is even more true for the bounded-hashtable, where no operation is *ever* linear, the only reason it's log and not constant even on a per-operation basis is the AVL tree used for chaining.
I spent a while trying to find non-linearities in my testing methodology but came up with nothing. Remember, the X-axis is logarithmic Isn't that odd? If that's throwing you off, here's what it looks like graphed linearly (My data is logarithmic in nature, so the graph is pretty ugly). Whatever that is... it's not linear.
So, what's going on? My best guess is this is the computer's fault. Caching layers, memory management, etc. memmap() probably takes longer and longer to get us memory for malloc for example. I've yet to get detailed enough information to confirm this theory though.
Well... aside from the nonlinearity described above. OCHashtable is the clear overall winner for average runtime at any scale, no big surprise there. BTree is the clear winner for bounded algorithms of large size. AVL and RedBlack are about equivelent for small size... but given in my previous testing AVL came out a little faster, lookups should theoretically be a little faster, the implementation tested here is less optimized than red-black, and an order of magnitude simpler to code, AVL clearly beats RedBlack (as is now known generally).
This is pretty much what we would expect. I had high-hopes for BoundedHashTable, as *theoretically* it is very fast... but the constant factors seem to blow it out of the water, and it still shows up as very much non-linear. This algorithm is unable to resize arrays (as realloc zeros, which is linear), this means constantly allocating new differently sized arrays. I suspect this along with the constant factors due to algorithmic complexity is probably the cause of poor performance.
As always, full source is available here: https://github.com/multilinear/datastructures_C--11
Comparing sorting algorithms isn't terribly original, but I didn't think comparing AVL trees to RB trees was either, so I thought I'd do it anyway and see what the real results were.
Quick refresher. Most computer scientists know big O notation, but we tend to forget big Omega and big Theta. Big O is the upper bound, big Omega is the analogous lower bound and big Theta is used when the two are the same. Got that? I often hear people state a "best case" as "big O of", but I want to promulgate correct usage.
In testing, quicksort was of course the fastest, mergesort next and heapsort last. Though I wrote selection and bubble as they have their own uses, I didn't even consider \Omega(N^2) algorithms for speed testing. Just as a reminder before we look at the results here's the boundaries and properties for each:
- Quick Sort
- In place, not stable
- Merge Sort
- Not in place, stable
- Heap Sort
- In place, not stable
But, lets talk some real numbers. I did two tests, one with 10000000 element sort run 100 times, and one with 100 element sort run 10000000 times. I'll call the first "large" sorts, and the second "small". I did a little more testing to confirm that the results are relatively stable within those intuitive categories.
- Quick Sort:
- Fastest in both cases
- Merge Sort
- Large test: 16% slower
- Small test: 9% slower
- Heap sort
- Large test: 122% slower
- Small test: 19% slower
Okay, so these were basically the results we all expected right? There are a couple of interesting details though.
First, because it just jumps out at you, what the heck is with heapsort? It certainly does more operations than the other two, but that wouldn't account for the difference between small and large. My guess is that as the heap spreads out basically every lookup in the array is a cache-miss, this is what bheap was attempting to improve for a normal heap algorithm, but the constant factors came out even worse.
Now, lets talk about the two algorithms who's speed don't immediatly knock them out of the running. Their are two commonly cited reasons for using Quick Sort over Merge Sort. The first is that it's in place... I did some further testing and on my machine (a modern linux distro), and with a clean heap, doing the allocation for merge sort only adds another 1% overhead for both small and large cases. Admittedly since we alloc and free the same size over and over again we're using malloc like a slab allocator, but then that's also the point... allocation speed can be worked around. The second reason is that quicksort has slightly better constant factors. Here I've shown that slightly means ~9-16%. If moves were expensive this might go up a little, but if moves are that expensive you probably shouldn't be directly sorting the data anyway.
Now consider that if you use quicksort your sort will sometimes take N^2 time. That's the sort of thing causes stutters every few seconds or minutes in a videogame, a network stack, etc. 10%-15% is below what's often considered "user noticeable" speed difference (that line usually being drawn around 20%), but they will almost certainly notice the stutter when it takes 100% longer one time.
Following the philosophy I keep pushing, Merge Sort is probably a better default sort algorithm to use than Quick Sort. Using modern mallocs like tcmalloc allocation time becomes less relevent even with a "dirty" heap. In highly optimized applications dynamic allocation itself is often avoided (since it can cause occasional delays as well), in such cases worst-case is almost always the most critical factor, and additionally it's worth the effort to set the ram aside so being "in-place" isn't that critical.
Eventually I'd really like to microbenchmark some of the algorithms I've been testing so as to actually measure the near-worst-case operation. For now all I have is practical experience and theoretical bounds with which to demonstrate it to others.
I'm currently playing with hashtables as well, continuing the tree comparison testing. Of course the hashtable is much faster than my best tree, but I want to pursue some solutions to the hashtable worst-case problems and see how those fair as well.