A User's guide to Privacy and Security
2017-10-05
This guide will hopefully help you out:
This project was started by a friend of mine. I thought it was a really good idea and started working on it.
The goal here is a complete guide for user-side computer security and privacy. The "Basic" section outlines security for most users. Ideally this should be exactly the thing that technical folks (readers of this blog) would want to hand to their friends and family. It outlines things like password databases, pins on cellphones, etc. If technical folks don't immediately feel the urge to share it with non-technical friends and family upon reading it please let me know. Just that would be very useful feedback. Ways to make the document more approachable, sharable, etc. would be even better.
The "Advanced Topics" section outlines concerns and solutions related to nation state actors. This isn't useful for most people, but the hope is that collecting it all in *one* document will make it a lot easier to pick and choose what any given user does need, and help disseminate this hard to find information more widely.
Note, this isn't a "howto". An intelligent computer user, even one who's not that technical, is entirely capable of Googling howto guides, and things like the settings menu on the iPhone change too fast to keep up with. Reading this should give a reader an understanding of *what* they need to do, and the technical terminology to look up how.
Any feedback is valuable. As it notes in the document, I would love corrections, improvements, etc.
Note: my work (the link at the top) is a fork of https://github.com/bluehat/privacy_and_freedom/blob/master/digital_freedom.markdown with some significant changes in direction. I filed a merge request today.
Small update on datastructure benchmarks
2017-04-29
Benchmark of all major dictionary structures
2017-04-20
I've been writing basically every major datastructure, one at a time.
I wrote up heaps a little while ago:
http://www.blog.computersarehard.net/2017/02/a-better-heap.html
I've now finished writing and benchmarking all the common dictionary datastructures.
Note that at every point in this graph the same amount of work is being done. At each point we put "test_size" random elements in to the datastructure, and then remove them. We do this 134217728/test_size times, and time the *total*. Thus we're always putting in and taking out 134217728 elements.
As a result, this graph shows how the size of a datastructure impacts its performance. Note that the graph is logarithmic on the X axis, so it's not completely dominated by the larger tests.
First, let's talk about what each of these algorithms *is*. As a note, all of these algorithms resize automatically, both up and down.
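To make the methodology concrete, here is a minimal sketch of one benchmark point. It is not the actual test harness: `run_point` and `BenchResult` are hypothetical names, `std::map` stands in for whichever structure is under test, and key generation is included in the timed region for simplicity.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <map>
#include <random>
#include <vector>

// Result of one benchmark point: total insertions performed and wall time.
struct BenchResult {
    std::uint64_t inserted;
    double seconds;
};

// Put test_size random keys into a dictionary, take them all out again, and
// repeat until total_elements insertions have been done. Every point on the
// graph therefore does the same number of insertions and removals.
BenchResult run_point(std::uint64_t total_elements, std::uint64_t test_size) {
    assert(test_size > 0 && total_elements % test_size == 0);
    std::mt19937_64 rng(12345);  // fixed seed so runs are repeatable
    std::vector<std::uint64_t> keys(test_size);
    std::map<std::uint64_t, std::uint64_t> dict;  // stand-in structure (assumption)
    std::uint64_t inserted = 0;

    auto start = std::chrono::steady_clock::now();
    for (std::uint64_t round = 0; round < total_elements / test_size; ++round) {
        for (auto& k : keys) {
            k = rng();
            dict[k] = k;
            ++inserted;
        }
        for (auto k : keys) dict.erase(k);
        assert(dict.empty());
    }
    auto stop = std::chrono::steady_clock::now();
    return {inserted, std::chrono::duration<double>(stop - start).count()};
}
```

Calling `run_point(134217728, test_size)` for each `test_size` would reproduce the shape of the experiment described above; smaller totals are handy for a quick sanity check.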
- AVL: This is a relatively standard AVL tree with external allocation. I was unable to find an implementation that was correct and balanced, so I rederived some details myself.
- O(log(n)) all operations
- RedBlack: This is a highly optimized red-black tree implementation (much more time was spent on it than on AVL: since AVL outperformed it in my previous testing, I assumed I must have made a mistake).
- O(log(n)) all operations
- BTree: This is just a standard btree implementation, tuned with a fairly efficient arity
- O(log(n)) all operations
- Hashtable: This is a standard open chaining hashtable, using dynamically sized (doubling) arrays for the chaining
- O(N) worst case, O(1) amortized on average
- OCHashTable: This is a standard open chaining hashtable, using an externally allocated linked-list for the chaining
- O(N) worst case, O(1) amortized on average
- AVLHashTable: This is a standard open chaining hashtable, using an externally allocated AVL tree for open chaining
- O(N) worst case, O(1) amortized on average
- BoundedHashTable: This is a giant pile of tricks. It uses an array-like datastructure that can be zeroed in constant time, and an AVL tree for open chaining. Rehashing on resize is distributed across operations, so it rehashes a couple of elements for every op, using a linked list to find each element.
- O(log(N)) all operations
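One way to get an array that can be "zeroed" in constant time is to stamp every slot with a generation number. The sketch below shows that idea only; it is not necessarily the trick BoundedHashTable uses, and `GenerationArray` is a hypothetical name. Note that construction here is still linear (the vectors are initialized); only `clear()` is constant time.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// An array whose clear() is O(1): each slot remembers the generation in which
// it was last written, and slots from older generations read as empty.
template <typename T>
class GenerationArray {
public:
    explicit GenerationArray(std::size_t size)
        : values_(size), stamps_(size, 0), generation_(1) {}

    void set(std::size_t i, const T& v) {
        assert(i < values_.size());
        values_[i] = v;
        stamps_[i] = generation_;
    }

    // True only if slot i was written since the last clear().
    bool has(std::size_t i) const {
        assert(i < values_.size());
        return stamps_[i] == generation_;
    }

    const T& get(std::size_t i) const {
        assert(has(i));
        return values_[i];
    }

    // Logically empties every slot without touching any of them.
    void clear() { ++generation_; }

private:
    std::vector<T> values_;
    std::vector<std::uint64_t> stamps_;
    std::uint64_t generation_;
};
```

The cost is one extra word per slot and one extra comparison per read, which is the kind of constant factor that adds up in a structure built from several such tricks.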
Algorithms left out
- BtreeHashTable: This performed so poorly I pulled it out of the testing. The reason is obvious: while btree is very fast, it's not good for small datasets, and the chains in a hashtable are always small.
Surprising results:
You may notice that NONE of these algorithms are even *close* to linear in practice. If every operation is amortized to constant time, as in our hashtable algorithms, the line should be completely *flat*. No more work should be done just because the datastructure contains more data. This is even more true for the bounded-hashtable, where no operation is *ever* linear; the only reason it's log and not constant even on a per-operation basis is the AVL tree used for chaining.
I spent a while trying to find non-linearities in my testing methodology but came up with nothing. Remember, the X-axis is logarithmic. Isn't that odd? If that's throwing you off, here's what it looks like graphed linearly (my data is logarithmic in nature, so the graph is pretty ugly). Whatever that is... it's not linear.
So, what's going on? My best guess is this is the computer's fault: caching layers, memory management, etc. For example, mmap() probably takes longer and longer to get us memory for malloc. I've yet to get detailed enough information to confirm this theory though.
Conclusion
Well... aside from the nonlinearity described above, OCHashTable is the clear overall winner for average runtime at any scale; no big surprise there. BTree is the clear winner for bounded algorithms of large size. AVL and RedBlack are about equivalent for small sizes... but AVL came out a little faster in my previous testing, its lookups should theoretically be a little faster, the implementation tested here is less optimized than the red-black one, and it is an order of magnitude simpler to code. Given all that, AVL clearly beats RedBlack (as is now known generally).
This is pretty much what we would expect. I had high hopes for BoundedHashTable, as *theoretically* it is very fast... but the constant factors seem to blow it out of the water, and it still shows up as very much non-linear. This algorithm is unable to resize arrays (as realloc zeros, which is linear); this means constantly allocating new, differently sized arrays. I suspect this, along with the constant factors due to algorithmic complexity, is probably the cause of the poor performance.
As always, full source is available here: https://github.com/multilinear/datastructures_C--11
Sort tests
2017-02-14
Comparing sorting algorithms isn't terribly original, but I didn't think comparing AVL trees to RB trees was either, so I thought I'd do it anyway and see what the real results were.
Quick refresher. Most computer scientists know big O notation, but we tend to forget big Omega and big Theta. Big O is the upper bound, big Omega is the analogous lower bound and big Theta is used when the two are the same. Got that? I often hear people state a "best case" as "big O of", but I want to promulgate correct usage.
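For reference, the three bounds can be written out as below. These are the standard textbook definitions, with f the running time being described and g the bounding function:

```latex
\begin{align*}
f(n) \in O(g(n)) &\iff \exists\, c > 0,\ n_0 : \forall n \ge n_0,\ f(n) \le c\, g(n) \\
f(n) \in \Omega(g(n)) &\iff \exists\, c > 0,\ n_0 : \forall n \ge n_0,\ f(n) \ge c\, g(n) \\
f(n) \in \Theta(g(n)) &\iff f(n) \in O(g(n)) \text{ and } f(n) \in \Omega(g(n))
\end{align*}
```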
In testing, quicksort was of course the fastest, mergesort next and heapsort last. Though I wrote selection and bubble sorts, as they have their own uses, I didn't even consider \Omega(N^2) algorithms for speed testing. Just as a reminder before we look at the results, here are the bounds and properties for each:
- Quick Sort
- O(N^2)
- \Omega(Nlog(N))
- In place, not stable
- Merge Sort
- \Theta(Nlog(N))
- Not in place, stable
- Heap Sort
- \Theta(Nlog(N))
- In place, not stable
But, let's talk some real numbers. I did two tests, one with a 10000000 element sort run 100 times, and one with a 100 element sort run 10000000 times. I'll call the first "large" sorts, and the second "small". I did a little more testing to confirm that the results are relatively stable within those intuitive categories.
- Quick Sort:
- Fastest in both cases
- Merge Sort
- Large test: 16% slower
- Small test: 9% slower
- Heap sort
- Large test: 122% slower
- Small test: 19% slower
Okay, so these were basically the results we all expected, right? There are a couple of interesting details though.
First, because it just jumps out at you, what the heck is with heapsort? It certainly does more operations than the other two, but that wouldn't account for the difference between small and large. My guess is that as the heap spreads out, basically every lookup in the array is a cache miss. This is what bheap was attempting to improve for a normal heap algorithm, but the constant factors came out even worse.
Now, let's talk about the two algorithms whose speed doesn't immediately knock them out of the running. There are two commonly cited reasons for using Quick Sort over Merge Sort. The first is that it's in place... I did some further testing, and on my machine (a modern Linux distro) with a clean heap, doing the allocation for merge sort only adds another 1% overhead for both small and large cases. Admittedly, since we alloc and free the same size over and over again we're using malloc like a slab allocator, but then that's also the point... allocation speed can be worked around. The second reason is that quicksort has slightly better constant factors. Here I've shown that "slightly" means ~9-16%. If moves were expensive this might go up a little, but if moves are that expensive you probably shouldn't be directly sorting the data anyway.
Now consider that if you use quicksort your sort will sometimes take N^2 time. That's the sort of thing that causes stutters every few seconds or minutes in a videogame, a network stack, etc. 10%-15% is below what's often considered a "user noticeable" speed difference (that line usually being drawn around 20%), but users will almost certainly notice the stutter when it takes 100% longer one time.
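The N^2 case is easy to provoke. The sketch below is a deliberately naive quicksort (first element as pivot, which is an assumption for illustration and not the implementation benchmarked above), instrumented to count comparisons. On already-sorted input of N distinct values it performs exactly N(N-1)/2 comparisons.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <utility>
#include <vector>

// Sorts a[lo, hi) with the first element as pivot and returns the number of
// element comparisons made.
std::uint64_t quicksort_count(std::vector<int>& a, std::size_t lo, std::size_t hi) {
    if (hi - lo < 2) return 0;
    std::uint64_t comparisons = 0;
    int pivot = a[lo];
    std::size_t store = lo + 1;  // a[lo+1, store) holds elements < pivot
    for (std::size_t i = lo + 1; i < hi; ++i) {
        ++comparisons;
        if (a[i] < pivot) {
            std::swap(a[i], a[store]);
            ++store;
        }
    }
    std::swap(a[lo], a[store - 1]);  // move the pivot between the two parts
    comparisons += quicksort_count(a, lo, store - 1);
    comparisons += quicksort_count(a, store, hi);
    return comparisons;
}

// Builds the sorted input 0..n-1, sorts it, and reports the comparison count.
std::uint64_t comparisons_on_sorted_input(std::size_t n) {
    std::vector<int> a(n);
    std::iota(a.begin(), a.end(), 0);
    std::uint64_t c = quicksort_count(a, 0, a.size());
    assert(std::is_sorted(a.begin(), a.end()));
    return c;
}
```

Better pivot selection (median-of-three, random pivots) makes this input harmless, but it only moves the bad case somewhere else; it does not remove it. The recursion depth here is also linear on sorted input, so keep N modest when trying it.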
Conclusion:
Following the philosophy I keep pushing, Merge Sort is probably a better default sort algorithm to use than Quick Sort. With modern mallocs like tcmalloc, allocation time becomes less relevant even with a "dirty" heap. In highly optimized applications dynamic allocation itself is often avoided (since it can cause occasional delays as well). In such cases worst-case is almost always the most critical factor, and it's worth the effort to set the RAM aside, so being "in-place" isn't that critical.
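As a sketch of what "setting the RAM aside" could look like, here is a merge sort that takes its scratch space from a buffer allocated once up front. `MergeSorter` is a hypothetical name and this is not the implementation that was benchmarked; it just shows that the not-in-place cost can be paid once rather than per sort.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Top-down merge sort of a[lo, hi) using caller-provided scratch space.
void merge_sort(std::vector<int>& a, std::vector<int>& scratch,
                std::size_t lo, std::size_t hi) {
    if (hi - lo < 2) return;
    std::size_t mid = lo + (hi - lo) / 2;
    merge_sort(a, scratch, lo, mid);
    merge_sort(a, scratch, mid, hi);
    std::size_t i = lo, j = mid, out = lo;
    while (i < mid && j < hi) {
        // Taking from the left half on ties keeps equal elements in order (stable).
        scratch[out++] = (a[j] < a[i]) ? a[j++] : a[i++];
    }
    while (i < mid) scratch[out++] = a[i++];
    while (j < hi) scratch[out++] = a[j++];
    for (std::size_t k = lo; k < hi; ++k) a[k] = scratch[k];
}

// Owns a scratch buffer so that sorting up to `capacity` elements never
// allocates after construction.
class MergeSorter {
public:
    explicit MergeSorter(std::size_t capacity) : scratch_(capacity) {}

    void sort(std::vector<int>& a) {
        assert(a.size() <= scratch_.size());
        merge_sort(a, scratch_, 0, a.size());
    }

private:
    std::vector<int> scratch_;
};
```

Every call to `sort` then does the same \Theta(Nlog(N)) work with no allocator involved, which is the property that matters when the worst case is what you care about.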
Eventually I'd really like to microbenchmark some of the algorithms I've been testing so as to actually measure the near-worst-case operation. For now all I have is practical experience and theoretical bounds with which to demonstrate it to others.
Further work:
I'm currently playing with hashtables as well, continuing the tree comparison testing. Of course the hashtable is much faster than my best tree, but I want to pursue some solutions to the hashtable worst-case problems and see how those fare as well.
BHeap algorithm
2016-10-15
I started a personal crusade some time ago against selecting algorithms based on average time while ignoring worst case runtimes. I've posted about this here before:
Several years ago I was toying with Heaps. The normal model for a heap is that the average run-time is O(log(n)) per operation, and the worst-case is the same. While theoretically true, in practice you rarely know the size of the data you are working with beforehand, and if you are ever wrong you have to allocate more space. Since heaps are by default stored flat in an array, this means *copying* the entire heap into the new larger array. This operation is in fact O(n). Thus, in practice most uses of Heaps are actually worst-case O(n).
Well... that's kind of horrible. So, I tried implementing one as a literal in-memory tree structure instead of an array. I called this a "bounded" heap (since it has a stricter bound). This gets a true worst-case O(log(n)) (assuming allocation time for new nodes is bounded). Unfortunately the performance is abysmal. We're talking 5 or 6 times worse average case, making it a pretty hard sell to convince anyone to use such an algorithm.
So, I got an idea. What if we use the ideas of a Btree in a Heap? That is, allocate chunks of memory and store pieces of the heap in them. At the time I got an outline for an algorithm I call a Bheap (creatively), but I never got around to implementing it.
I finally got it working and benchmarked it recently. Here's an outline for the Bheap algorithm. If you want full details you can just read my implementation, linked at the bottom:
Bheap algorithm:
Let's define a "Node". Each node contains an array. The first half of the array is a heap of data, exactly as normal. The second half is still laid out like a heap, but it's an indirect to other "Nodes", that is, heaps. So, in principle it's just one big heap, but broken up into chunks.
But, there's one catch. If we did this naively we would fill the first node with a bunch of elements, then we'd allocate its first child node and put one element there, then we'd allocate the second and put one element there. That's a total waste of memory (wasting approximately 1/arity the memory it uses). So, we modify things to fill the allocated node before it creates a new one... Heaps don't depend on much more than the heap ordering, so nothing is significantly changed.
There's only one more catch. Once we fill the last node we can allocate at a given level, we need to start filling the next level. As an optimization, instead of walking down from the root we simply start making our next child below the current tail. This is an idea I took from my first bounded heap algorithm. To make this work we fill nodes going left, then going right. There are some intricacies to making that work in practice; see the code for details.
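To picture the node layout, here is a rough sketch of one possible reading of it. This is an assumption for illustration, not the layout from the linked implementation: `BheapNode`, `SlotRef`, `locate`, `left_child` and `right_child` are all hypothetical names, and none of the fill-order logic is shown. A node holds K data slots in normal heap order; the implicit children that would fall past slot K-1 are instead pointers to other nodes.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// One chunk of the heap (hypothetical layout). Slots 0..K-1 of the implicit
// binary tree hold data; slots K..2K are "indirects" to child nodes.
template <typename T, std::size_t K>
struct BheapNode {
    std::array<T, K> data{};               // first half: an ordinary array heap
    std::array<BheapNode*, K + 1> child{}; // second half: pointers to sub-heaps
    std::size_t used = 0;                  // number of data slots currently filled
};

// Where an implicit tree slot lives: in this node's data, or in a child node.
struct SlotRef {
    bool in_child_node;
    std::size_t index;  // index into data, or into child
};

template <std::size_t K>
SlotRef locate(std::size_t slot) {
    assert(slot <= 2 * K);
    if (slot < K) return SlotRef{false, slot};
    return SlotRef{true, slot - K};
}

// Children of data slot i use the usual 2i+1 / 2i+2 arithmetic; anything that
// lands past the data half refers to the root of another node.
template <std::size_t K>
SlotRef left_child(std::size_t i) {
    assert(i < K);
    return locate<K>(2 * i + 1);
}

template <std::size_t K>
SlotRef right_child(std::size_t i) {
    assert(i < K);
    return locate<K>(2 * i + 2);
}
```

With K = 7, for example, slot 2 has both children inside the node (slots 5 and 6), while slot 3's children are the roots of child nodes 0 and 1.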
Predicted Performance:
This algorithm has 2 neat advantages over a normal heap
1) It does exactly what I planned, allocating a small chunk of data at once, and never having to do a linear time operation... yet it's quite memory efficient. It uses ~2x the memory of a normal heap, due to the pointer indirects, and wastes at worst 1 node's worth of space... not bad.
2) It should get better locality than a normal heap. One downside of a normal heap is that a node's children are located at 2i+1 and 2i+2. That means that after the first few elements all operations are non-local as far as caching goes. This algorithm keeps resetting that every time we go to a new node, so it should perform better cache-wise.
I graphed the HeapTime for a whole lot of points just to give an idea of what the variance looks like (lazy man's statistics, I know), but the above chart gives a pretty good clue of where the overriding factors are. In particular it looks like past ~20 elements or so there are no more gains for larger node sizes, and constant factors related to the extra BHeap logic become dominant.
I've left out BoundedHeap data because it's just not interesting: it varies from 13 to 15 seconds, and that's all there is to know.
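For comparison, here is a minimal sketch of the "normal" array heap being discussed, showing the 2i+1 / 2i+2 indexing. `ArrayMinHeap` and `parent_child_gap` are hypothetical names, and this is a plain textbook heap rather than the benchmarked code. The gap between a parent and its left child is i+1 slots, so every step down the tree jumps roughly twice as far as the one before.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// A plain binary min-heap stored flat in an array.
class ArrayMinHeap {
public:
    void push(int v) {
        data_.push_back(v);  // may reallocate and copy everything: the O(n) case
        std::size_t i = data_.size() - 1;
        while (i > 0) {
            std::size_t parent = (i - 1) / 2;
            if (!(data_[i] < data_[parent])) break;
            std::swap(data_[i], data_[parent]);
            i = parent;
        }
    }

    int pop() {
        assert(!data_.empty());
        int top = data_.front();
        data_.front() = data_.back();
        data_.pop_back();
        std::size_t i = 0;
        for (;;) {
            std::size_t left = 2 * i + 1, right = 2 * i + 2, smallest = i;
            if (left < data_.size() && data_[left] < data_[smallest]) smallest = left;
            if (right < data_.size() && data_[right] < data_[smallest]) smallest = right;
            if (smallest == i) break;
            std::swap(data_[i], data_[smallest]);
            i = smallest;
        }
        return top;
    }

    bool empty() const { return data_.empty(); }

private:
    std::vector<int> data_;
};

// Distance in array slots between slot i and its left child at 2i+1.
inline std::size_t parent_child_gap(std::size_t i) { return (2 * i + 1) - i; }
```

Once the gap exceeds a cache line's worth of elements, each level of a sift-down is likely to touch a different line, which is the effect the Bheap's per-node heaps are meant to reset.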