Replacing Google

2018-11-24

I recently moved into an actual non-moving apartment (if you don't generally follow me, I lived in a pickup truck for about 3 years, traveling the country).

Anyway, this afforded me the opportunity to work on a tech project I've been meaning to tackle for some time. I've tried it before, but I wanted to take another whack at it. This project is *replacing Google*. That is, finding alternatives to every Google product I use in my life.

Anyone who would read this blog already knows the reasons one might want to do this, so I won't bore you with a long rant on that, and will just get on to the technical stuff. I will say quickly that security and privacy are my major concerns and, given those concerns, switching off Google's products isn't always a clear win. This will come up later.

Let me start with a list of Google products that I was using:

It's a long list... Try making one yourself, it's amazing what we use without even thinking about it. So, I started looking for alternatives for each of these products. Switching is a pain, and just switching to a different giant corporation doesn't really accomplish much, so to be worth switching to, a product would need to be open-source and self-hosted, or at least known to be far more ethical. I'm going to go approximately from easiest to hardest to switch.

Just switch
Search
This one is easy. DuckDuckGo is an option on virtually every browser, just go to settings and switch it. The nice thing about this choice is that if you don't find what you want, you can always go to google.com anyway. At least you're only doing a few searches there instead of all of them.

Maps

For desktop use, https://www.openstreetmap.org is a so-so alternative. It works pretty well for looking up distances and routes and such. What it lacks is good search functionality and comparable info about local businesses. On mobile I've been using MapsWithMe for some time. It works well offline (unlike Google Maps), but can't do address searches, which can be an issue.

Chrome Browser

Another easy one. I switched to Firefox. There are other options like DuckDuckGo, but Firefox has the best support, usability, and security, with the least spying. This is a place where you have to consider your tradeoffs: if all of your browsing data goes straight to various companies, you didn't help yourself much. I hope to write an article on browser security/privacy at some point as well.

Hangouts
Ignoring network effects (which are huge, I realize), just switch to Signal and call it good. Signal supports all of the things you need (text, voice, video), and is the most secure option available. In fact, they just changed the protocol to make it even harder for them to comply with any subpoena demands.
Personally, I have some issues though. Signal is available for basically every platform... but they didn't roll out a Chrome OS version prior to the protocol switch, so Angie (my wife) can't use it on her laptop (yes, it's Google as well). Additionally, I have a Samsung Galaxy Tab A 7" which is stuck on Android 5, and the new Signal version wasn't backported (I can't blame them really). These are both fairly unusual requirements though, I recognize that... for 99.9% of people, Signal is your solution.


Hard: Self Hosted Services

Everything below is self hosted on my own machine, so I need a good connection, a domain name, port forwarding, and a good hosting machine, which in my world means Linux. I'm on Comcast business cable, and I got a Google Wifi access point (yes... also a Google product). I set the cable router to be a bridge so the Google Wifi is the endpoint/NAT, then set up port-forwarding through that. Finally, I don't have a static IP. After a little research I decided on duckdns for dyn-dns. Their tooling is open and the price (free) is good. My machine is Debian, because I find it easier to run and administer than having to remove all the junk Ubuntu comes with (even on the server installs).
For the moment I'm serving off my primary laptop, since that's what I have. If my laptop is at home, I can reach it from a coffee shop on my tablet. If I have it with me, I can reach services using localhost. It's not ideal, but it works until I get another machine (I'm cheap).
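The dyn-dns piece of this comes down to periodically telling duckdns what my current address is. Here's a minimal sketch in Python, assuming duckdns's HTTP update endpoint. The function names are made up for illustration (they aren't part of duckdns's own tooling), and the subdomain and token in the usage comment are placeholders.

```python
import urllib.parse
import urllib.request

# Assumed endpoint: the duckdns HTTP update URL.
DUCKDNS_UPDATE = "https://www.duckdns.org/update"


def build_update_url(subdomain, token, ip=""):
    """Build the update URL. An empty ip asks the service to use the caller's address."""
    query = urllib.parse.urlencode({"domains": subdomain, "token": token, "ip": ip})
    return f"{DUCKDNS_UPDATE}?{query}"


def update(subdomain, token, ip="", timeout=10):
    """Send the update and return True if the service answered OK."""
    url = build_update_url(subdomain, token, ip)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode().strip().startswith("OK")


# Usage (placeholder values), e.g. from a cron job every few minutes:
#   update("example-subdomain", "your-token-here")
```

Leaving the ip empty is the useful case behind a NAT, since the machine itself only knows its private address.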

Email
As a techie, my solution is to host my own email. I did this some years ago, and gave up because you basically can't send email from your own server with Google's email filtering these days. So, this time I compromised. My email goes to Gmail. I use fetchmail to POP it to my laptop (using a .fetchmailrc style config), deleting the email from Gmail. Then I use dovecot to run an IMAP server, and finally Thunderbird on my laptop and K-9 Mail on my tablet. All in all it works extremely well, plus I get PGP support and all of those things too. If my machine is down, email sits in Gmail until I can POP it, so if I screw up I don't miss email. Honestly, this setup was significantly easier than I expected.
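To make the "fetch, then delete" step concrete, here's a rough sketch of the same idea in Python rather than fetchmail. This is not my actual setup: it uses the standard poplib and mailbox modules, it assumes the local store is a Maildir that dovecot is pointed at, and the function names are hypothetical.

```python
import mailbox
import poplib


def drain_pop3(conn, maildir):
    """Move every message from a POP3 connection into a local Maildir.

    Each message is written locally before it is marked for deletion,
    so a failure part-way through leaves the remaining mail on the server.
    """
    count, _size = conn.stat()
    for i in range(1, count + 1):
        _resp, lines, _octets = conn.retr(i)
        maildir.add(b"\n".join(lines) + b"\n")  # store locally first
        conn.dele(i)                            # only then mark it for deletion
    return count


def fetch_from_gmail(user, password, maildir_path):
    """Hypothetical wrapper; host and port are assumed POP3-over-SSL settings."""
    conn = poplib.POP3_SSL("pop.gmail.com", 995)
    try:
        conn.user(user)
        conn.pass_(password)
        box = mailbox.Maildir(maildir_path, create=True)
        return drain_pop3(conn, box)
    finally:
        conn.quit()  # POP3 applies deletions when the session ends with QUIT
```

The ordering is the whole point: deliver first, delete second, which is what lets the mail sit safely on the server whenever the laptop is off.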

Webmail
If you want webmail, this one is comparatively easy. I'm not running any webmail right now because I don't care for it anyway, but I ran roundcube for a while and it worked fine. squirrelmail is an option, and if you're using NextCloud (see my next section) it has an email client built in as well. Search won't be as good as Gmail, but that's about the only flaw.

Contacts
Surprisingly, after researching all the software out there, I found no good standalone CardDAV sync servers. I found several options, none of which appeared stable enough and well enough designed to actually use.
So, I opted for a heavyweight solution instead. I installed NextCloud. The easiest way to set this up was using snapd. I installed snapd with apt, then used it to install NextCloud. The result is a bit odd. NextCloud gets installed in /snap, and the files aren't writable. To configure NextCloud you use the command "nextcloud.occ" located in /snap/bin (which you'll want to add to your root path). That done, everything worked like a charm. A few config commands later and I had NextCloud running with SSL keys from Let's Encrypt.
Since it's CardDAV, basically everything supports it. I installed tbsync on Thunderbird and that worked great. On my Android (yup, another Google product), I installed DavDroid, but you can find similar syncing apps for virtually any OS... that's the great thing about standards.


Calendar
The nice thing about the above solution is that I get the calendar for free. Just enable it in the options, and hey, calendar! This setup *almost* interacts cleanly with Google Calendar too, so you don't have to convince others to switch so you can. Pulling calendars from Google for viewing works quite well actually (and Thunderbird gives you editing). Unfortunately though, Google's CalDAV integration is abysmal and it may be days until an update to your calendar is visible on Google. So you can't share your calendar with folks who use Google Calendar... Bummer.
Syncing is CalDAV based, so clients are again easy to come by. On my laptop I'm using Thunderbird Lightning as my client, a Thunderbird calendar plugin. There is ONE little issue though. There's a bug where you *must* turn off cookies to make it work. I have no idea why, and the developers don't know yet either (but they do know about it). On Android, again, DavDroid does the trick.

Drive
This is barely worth mentioning after the above. This is NextCloud's core feature. Just install the NextCloud client on each of your devices and hook it up. I'm using it for syncing my password database (but not my keyfile, which I keep separately on each device).

Too hard: Self Hosted services with lots of setup

Docs
Sheets
It turns out that there are a couple of products that replace these elegantly. Collabora and OnlyOffice both seem to get good reviews and they link really cleanly with NextCloud. The problem? Hosting them is a PITA. Both of them seem to be easiest to host in a docker container, and everyone's advice is to run them on a separate machine from NextCloud if you want things to actually work. Is it possible to get these working on your little home server? Yes, I'm certain that it is, but 3 or 4 days of hacking is a LOT just for these 2 products.

Note though that if you don't require online collaboration/editing you can use some good desktop options and just share the files. abiword and gnumeric are good for most use-cases and libreoffice is there for when they aren't. For me, as I mentioned earlier, Angie uses a Chromebook, so a web-editor is a must for me to switch off of docs/sheets.

No good replacements
Google Voice + Hangouts dialer
SIP is out there and works well. I'm using diamondcard.com to get the phone number. For now I'm using it through Google Voice while I experiment with things and get it all reliable (which means Google still knows about every call I make and receive). On my laptop (Linux) I'm using linphone. Surprisingly, out of the many, many SIP clients out there for Linux, there are only a couple that aren't basically abandon-ware. Android phones have SIP built in actually, just go to the phone app and then go to the settings. You can install the phone app even on a tablet.
Okay, so then why do I have this listed under "No good replacements"? The problem is having *all* the devices ring. This is possible with SIP if you pay a decent amount of money, but it's not possible to do for free or almost free. Diamondcard is like $3.00 a month, which is nearly free. I can get another line for my tablet, no problem. Making them both ring though means I need a hosted PBX, and the prices for that are not in the same ballpark.

Keep
There are literally hundreds of todo list, task, and notetaking apps out there. But let's look at Keep's core features:
  1. It handles multiple notes elegantly, letting you look at all of them. Lists are a native feature, a note is a list or not a list, making the app extremely usable without fumbling for weird format characters.
  2. It's available on all devices, laptops included
  3. It's available offline on mobile devices
  4. It syncs between users
The second through fourth criteria, surprisingly, are the easy ones to meet. There are tons of apps out there that sync, particularly if you're willing to use a third-party for hosting. The second criterion is a little limiting (lots of mobile-only apps), but there are still options. The problem is the first criterion. There are very few apps that have native lists and don't require typing weird formatting characters on your tiny screen while in the grocery store. While formatting characters might be okay for me, they're not for my wife, with whom I need to share a grocery list.

Android
You can buy an android device and put another OS on it, but in practice this isn't really workable. It's too much effort even for total nerds like me. You could buy an Android from someone besides Google, but then you just get worse security and no updates (like my Samsung tab A 7").
Right now I'm not aware of any good alternatives. I use my tablet for GPS (trails and roads), as a phone when on Wifi, for shopping lists in the grocery store, as a newsreader and a guitar tuner, and as a laptop replacement when I've left my laptop at home.
I've pre-ordered the librem 5 from purism for when it comes out, hoping that will be a reasonable alternative. But, for now a tiny laptop device is probably the best bet honestly.

Blogger
I probably haven't looked hard enough yet. Wordpress is out there, but its security is abysmal (just look at any high-profile Wordpress blog and how often it gets owned, often serving malicious content for a while unbeknownst to the owner). There are a lot of heavy-weight solutions that just seem totally unnecessary. There are a lot of hosted solutions I'd have to pay for. There are a lot of lightweight solutions that appear not to work, or are so lightweight you can't tweak the look/feel at all.
I've seriously considered just writing my own software for this problem. The comment feature is rarely used anyway and then I can generate it statically using some extensions to code I've already written. In the end that may be the best direction.

Google Wifi
I mentioned earlier that I have a Google Wifi, which seems rather silly when I'm writing a blog about trying to move off Google. I actually just bought this product too. Here's why.
Wifi and router security is abysmal. Whatever device you use for NAT is the easiest device to attack from the internet, so it's the most important one to secure. These devices are usually... well... not. Most users don't know or care if their device is secure, and have no idea when it's been rooted. As a result, home routers are becoming a major source for botnets used to launch DDoS attacks, steal users' information, etc. If a hacker is in your router, they can watch everything you do on the web. https ought to protect you, but it's imperfect, particularly if you're worried about so-called nation-state actors (think, any hacking group with some non-trivial resources).
Enter Google Wifi. They seem to actually care about security. They issue updates regularly which most options don't. Also the device updates itself without me having to log in to the router every couple of days and see if there's an update that needs to be installed.

I'm not going to bother listing ChromeOS because the alternatives are obvious.

Conclusion
I care about this issue a fair bit. I'm willing to go through some hassle, and even give up some functionality to make it all work. I've managed to switch off of about half of Google's products that I was using, but I find it not worth my time or money (so far) to switch off of the other half. If you have no-one else that you collaborate with regularly (friends, relatives, co-workers, or a spouse) who uses these products, then you'll find the migration much easier. A lot of the lock-in occurs due to network effects, which is quite frustrating. Also, if you never used the collaboration features of, say, docs and sheets, then switching is quite painless. Google is successful enough that their integration features vary from nonexistent to poorly maintained.

So, why is this the state of the world? It's incredibly obvious, but for some reason I never realized it until I was working on this project. Good software, the best software, has to pay someone's salary. That is the only way to get enough high-quality dedicated maintainers to build solid software. Because of this, the better products in the FOSS (Free and Open Source Software) ecosystem are driven largely by Open Source corporations. These corporations make their money by doing hosting and support. So while the core of the software is FOSS, the part that makes it easy to run is proprietary, because that's how they make their money. End result? Lots of hobbyist products that are missing the 3 crucial features you need to make them useful (often these are things like reliability, or actually compiling), and a few great products that are nigh-impossible to run.

In working on this I tried out a number of other options that aren't listed above, and consistently ran into poorly maintained or unmaintained software. See the discussion of replacing Google Voice and SIP. Most of the clients are broken in one way or another. But, I have to admit... I'm not willing to pay, so who is?

I'll continue to work on this (like maybe getting a docs/sheets replacement online), and try to update here when I find more useful information. What I am doing, and have been doing for a while, is trying to use these products *less*. I can straight delete, or download and delete, old documents and spreadsheets that don't need to be shared anymore. I can use Gmail just as a caching endpoint and spam filter, and not to store my email, and so on.


Rust gives me hope for the future

2018-11-16

In all of the recent political turmoil in the U.S. it's easy to get a bit down and depressed about the future. For me, a pick-me-up came from a rather surprising source... a programming language.

Now, anyone reading this post is probably enough of a computer nerd that computers are not a source of hope for the future... they are a source of the exact opposite. No computer expert can look at a programming language and not get depressed at every flaw it has. Just google "Javascript flaws" and you'll find diatribe after diatribe. C's flaws have been elevated to interview questions. I myself used to ask "what are the semantics of x++"... which would take literally 15 minutes to answer correctly. Ask a type theorist about Java's flawed generics and you'll get an hour lecture on how the designers confused top and bottom, on contravariance, and on why sub-typing of objects (much less generics) is a horrible idea. Alternatively, while Haskell appears to get few things *wrong*, you need a PhD in category theory to understand it, and like most languages that aren't fundamentally broken internally it gets relegated to the category of "useless toy".

Enter Rust


From the rust website https://www.rust-lang.org/en-US/
"Rust is a systems programming language that runs blazingly fast, prevents segfaults, and guarantees thread safety."
Sounds like what they claimed about Java, doesn't it? Well, it is, but Rust actually lives up to it. Rust is aiming to be a direct competitor to C/C++. Remember that C was originally an "easier portable assembly code". As software engineers we tend to think of it as being nearly abstraction free, but actually what it has are a lot of nearly zero-cost abstractions. These days only a few experts with a lot of time on their hands can actually beat modern C compilers at writing performant assembly code, something only worthwhile for a few special edge-case uses (like matrix convolutions). Rust takes this idea, but combines it with everything computer scientists, and type theorists in particular, have discovered about type theory since the invention of C.

The end result is the first language I've ever seen that doesn't suck. As I read through the Rust book I kept being struck by how intuitive the language is. Now, I should mention that I am a little bit biased: a few of my friends, mostly with similar backgrounds, were fairly deeply involved in its development. This means the designers have similar biases to mine.

Rust has got to be the most complex language I've ever learned... but then again, I didn't just pick up and start trying to code in C++14 without knowing C and older C++ standards first. The difference is that C++14 and similar languages don't just require learning all the keywords and what they mean, they require learning which code is defined and undefined. Ever try to actually write code that is fully defined? Sequence points are just the start of it. Just check out the differences between char and int8_t... char (called that because it's frequently used for characters... though it does completely the wrong thing with utf8 without serious effort) is assumed to alias something else, and int8_t is not. If any part of that sounded like babble... congratulations, you don't really know C++.

The reality? no-one really knows C++. It's simply too complicated a language with too many corner cases. Corollary? There is no real world software written in C++ that actually conforms to the standard. Conclusion: No real world software written in C++ is even well defined, much less *correct* by any reasonable definition besides "eh... seems to work... today... on this computer and compiler".

With Rust on the other hand, while the pointer types might be a little confusing at first, the keyword definitions are all there is to learn. If your code compiles (without unsafe), the behavior is defined, and that's the end of it. No aliasing rules, no sequence points, etc. Almost all of your code can be written like this. For those rare little corners where you really need to punch through that safety, unsafe is there for you. C's semantics got screwed up by optimizing compilers. The problem was that its definitions are a little *too* low-level (originally defined by direct translation to VAX assembly instructions), so optimizing required violating the original rules and we got the crazy dance we have today. Something like SML is so divorced from the system that punching down to understand the machine-level is almost nonsense. Rust is right in between, where optimizers can optimize, but the machine layout is defined enough that when you use unsafe, it just works.

It's strange to say, but as I watched the news scroll past and read the Rust book... I felt flushed with hope. Not only can software theoretically not suck, but people actually put together a tool to help us do it. A tool that itself is software that doesn't suck. Maybe, just maybe, humans can actually do this technology thing and make it all work.

A User's guide to Privacy and Security

2017-10-05

Have you ever wanted a document to hand to your friends and family outlining basic computer security so you can stop helping them recover their email account?

Are you that friend or family member?

Have you ever wanted to leak information to the press from a government agency without ending up in jail?
This guide will hopefully help you out:
https://github.com/multilinear/privacy_and_freedom/blob/master/user_security.markdown

This project was started by a friend of mine. I thought it was a really good idea and started working on it.

The goal here is a complete guide for user-side computer security and privacy. The "Basic" section outlines security for most users. Ideally this should be exactly the thing that technical folks (readers of this blog) would want to hand to their friends and family. It outlines things like password databases, pins on cellphones, etc. If technical folks don't immediately feel the urge to share it with non-technical friends and family upon reading it please let me know. Just that would be very useful feedback. Ways to make the document more approachable, sharable, etc. would be even better.

The "Advanced Topics" section outlines concerns and solutions related to nation state actors. This isn't useful for most people, but the hope is that collecting it all in *one* document will make it a lot easier to pick and choose what any given user does need, and help disseminate this hard to find information more widely.

Note, this isn't a "howto". An intelligent computer user, even one who's not that technical, is entirely capable of Googling howto guides, and things like the settings menu on iPhone change too fast to keep up with. Reading this should give a reader an understanding of *what* they need to do, and the technical terminology to look up how.

Any feedback is valuable, as it notes in the document, I would love corrections, improvements, etc.

Note: my work (the link at the top) is a fork of https://github.com/bluehat/privacy_and_freedom/blob/master/digital_freedom.markdown with some significant changes in direction. I filed a merge request today.

Small update on datastructure benchmarks

2017-04-29


I hadn't written a skiplist yet. So here's the same graph but with a randomized skiplist added in... Notice that it's pretty horrible anyway.

Benchmark of all major dictionary structures

2017-04-20

I've been writing basically every major datastructure, one at a time.
I wrote up heaps a little while ago: http://www.blog.computersarehard.net/2017/02/a-better-heap.html
I've now finished writing and benchmarking all the common dictionary datastructures.
Note that at every point in this graph the same amount of work is being done. At each point we put "test_size" random elements into the datastructure, and then remove them. We do this 134217728/test_size times, and time the *total*. Thus we're always putting in and taking out 134217728 elements.
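For anyone who wants the methodology spelled out, here's a rough sketch of one point on the graph. It's in Python with the built-in dict standing in for the structure under test, so it is not the actual C++ harness; the names run_point and make_dict are made up for illustration, and you'll want a much smaller total than 134217728 if you try it in Python.

```python
import random
import time

TOTAL_ELEMENTS = 134217728  # figure from the post; shrink it to try this in Python


def run_point(test_size, total=TOTAL_ELEMENTS, make_dict=dict):
    """Time one point: fill to test_size, empty again, repeat total/test_size times."""
    rounds = total // test_size
    rng = random.Random(0)       # fixed seed so runs are repeatable
    d = make_dict()
    elapsed = 0.0
    moved = 0
    for _ in range(rounds):
        # Key generation is kept outside the timed region.
        keys = [rng.getrandbits(64) for _ in range(test_size)]
        start = time.perf_counter()
        for k in keys:
            d[k] = True          # insert
        for k in keys:
            d.pop(k, None)       # remove
        elapsed += time.perf_counter() - start
        moved += len(keys)
    return elapsed, moved


# Usage: the same total work at every size, only the structure's size changes.
#   for size in (2**k for k in range(4, 14)):
#       print(size, run_point(size, total=2**16)[0])
```

Because the total element count is fixed, any change in the timing from one point to the next comes from the size of the structure, not from the amount of work.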

As a result, what this graph is showing is how the size of a datastructure impacts its performance. Note that the graph is logarithmic on the X axis, so it's not completely dominated by the larger tests.


First, let's talk about what each of these algorithms *is*. As a note, all of these algorithms resize automatically, both up and down.


Algorithms left out

Surprising results:

You may notice that NONE of these algorithms are even *close* to linear in practice. If every operation is amortized to constant time, as in our hashtable algorithms, the line should be completely *flat*. No more work should be done, just because the datastructure contains more data. This is even more true for the bounded-hashtable, where no operation is *ever* linear, the only reason it's log and not constant even on a per-operation basis is the AVL tree used for chaining.

I spent a while trying to find non-linearities in my testing methodology but came up with nothing. Remember, the X-axis is logarithmic. Isn't that odd? If that's throwing you off, here's what it looks like graphed linearly (my data is logarithmic in nature, so the graph is pretty ugly). Whatever that is... it's not linear.



So, what's going on? My best guess is this is the computer's fault. Caching layers, memory management, etc. mmap() probably takes longer and longer to get us memory for malloc, for example. I've yet to get detailed enough information to confirm this theory though.

Conclusion
Well... aside from the nonlinearity described above, OCHashtable is the clear overall winner for average runtime at any scale, no big surprise there. BTree is the clear winner for bounded algorithms of large size. AVL and RedBlack are about equivalent for small size... but given that in my previous testing AVL came out a little faster, that lookups should theoretically be a little faster, that the implementation tested here is less optimized than red-black, and that it's an order of magnitude simpler to code, AVL clearly beats RedBlack (as is now known generally).

This is pretty much what we would expect. I had high hopes for BoundedHashTable, as *theoretically* it is very fast... but the constant factors seem to blow it out of the water, and it still shows up as very much non-linear. This algorithm is unable to resize arrays in place (realloc may copy the whole array, which is linear), which means constantly allocating new, differently sized arrays. I suspect this, along with the constant factors due to algorithmic complexity, is probably the cause of the poor performance.

As always, full source is available here: https://github.com/multilinear/datastructures_C--11