Designing for correctness

2025-07-27

There aren't nearly enough abstract discussions about how to design code for correctness, but there are even fewer that talk about concrete examples. Having just posted Tasx, I thought it might be interesting to use it as an example of how to think about designing for correctness. I'm sure I didn't do everything right, and I'd love for people to tell me how I'm wrong so I can make it better, but a lot of thought went into how to avoid bugs. So let's talk about some principles and how I applied them.

Make failures obvious

One big trick to designing for correctness is to minimize the chance that mistakes and failures go unnoticed. The first method is the classic "fail early and fail often". I'm not saying to litter your code with pointless asserts. Rather, I mean that if the logic for a section of code assumes something is true, but won't crash if it's false, make it crash if it's false. On the other hand, if it would already naturally and reliably crash (like a null pointer dereference), don't add unnecessary code. This is actually a basic design principle underlying Rust. It tries to crash the minute something is weird rather than let the problem propagate forward in your program. That's what unwrap (and what you replace it with) is all about. This has the additional advantage of keeping crashes close to their cause, which makes debugging far easier.
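As a minimal sketch of "make it crash if it's false": the `Task` type and `rename_task` function below are hypothetical names made up for illustration, not code from Tasx. The function assumes the task already exists. Written with `if let Some(task) = ...`, a missing task would be skipped silently; here the assumption is enforced instead.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a task record; not the actual Tasx type.
#[derive(Debug, Clone, PartialEq)]
struct Task {
    uid: String,
    summary: String,
}

/// Rename a task that the caller believes is already stored.
///
/// The logic assumes `uid` is present. Doing nothing on a miss would hide
/// a bug somewhere upstream, so we panic right here instead.
fn rename_task(tasks: &mut HashMap<String, Task>, uid: &str, summary: &str) {
    let task = tasks
        .get_mut(uid)
        .unwrap_or_else(|| panic!("rename_task: no task with uid {:?}", uid));
    task.summary = summary.to_string();
}
```

The panic message names the function and the offending UID, so the crash points at the broken assumption rather than at whatever code later trips over stale data.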

A less discussed method to make failures more obvious, though, is to minimize branching and special-case code. Usually when bugs sneak through your QA process (whether your QA process is an entire QA team, some unit tests, or just running the software by hand and seeing if it works), it's because something *unusual* happened. Bugs are almost always in some bit of code that isn't usually exercised. Yes, you can try to write tests for every corner case, but you can also design your code to minimize corner cases in the first place.

In Tasx I started with a simple data model. I got CalDAV task structures from the kitchen-fridge crate, and I put them in a list (the list is dictated somewhat by GTK). When a mutation occurred I changed the tasks stored in the list. I also had a version sitting in kitchen-fridge's cache, so when I mutated the one in the list I also had to mutate the one in the cache. It was a little messy, but not too big of a deal.

But then I needed another window to edit the tasks, so I added one (the details window that pops up when you click a More button). The toolkits I'm using run the other window in its own thread. I could have put my list behind a mutex, but I'll get into why I didn't in a bit. Instead, I copied the entire Task structure into the other thread. Now edits could come from two places and go to three places. This was getting problematic. It'd be easy to have an edit somehow change one version but miss another, and I had several such bugs. That was no good. So next I switched to sending the entire task around. This let me centralize changes. I could edit the local copy of the task, then send it elsewhere and let it propagate. I still had propagation logic for each starting source, but it still wasn't too bad.

Then it got worse again. To do nested tasks I needed to understand child/parent relationships while sorting the list, which meant I needed the data stored in a dictionary. I couldn't abide duplicating the data, so I pulled the tasks out of the list, moved them into the hashmap, and put just the data necessary for display in the list (plus a UID so I know what needs changing). Now I was back to bespoke editing for data in the list again! Around the same time I added support for the "Description" field, which could easily be a short essay, and it's updated with each keypress for UX reasons. So I might be copying this data all over the place every time I sent a task around. Even worse, I was tracking each task's "children", and editing the child list was a mess. What if two threads needed to change the child list at the same time? If I just overwrote the task, one of the edits would get stomped. It was time for another redesign.

So, I designed a "TaskChange" structure. This structure would encode just the fields that get changed. For the "children" property I could now have a "delta" concept that adds or removes a child. This makes mutation order way less important.
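Here is a rough sketch of what such a change structure could look like. The field names, the `ChildDelta` enum, and the `apply` method are my own guesses for illustration, not the actual Tasx definitions. Unchanged fields are `None`, and the child list is only ever edited by deltas, never overwritten.

```rust
use std::collections::BTreeSet;

/// Hypothetical task record, reduced to a few fields.
#[derive(Debug, Clone, Default, PartialEq)]
struct Task {
    uid: String,
    summary: String,
    description: String,
    children: BTreeSet<String>,
}

/// A delta against the child list rather than a replacement for it.
#[derive(Debug, Clone, PartialEq)]
enum ChildDelta {
    Add(String),
    Remove(String),
}

/// Only the fields that changed are `Some`; everything else is untouched.
#[derive(Debug, Clone, Default)]
struct TaskChange {
    summary: Option<String>,
    description: Option<String>,
    children: Vec<ChildDelta>,
}

impl TaskChange {
    /// Apply this change to `task`, leaving unmentioned fields alone.
    fn apply(self, task: &mut Task) {
        if let Some(summary) = self.summary {
            task.summary = summary;
        }
        if let Some(description) = self.description {
            task.description = description;
        }
        for delta in self.children {
            match delta {
                ChildDelta::Add(uid) => {
                    task.children.insert(uid);
                }
                ChildDelta::Remove(uid) => {
                    task.children.remove(&uid);
                }
            }
        }
    }
}
```

With this shape, two changes that each add a different child produce the same task whichever one is applied first, and a change that touches only the child list never has to carry the description along. An add and a remove of the same child still depend on order, so ordering matters less, not never.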

Now all my mutations run through three functions: add, delete, and change. Changes can come from anywhere in the program and get propagated by exactly the same code. Display of a mutation is not done locally (okay, there's one corner case, but you get the idea); it always propagates the same way. This means I only have to get it right once, and what I see when I make a change is going to be the data everywhere. If I have a stupid bug, like missing the code that applies the change to the "Description" field, I'll know it because my edit to the description field won't show up in the UI.
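To make the shape of that concrete, here is a simplified sketch with invented names (`Model`, `Mutation`, `Row`); the real code is organized around Relm4 and GTK and will look different. The point is that the display rows are only ever updated from inside the same three functions that update the stored tasks.

```rust
use std::collections::HashMap;

/// Hypothetical stored task.
#[derive(Debug, Clone, PartialEq)]
struct Task {
    uid: String,
    summary: String,
}

/// Hypothetical display row: just what the list needs, plus the UID.
#[derive(Debug, Clone, PartialEq)]
struct Row {
    uid: String,
    label: String,
}

/// Every edit, from any window, is expressed as one of these.
enum Mutation {
    Add(Task),
    Delete(String),
    Change { uid: String, summary: String },
}

#[derive(Default)]
struct Model {
    tasks: HashMap<String, Task>,
    rows: Vec<Row>,
}

impl Model {
    /// The single entry point for mutations.
    fn apply(&mut self, mutation: Mutation) {
        match mutation {
            Mutation::Add(task) => self.add(task),
            Mutation::Delete(uid) => self.delete(&uid),
            Mutation::Change { uid, summary } => self.change(&uid, summary),
        }
    }

    fn add(&mut self, task: Task) {
        self.rows.push(Row { uid: task.uid.clone(), label: task.summary.clone() });
        self.tasks.insert(task.uid.clone(), task);
    }

    fn delete(&mut self, uid: &str) {
        self.tasks.remove(uid);
        self.rows.retain(|row| row.uid != uid);
    }

    fn change(&mut self, uid: &str, summary: String) {
        // Assumption for this sketch: a change for an unknown task is a bug.
        let task = self.tasks.get_mut(uid).expect("change for unknown uid");
        task.summary = summary;
        // The row is rebuilt from the stored task, never from widget state.
        let label = task.summary.clone();
        for row in self.rows.iter_mut().filter(|row| row.uid == uid) {
            row.label = label.clone();
        }
    }
}
```

If `change` forgot to copy a field into the row, the edit would visibly fail to appear in the list, which is exactly the kind of obvious failure described above.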

Reducing corner cases

I said I would come back to why I didn't use a central data store and put a mutex around it. The first reason is simply that I want a responsive UI, and mutexes are not a great way to get that. Relm4 and GTK have asynchronous APIs, so by default you get a responsive UI as long as you don't screw it up. But I could just make sure I never hold a mutex for very long, right?

But there's another downside. Race conditions are the worst kind of corner case. They are just the right combination of random and consistent that they are effectively impossible to test fully. Such race conditions are global properties of the code. Every time I mutate data I have to make sure to do it in the right order. I need to take this mutex before that one. If in one place it's the other order, I could deadlock. If in one place I mutate the data in the wrong order between or outside mutexes, some other code could see an inconsistent state. It's a lot to think about every time you touch that structure. I probably would've ended up copying data out of the structure just to avoid races or holding a mutex too long. Sometimes it's worth all of that when you really need that common data store, but I really didn't. In my case the kitchen-fridge cache, the hashmap, the list, and the details window can all have different versions of the data for a short time, and I don't care. As long as things eventually line up (within a few milliseconds) the user won't even notice. That's pretty easy to do with messages, particularly because all mutations go through a central point, so mutations can't "cross". Once again that centralized mutation design is helping us out.
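The message-passing alternative can be sketched with nothing but the standard library. This is not how Tasx wires things up (it goes through Relm4's own messaging); the `Msg` enum and both functions are hypothetical. One thread owns the data, every other thread only sends messages, and there is no lock to take in the wrong order.

```rust
use std::collections::{BTreeSet, HashMap};
use std::sync::mpsc::{self, Receiver, Sender};
use std::thread;

/// Hypothetical messages describing edits to a task's child list.
enum Msg {
    AddChild { parent: String, child: String },
    RemoveChild { parent: String, child: String },
}

/// The single owner of the data: applies messages one at a time,
/// in arrival order, so two edits can never interleave.
fn run_owner(rx: Receiver<Msg>) -> HashMap<String, BTreeSet<String>> {
    let mut children: HashMap<String, BTreeSet<String>> = HashMap::new();
    // Iterating `rx` ends once every Sender has been dropped.
    for msg in rx {
        match msg {
            Msg::AddChild { parent, child } => {
                children.entry(parent).or_default().insert(child);
            }
            Msg::RemoveChild { parent, child } => {
                if let Some(set) = children.get_mut(&parent) {
                    set.remove(&child);
                }
            }
        }
    }
    children
}

/// Two "windows" edit the same parent concurrently; neither edit is lost.
fn two_editors() -> HashMap<String, BTreeSet<String>> {
    let (tx, rx) = mpsc::channel();
    let owner = thread::spawn(move || run_owner(rx));
    let editors: Vec<_> = vec!["list", "details"]
        .into_iter()
        .map(|name| {
            let tx: Sender<Msg> = tx.clone();
            thread::spawn(move || {
                let msg = Msg::AddChild {
                    parent: "root".to_string(),
                    child: format!("from-{}", name),
                };
                tx.send(msg).unwrap();
            })
        })
        .collect();
    drop(tx); // drop the original so the owner can finish
    for editor in editors {
        editor.join().unwrap();
    }
    owner.join().unwrap()
}
```

Whichever editor's message arrives first, the owner ends up with both children under `root`. The senders may briefly disagree with the owner about the current state, which is the short-lived inconsistency described above.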

Conclusion

When thinking about design, people often talk about readability vs. performance. Those are important aspects, and a lot has been written about how readability contributes to correctness. In fact, many classic coding rules like avoiding duplication are intended to contribute to correctness. Test-driven design is often discussed as another method to improve correctness. But what I'm proposing is more of a mindset, a holistic view where you design to minimize the very corners bugs can hide in. In my case this is how I cheat my way out of writing real tests, but in a big project this is how you reduce your test surface area and bring your test matrix down to something that is *possible*. If you then take the design into account when deciding how to test, you can write a heck of a lot fewer tests (and save on infrastructure costs as well).

I'm sure Tasx will have bugs found in it, but between Rust and these design choices, I've drastically reduced the types of bugs that could slip through even my very rudimentary QA process. If it's not good enough, I might have to actually go write a few tests.

Lastly, remember that Tasx is open source. If you want to make this more concrete you can go read the code, including the history and all of my stupid mistakes: https://codeberg.org/multilinear/tasx .
