Upgrading my ZFS Box

This was me trying to debug my file server:
[Image: lightbulb]

Over the last few years I’ve made a strong effort to decommission as many of my machines and virtual machines as possible in favor of hosted solutions. Ultimately while it can be fun to control the entire stack, it just ends up not being worth it when things go wrong and it’s you who is on the hook to fix it. I still maintain a file server that runs ZFS on FreeNAS though, since I still have use for a large pool of redundant local storage for backups and a large media library.

What started with some crashes while initiating a Time Machine backup ended up leading me down a fairly expensive rabbit hole. This particular kernel panic was easily reproducible by attempting to delete a folder of large files (Time Machine backup shards). Since the ZFS scrub was clean, I assumed hardware.

First I replaced the thumb drive the OS was installed on because it was a cheap thing from China and probably at the end of its life anyway. No luck. FreeNAS has really upped the suggested RAM requirements since I first built this box, so much so that most people say don’t even bother unless you have at least 8GB. I had 4GB, and despite memtest86 checking out, this seemed like a reasonable time to upgrade. Except that my current motherboard only supported 4GB of non-ECC memory. Upgrading to 8GB of ECC memory required buying a new board.

So at this point I’d committed to replacing nearly every component in this machine (not including the HDDs), except for the one part that I actually hated – the case. Hard drive failures are not infrequent, and having to disassemble the case every time I needed to swap one was super annoying. So screw it, let’s replace the case with a nice hot-swappable one.

Replacing the case sounded like a great idea until it arrived and I realized that my current power supply was a different size and was too big to fit in this case. Ok, fine. New power supply too. Good news is that there was literally nothing left for me to replace now.

So I built an entirely new computer, installed the 5 hard drives, re-imported my pool and…oh my gosh are you kidding me this thing is still crashing? What followed was a lot of angst and learning a bit more about FreeBSD than I really cared to. I tracked the problem down to metadata corruption on a single 10MB file contained in a single shard in a single Time Machine backup. A stronger man than I probably could have done some surgery to repair this metadata, but that is tedious and apparently my tool of choice is a sledgehammer. I ended up backing up the contents of my pool, blowing it away, and rebuilding it from scratch. Problem solved, in the most flailing and roundabout way possible.

But I learned things! The first was validation of swearing off basic RAID in favor of ZFS when I built this thing years ago. It was so nice to import my pool on entirely new hardware without any problems. With RAID, you are at the mercy of vendor-specific implementations, so unless you can find a compatible board you are out of luck. In fact, there was a period where the board I was running RAID5 on was no longer being made, so if that board died so did I.
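For anyone curious what moving a pool between machines looks like outside the FreeNAS GUI, here is a minimal sketch. It assumes a pool named `tank` (a made-up name, not my actual pool) and only prints the commands unless you turn off `dry_run`. If the old box died before you could export, `zpool import -f` is the usual fallback.

```python
import subprocess


def migration_commands(pool):
    """Return the zpool commands for moving a pool to new hardware.

    The pool name is whatever you pass in; "tank" below is hypothetical.
    """
    return {
        # Run on the old machine before pulling the drives, if it still boots.
        "old_host": [["zpool", "export", pool]],
        # Run on the new machine once the drives are installed.
        "new_host": [["zpool", "import", pool], ["zpool", "status", pool]],
    }


def run(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in commands:
        print(" ".join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    steps = migration_commands("tank")
    run(steps["old_host"])
    run(steps["new_host"])
```

The point is that nothing here depends on the motherboard or controller: the pool's layout lives on the disks themselves.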

ECC memory is non-negotiable: I cheaped out the first time around, and this is probably what cost me. Non-ECC RAM is prone to flipped bits, which your file server is happy to redundantly preserve for all eternity. ZFS is saying to memory, “OK buddy, I trust you!” which turns out to not be great if your memory isn’t reliable. If you really want to build a file server to secure your most precious data, it doesn’t make any sense to introduce a weakest link. If you don’t believe me, listen to this guy.

Upgrading everything wasn’t actually a waste after all. My reasons for swapping everything were misguided, but it was something that needed to be done anyway. As mentioned, I wasn’t using ECC RAM, nor was I using enough of it. If I didn’t fix this, it probably would have happened again. The case was a bit cosmetic, but it is something I was considering doing even before I was knee-deep in BSD crash dumps.

And since I was rebuilding my pool from scratch, I took the opportunity to do it the right way this time. Instead of just having one giant volume, I now have a shared pool, plus a share with its own quota for each computer in the house that backs up to the server, so that their backups won’t grow unbounded, as Time Machine likes to do:

[Image: zfs]
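The per-machine layout described above could be scripted roughly like this. It is only a sketch: the pool name `tank`, the `backups` parent dataset, and the machine names and sizes are all made up for illustration, and FreeNAS will happily do the same thing through its GUI. The script just builds the `zfs create` commands and prints them so you can look before you leap.

```python
def quota_commands(pool, quotas, parent="backups"):
    """Build zfs commands for one quota-limited dataset per machine.

    pool, parent, and the keys of quotas are hypothetical names;
    quota values use ZFS size syntax such as "500G" or "1T".
    """
    # Parent dataset that groups all of the backup targets.
    commands = [["zfs", "create", f"{pool}/{parent}"]]
    for machine, quota in sorted(quotas.items()):
        dataset = f"{pool}/{parent}/{machine}"
        # -o sets the property at creation time, so the cap is never missing.
        commands.append(["zfs", "create", "-o", f"quota={quota}", dataset])
    return commands


if __name__ == "__main__":
    for argv in quota_commands("tank", {"laptop": "500G", "imac": "1T"}):
        print(" ".join(argv))
```

Note that `quota` counts snapshots against the limit; `refquota` is the property to reach for if you want snapshots excluded.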

Here is my final build:

Motherboard: SuperMicro MiniITX Server Motherboard
RAM: Kingston 8GB DDR3 ECC Server Memory
Case: SilverStone DS380B NAS chassis 8-bay
Power Supply: SilverStone SFX 300W

There. I fixed it.

One Response to “Upgrading my ZFS Box”

  1. Tammy says:

    Even when I don’t understand the details of this, I can chuckle through and enjoy your learning experience! I too have a sledgehammer as a favorite tool at times!
