The accidental journey to TrueNAS Scale

Last month I found myself in the unenviable position of having just run the most dangerous command, dd, against the boot drive of my homelab NAS. I had been trying to wipe stale ZFS partition information from a data drive after it briefly detached, but on reboot the drive labels had been reassigned, and I didn’t double-check that I was wiping the correct drive.

I caught my mistake after 30 seconds, but the damage had been done. The boot drive became read-only. I didn’t know if the NAS would start up again if rebooted, so I immediately went into disaster recovery mode.

At this point, the NAS was 7 years old and showed the scars of its past role as a kitchen sink server. It started with Ubuntu 16.04 and soon gained dozens of apt packages, nginx services, Docker containers, databases, home-grown scripts, and even compilers. This setup was my muse as I became accustomed to system administration, and I journaled my learnings.

While my first thought at seeing a read-only file system was that years of tinkering were down the drain, I could also take this as an opportunity for a fresh start.

TrueNAS Scale: A fresh start?

My mouse cursor hovered over the download button for the recently released Ubuntu 24.04, but I stopped myself when I realized this fresh start could focus on simplicity. With a DIY Ubuntu ZFS NAS, one is left to one’s own devices, and as I’ve gotten older, I’ve grown to appreciate gentle guardrails. System administration is only occasionally stimulating, and I’d rather spend my time on other priorities.

TrueNAS Scale was the natural fit to migrate to, as it’s also based on Linux + ZFS, so it wouldn’t be too dissimilar. In my quest for simplicity, I was delighted to learn that transferring ZFS pools is a seamless export and import process:

zpool export tank
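
The import half is just as short: on the new system, a bare zpool import scans the attached disks and lists importable pools, and naming the pool completes the move (a sketch, assuming the pool keeps its name):

zpool import        # scan attached disks and list importable pools
zpool import tank   # import the pool found above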

From time to time, I’d idly wonder what the next iteration of my NAS would look like, but the hassle of migrating data kept the intrusive thought at bay. Now that I’ve seen zpool export and how well it works, it’s only a matter of time before the hardware begins to look long in the tooth. Hopefully, this fresh system tricks me into holding off on upgrading for at least a year.

We’re getting ahead of ourselves, though. Before actually migrating, I took extensive notes and replicated a minimal file system on the pool that contained (see the sketch after this list):

  • Custom cron jobs
  • Custom systemd services and timers
  • Locally installed binaries and scripts
  • Select /etc configurations
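
Staging these onto the pool amounted to a handful of copies (a sketch; the staging dataset and the exact paths are illustrative, not my actual layout):

zfs create tank/migration
rsync -a /etc/cron.d/ /tank/migration/cron.d/
rsync -a /etc/systemd/system/ /tank/migration/systemd/
rsync -a /usr/local/bin/ /tank/migration/local-bin/
rsync -a /etc/nginx/ /tank/migration/etc-nginx/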

Once everything was set, I powered off and connected to the ASRock Java IPMI applet with IcedTea-Web 1.8.4 and OpenJDK 1.8.0_265 (I’m noting the tools and versions for future reference, as I’ve previously spent too much time fighting certificate headwinds and version discrepancies just trying to connect).

I primed IPMI with the TrueNAS ISO, but hit a roadblock on boot. The BIOS became stuck on “dxe pci bus enumeration”. The internet suggested resetting the CMOS by pulling the battery, but I was reluctant to drag the NAS out of the closet and disassemble it to reach it. Mini-ITX systems are dense, and the lack of access is a real tradeoff.

After moving the easily removable components out of the way, the CMOS reset was still nerve-wracking as I balanced instruments in my hands like a surgeon to grab the battery. Reassembling and powering on the NAS, I saw the BIOS boot up, and my sigh of relief was heard around the world.

Installation

On first installation, TrueNAS had difficulties importing the pool. The root cause eludes me, but I have two candidates:

  • The pool had datasets with sharesmb=on, but TrueNAS wasn’t configured with Samba yet. Interestingly, even today with the datasets shared over SMB, TrueNAS doesn’t use this setting.
  • I used the top level directory of the pool as a dumping ground, which goes against a best practice guide on structuring a pool.

Either way, I worked around the GUI limitation by dropping into the shell and importing the pool manually. TrueNAS was smart enough to configure the pool so that the disks are referenced by their stable disk IDs rather than their device labels (like /dev/sda), whose reshuffling is exactly how I got into this mess when I wiped the boot drive thinking it was a data drive.
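
The manual import looked roughly like this (a sketch; -d points ZFS at the stable by-id device names, and -f forces the import since the pool was last used by the old system):

zpool import -f -d /dev/disk/by-id tank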

Once imported, I tried to re-enable the SMB share at the pool’s root (as it was originally), but TrueNAS didn’t allow this. I believe TrueNAS didn’t want to share the root, and I didn’t argue with it, as the pool’s role subtly changed in the migration. It went from a pure data pool to a system pool, where TrueNAS stores metrics, logs, and system user data. Ideally, writes of this unimportant data would land on an SSD pool (see instructions on configuring this), but the SSD installed in my NAS has been entirely co-opted by TrueNAS for a boot partition. This is by design. A workaround exists if one is ready to bend over backwards. I’m not, so I resigned the SSD to life as a pure boot drive.

Alternatively, I’d be fine with shipping metrics and logs off-system instead of writing them locally, but this doesn’t seem possible at the moment.

Adhering to TrueNAS’s preferred dataset hierarchy meant I now needed to move all the files in the root dataset into new sub-datasets. This isn’t a bad thing. The TrueNAS dataset GUI entices me to better utilize datasets, and I went from 5 haphazardly organized datasets to 20 (not to imply that more datasets are somehow inherently better, but I had been lazy). Moving files across datasets means crossing filesystem boundaries, so every move is a full copy rather than an instant rename; the command took all day to finish, but I’m satisfied with the results.
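
The reorganization itself was unglamorous (a sketch; the dataset names are illustrative): create the new datasets, then move the files into their mountpoints:

zfs create tank/media
zfs create tank/documents
mv /mnt/tank/old-media/* /mnt/tank/media/   # crosses filesystems: a full copy, not a rename
mv /mnt/tank/old-documents/* /mnt/tank/documents/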

Moving everything out of the root dataset also required deleting all of my snapshots. Snapshots are dataset-specific, so moving files across datasets meant the old snapshots would pin a second copy of the data, and there wasn’t enough storage space to hold two copies until the old snapshots could be aged out. I can imagine a less risky migration that brings the old snapshot data along, but it would have greatly added to the complexity.

Applications

With the data all set, it was time for applications. Most applications live on compute-oriented nodes, but my NAS is the only server with access to a GPU (an iGPU at that), so applications that leverage the GPU must be installed on the NAS. I came with a Docker Compose file and existing data, but TrueNAS did not welcome this with open arms, as it only runs Kubernetes.

Kubernetes is not appropriate for the situation, and it also feels inappropriate to spin up a VM just for Docker considering the minimal system resources at hand (only 8 GB of memory). Thankfully, a TrueNAS community member stepped up and created jailmaker, which uses systemd-nspawn under the covers so we can efficiently host a Docker installation.
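
Getting a jail up and running is refreshingly short (a sketch from memory; the jail name is illustrative, and the create step walks through interactive prompts):

./jlmkr.py create docker   # answer the prompts, enabling the Docker-friendly defaults
./jlmkr.py start docker
./jlmkr.py shell docker    # drop into the jail and run docker compose as usual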

Jailmaker works great! There was an unrelated hiccup when restoring application data: it turns out wiping my drive had partially wiped the application’s SQLite database, so I had the pleasure of walking through SQLite’s guide on recovering data from a corrupted database. It was able to recover just about everything. My mind is still blown to this day.
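
The recovery itself boils down to a single pipeline: the sqlite3 CLI’s .recover command extracts everything salvageable from the damaged file and replays it into a fresh database (file names are illustrative):

sqlite3 damaged.db ".recover" | sqlite3 recovered.db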

Despite jailmaker working so well, I will avoid installing apps on TrueNAS unless necessary. The switch to TrueNAS is an endeavor in simplification, and introducing another layer of abstraction to keep how I run applications consistent runs contrary to that. If my homelab were composed of only a single machine, I would have opted to expand the RAM and run the applications in a VM instead of via jailmaker.

Data Protection

Overall, TrueNAS data protection is good, but it can be a mixed bag depending on where you’re coming from.

For instance, in TrueNAS I set up 3 periodic ZFS snapshot tasks:

  • An hourly snapshot that is kept for 23 hours
  • A daily snapshot that is kept for 30 days
  • A monthly snapshot that is kept for a year

On Ubuntu, I installed zfs-auto-snapshot and never had to think about setting up different intervals. TrueNAS is more work in this regard.
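
For comparison, the entire Ubuntu-side setup was a single package that snapshots at frequent, hourly, daily, weekly, and monthly intervals with sensible default retention:

apt install zfs-auto-snapshot
zfs list -t snapshot | grep zfs-auto-snap   # confirm the snapshots are accumulating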

I like to back up my cloud accounts locally, and it was a cinch to set up a cloud sync task to pull from Google Drive. However, I also have a OneDrive account, and TrueNAS recently removed support for it. The workaround is to create a cron job that manually invokes rclone, which is fine, as it was originally a cron job on Ubuntu, but the inconsistency is a bit grating.

TrueNAS cloud sync tasks use rclone under the covers, so the lack of OneDrive support is a bit mystifying. Additional rclone jobs had to remain as cron jobs too, as they used CLI arguments not exposed in the GUI, like --no-gzip-encoding and --backup-dir. I look forward to TrueNAS expanding this feature in the future.
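
The cron fallback is nothing exotic (a sketch; the remote name, paths, and schedule are illustrative, and note that cron requires % to be escaped):

# Nightly OneDrive pull at 03:00; files replaced by the sync are kept in a dated folder
0 3 * * * rclone sync onedrive: /mnt/tank/backups/onedrive --backup-dir /mnt/tank/backups/onedrive-old/$(date +\%F)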

My NAS also pushes local data to Backblaze B2 as a backup. Previously, I used restic, but TrueNAS showed me that rclone also supports encryption, which is the main thing I was using restic for. I spun up a new B2 bucket for the rclone backup, and once that was verified, I deleted the restic bucket.
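
Rclone’s encryption works by layering a crypt remote over the cloud remote; after setting both up with rclone config, the backup is an ordinary sync against the crypt remote (a sketch; the remote names and paths are illustrative):

# b2-crypt is a crypt remote wrapping the underlying b2 remote
rclone sync /mnt/tank/important b2-crypt:nas-backup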

Last but not least, it would be nice to be able to tell TrueNAS to schedule ZFS scrubs and long SMART tests on alternating Sundays, but in the meantime, I’ve settled on picking specific days of the month to run them.
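
Expressed in cron terms, the compromise looks something like this (purely illustrative; the actual schedules are configured through the TrueNAS GUI, and the device path is elided):

0 2 1,15 * * zpool scrub tank                      # scrub on the 1st and 15th
0 2 8,22 * * smartctl -t long /dev/disk/by-id/...  # long SMART test on the 8th and 22nd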

Conclusion

Is TrueNAS perfect and the migration seamless? No, but after a month of running the system, I have no regrets. I adapted what I had to what TrueNAS expected, so that TrueNAS will save me from my future self.

Comments

If you'd like to leave a comment, please email [email protected]