Migrating ZFS from Linux to FreeBSD

Posted on 2020-02-11.
ZFS on Linux might get a lot of the latest features, and with a distribution like Arch Linux you have the bleeding edge, but it makes great sense to migrate everything ZFS related to FreeBSD. On FreeBSD, ZFS is a first-class citizen. This means that you don't have to worry about hostile kernel commits that suddenly break ZFS, or about kernel modules that have to be recompiled every time the kernel is updated. Being a first-class citizen also means that the entire operating system is tailored to work really well with ZFS. The installer makes it really easy to get ZFS on root, with support for all the different possible configurations, and all the relevant tools know about ZFS; even the top command shows the memory usage of the ZFS ARC!

Introduction

The FreeBSD version of ZFS is a bit behind ZFS on Linux, so FreeBSD 12.1 doesn't have features like native ZFS encryption yet. But since FreeBSD is now pulling in code from ZFS on Linux rather than from illumos, it is getting new ZFS features faster.

But the most important thing is that ZFS is treated as a first-class citizen: you don't have to patch anything, you don't have to worry about breakage, and all the tools know about ZFS.

However, migrating an existing ZFS pool from Linux to FreeBSD isn't always easy. If you already have a pool on Linux with "Linux-only" features or other newer features enabled, you need to back up the data, export the pool, and then create a new pool on FreeBSD. But unless you really need those specific features, migrating is worth all the work!
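Before exporting the pool on Linux, it helps to see which features are actually in use. ZFS exposes each pool feature as a feature@NAME property whose value is disabled, enabled or active, and it is generally the active ones that another implementation must understand to import the pool read-write. The following is a minimal sketch, assuming the tab-separated scripted output of zpool get -H; active_features is a hypothetical helper name, not an existing tool:

```shell
# Hypothetical helper: reads "property<TAB>value" lines, as printed by
#   zpool get -H -o property,value all POOL
# and prints the name of every pool feature whose state is "active".
active_features() {
    awk -F '\t' '$1 ~ /^feature@/ && $2 == "active" {
        sub(/^feature@/, "", $1)   # strip the "feature@" prefix
        print $1
    }'
}
```

On the Linux machine you could run zpool get -H -o property,value all mypool | active_features and compare the result with the features listed by zpool upgrade -v on the FreeBSD machine.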

If you are already running ZFS on Linux, you already know all the good stuff about ZFS, but you also know about the less welcoming environment ZFS currently lives in on Linux. Having an operating system designed to work with ZFS from the ground up makes a huge difference.

You can read more about ZFS on FreeBSD in the FreeBSD manual.

In this short tutorial I'll talk a little about some issues you may run into and introduce a couple of useful tools.

Boot environments

With FreeBSD on ZFS you get boot environments. A boot environment is a bootable instance of the operating system plus any installed third party packages. It is based upon a bootable clone of a ZFS dataset.

With FreeBSD you can manage multiple boot environments, and each boot environment can have different versions of the operating system and/or packages. This means that boot environments also allow the system to be upgraded while the old system environment is preserved in a separate ZFS dataset. Should the upgrade go wrong for some reason, you can simply boot from the previous boot environment.

Each boot environment consists of a root dataset and, optionally, other datasets nested under that root dataset.

When you install FreeBSD on ZFS a default boot environment is created. You can then use the bectl utility to manage boot environments.

You can even create a new boot environment based on a snapshot of the currently running environment, mount it from the running system, and update the system inside the new environment without touching the running system. This is very useful if you have to manage a remote system that you can only reach via the console. You can make the new environment active and have the machine boot into it, and should the boot fail due to some problem with the upgrade, the machine can be set to automatically boot into the previous boot environment.

You can also copy or move a boot environment to another machine and run it there, or use a FreeBSD jail to test the results.

This means that you can not only do major reconfigurations of running third-party applications such as mail servers and web servers, but you can also deploy one configured boot environment to a large number of servers, and at the same time use it as a bare-metal backup solution.

Boot environment selection has been integrated into the FreeBSD loader, which means you can always choose a different boot environment at boot time.

With bectl we can list all the boot environments. At the moment I only have the default one:

# bectl list
BE      Active Mountpoint Space Created
default NR     /          5.40G 2020-02-02 02:37

In the Active column, the letter N marks the boot environment that is active now, while the letter R marks the one that will be used on the next boot.
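If you want to check these flags from a script, they can be picked out of the bectl list output. A minimal sketch, assuming the column layout shown above (name first, Active flags second, one header line); be_status is a hypothetical name:

```shell
# Hypothetical helper: reads `bectl list` output on stdin and prints the
# boot environment active now (flag N) and the one used on the next
# boot (flag R). A single BE can carry both flags ("NR").
be_status() {
    awk 'NR > 1 {                       # skip the header line
        if ($2 ~ /N/) print "now:  " $1
        if ($2 ~ /R/) print "next: " $1
    }'
}
```

For example: bectl list | be_status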

Let's create a new boot environment, mount it, and install some packages in it:

# bectl create -r testing-packages
# bectl list
BE               Active Mountpoint Space Created
default          NR     /          5.40G 2020-02-02 02:37
testing-packages -      -          8K    2020-02-12 09:26

The -r option makes the operation recursive; it is needed to make sure we get all the relevant datasets.

Let's mount it:

# bectl mount testing-packages
successfully mounted testing-packages at /tmp/be_mount.BCNN

# ls /tmp/be_mount.BCNN/
.cshrc          bootpool        etc             media           rescue          tmp
.profile        COPYRIGHT       home            mnt             root            usr
bin             dev             lib             net             sbin            var
boot            entropy         libexec         proc            sys             zroot

Let's install a package in it:

# pkg -r /tmp/be_mount.BCNN/ install tuxpaint

We can then activate the testing-packages boot environment, so that tuxpaint is installed after the next boot. If tuxpaint should for some reason mess up our FreeBSD system, we can revert to the default environment (it is Tux after all).

Let's unmount it and set the new environment as the active one:

# bectl umount testing-packages

# bectl activate testing-packages
successfully activated boot environment testing-packages

# bectl list
BE               Active Mountpoint Space Created
default          N      /          9.87M 2020-02-02 02:37
testing-packages R      -          5.46G 2020-02-12 09:26

The letter R now shows that at the next boot we will boot into the testing-packages boot environment.
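The steps above can be wrapped into a single function. A minimal sketch, assuming the form of bectl mount that takes an explicit mountpoint and pkg's -y flag to skip the confirmation prompt; be_try_packages is a hypothetical name. If the package installation fails, it unmounts the new boot environment and leaves it inactive, so the next boot still uses the current one:

```shell
# Hypothetical helper: create a boot environment, install packages into
# it with pkg -r, then unmount it and activate it for the next boot.
# Packages are never installed into the running system.
be_try_packages() {
    be=$1; shift
    mnt=$(mktemp -d /tmp/be_mount.XXXXXX) || return 1
    bectl create -r "$be" || return 1
    bectl mount "$be" "$mnt" || return 1
    if ! pkg -r "$mnt" install -y "$@"; then
        bectl umount "$be"      # leave the new BE unmounted and inactive
        return 1
    fi
    bectl umount "$be" && bectl activate "$be"
}
```

For example: be_try_packages testing-packages tuxpaint, followed by a reboot.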

Let's imagine that Tux did mess up our system. We can then reboot, choose the default boot environment in the boot loader, activate the default environment again, and destroy the testing-packages environment if we don't want to investigate further:

# bectl activate default
successfully activated boot environment default

# bectl list
BE               Active Mountpoint Space Created
default          NR     /          5.40G 2020-02-02 02:37
testing-packages -      -          59.3M 2020-02-12 09:26

# bectl destroy testing-packages
bectl destroy: leaving origin 'zroot/ROOT/default@2020-02-12-09:26:02-0' intact

# bectl list
BE      Active Mountpoint Space Created
default NR     /          5.40G 2020-02-02 02:37

A more relevant use case is to create a boot environment before an upgrade and then do the upgrade in the default environment. If something goes wrong, such as a driver no longer working as expected, you can revert the whole system.
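A minimal sketch of that pre-upgrade step, assuming a date-based naming scheme (be_checkpoint and the pre-upgrade- prefix are hypothetical choices). It prints the new name so you know what to pass to bectl activate if you need to roll back:

```shell
# Hypothetical helper: save the running system as a dated boot
# environment before an upgrade and print its name. Running it twice on
# the same day fails, because the name already exists.
be_checkpoint() {
    name="pre-upgrade-$(date +%Y-%m-%d)"
    bectl create -r "$name" && echo "$name"
}
```

You would run be_checkpoint, then upgrade as usual (for example with freebsd-update or pkg upgrade), and if something breaks, bectl activate the printed name and reboot.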

FreeBSD is set up so that your home directory and the directories /var/log/, /var/crash/, /var/audit/ and /var/mail/ aren't affected by switching boot environments. That way you won't find your log files or the files in your home directory reverted; only the operating system and third-party packages get reverted.
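To see which file systems a boot environment switch leaves alone on your own system, you can look for datasets that live outside the pool's ROOT dataset. On a default install, datasets such as zroot/usr and zroot/var have canmount=off; they only act as containers, so /usr and /var themselves still belong to the boot environment. A minimal sketch, assuming the default zroot/ROOT layout (be_shared_mounts is a hypothetical name):

```shell
# Hypothetical helper: reads `zfs list -H -o name,canmount,mountpoint`
# output on stdin and prints the mountpoints of datasets outside
# POOL/ROOT, i.e. file systems shared by every boot environment.
# Container datasets (canmount=off) and unmounted ones are skipped.
be_shared_mounts() {
    pool=${1:-zroot}
    awk -F '\t' -v root="$pool/ROOT" '
        $1 != root && index($1, root "/") != 1 &&
        $2 != "off" && $3 != "none" && $3 != "-" { print $3 }'
}
```

For example: zfs list -H -o name,canmount,mountpoint -r zroot | be_shared_mounts zroot. Depending on how the pool was created, this may list a few more directories than the ones named above.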

Missing /dev/disk/by-id/

Creating pools using the by-id labels on GNU/Linux provides a huge advantage. Not only do you eliminate the risk of disks switching device names if you need to replace a disk and happen to reboot before the old disk has been replaced, but you also automatically get the disk's serial number into the label.

$ ls -gG /dev/disk/by-id/
ata-ST31000340NS_9QJ089LF -> ../../sdd
ata-ST31000340NS_9QJ0EQ1V -> ../../sdb
ata-ST31000340NS_9QJ0F2YQ -> ../../sdc
...

This makes it easier to identify a broken disk. All you need to do is map each serial number to the slot the disk is attached to, or put the serial number on a sticker on the front of the disk.

On FreeBSD there is no /dev/disk/by-id/ directory, but there is something almost identical called disk_ident, whose labels are located in /dev/diskid/ when it is set up correctly.

If you happen to run FreeBSD with ZFS on root, then (depending on your setup) the installer might have disabled diskid and used normal device names like ada1 and ada2 instead. You might also have used such device names manually when you created your pool.

So you might see something like this:

# zpool status
  pool: mypool
 state: ONLINE
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        mypool      ONLINE       0     0     0
          raidz1-0  ONLINE       0     0     0
            ada1    ONLINE       0     0     0
            ada2    ONLINE       0     0     0
            ada3    ONLINE       0     0     0
            ada4    ONLINE       0     0     0

errors: No known data errors

It's not a major problem because even if a device name gets switched and ada3 becomes ada2 etc., as soon as you replace the broken device with a new one, ZFS will figure things out. ZFS is really good at keeping track of the disks and it has its own internal identification system.

However, ideally we would like to see something like this:

# zpool status
  pool: mypool
 state: ONLINE
  scan: none requested
config:

        NAME                             STATE     READ WRITE CKSUM
        mypool                           ONLINE       0     0     0
          raidz1-0                       ONLINE       0     0     0
            diskid/DISK-WD-WCC7K3NRUYL1  ONLINE       0     0     0
            diskid/DISK-WD-WCC7K6CAX8AY  ONLINE       0     0     0
            diskid/DISK-Z30133A1         ONLINE       0     0     0
            diskid/DISK-W300GYTS         ONLINE       0     0     0

Some people don't like to use diskid because a serial number that contains spaces gets encoded, which can look really ugly. I haven't personally run into that problem, but even if I did, I would still prefer diskid, because the serial number stays readable and I don't have to label each disk manually, where I might make a typo without noticing.

People use different approaches and recommend different things, and the book FreeBSD Mastery: Advanced ZFS, by Allan Jude and Michael W. Lucas, provides valuable information on the subject.

Anyway, in order to get diskid working you need to make sure it isn't disabled:

# sysctl kern.geom.label.disk_ident.enable
kern.geom.label.disk_ident.enable: 0

In this case it is disabled. Put the following into /boot/loader.conf:

kern.geom.label.disk_ident.enable="1"

Then enable glabel:

geom_label_load="1"
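If you prefer to make these changes from the shell, FreeBSD's sysrc -f /boot/loader.conf NAME=VALUE can edit loader.conf for you. The sketch below shows the same idempotent replace-or-append logic in plain sh; loader_conf_set is a hypothetical helper, not a FreeBSD tool:

```shell
# Hypothetical helper: set KEY="VALUE" in a loader.conf-style FILE,
# replacing an existing KEY= line or appending one if none exists.
loader_conf_set() {
    file=$1 key=$2 value=$3
    [ -f "$file" ] || : > "$file"            # create the file if missing
    tmp=$(mktemp) || return 1
    awk -v k="$key" -v v="$value" '
        BEGIN { line = k "=\"" v "\"" }
        index($0, k "=") == 1 { print line; done = 1; next }
        { print }
        END { if (!done) print line }
    ' "$file" > "$tmp" && cat "$tmp" > "$file"
    rm -f "$tmp"
}
```

For example: loader_conf_set /boot/loader.conf kern.geom.label.disk_ident.enable 1, and likewise for geom_label_load.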

Then reboot.

You will now be able to see disks in /dev/diskid/. However, a disk won't show up in "diskid" if it is already opened through another GEOM provider. So if ZFS is already using the disk as "adaX", it won't show up in "diskid". This is called "GEOM withering".

If you already have a running ZFS pool created with "adaX" device names, you can export the pool, reboot the machine, and then have ZFS import the pool through the "diskid" labels using the -d option:

# zpool export mypool
# reboot

Then after the reboot:

# zpool import -d /dev/diskid/ mypool

Now you have a directory called /dev/diskid/ and it has labels with serial numbers in it:

# ls /dev/diskid/
DISK-W300GYTS
DISK-W300GYTSp1
DISK-W300GYTSp9
DISK-WD-WCC7K3NRUYL1
DISK-WD-WCC7K3NRUYL1p1
DISK-WD-WCC7K3NRUYL1p9
DISK-WD-WCC7K6CAX8AY
DISK-WD-WCC7K6CAX8AYp1
DISK-WD-WCC7K6CAX8AYp9
DISK-Z30133A1
DISK-Z30133A1p1
DISK-Z30133A1p9

Unfortunately it doesn't show how these IDs are mapped to adaX like it does with by-id on Linux, but we can get that information manually if we need it.

If we're working with an exported pool we can list the devices we can import:

# zpool import
   pool: mypool
     id: 1918994596645956952
  state: ONLINE
 action: The pool can be imported using its name or numeric identifier.
 config:

        mypool      ONLINE
          raidz1-0  ONLINE
            ada1    ONLINE
            ada2    ONLINE
            ada3    ONLINE
            ada4    ONLINE

Then we can map each to the serial number:

# geom disk list ada1|grep ident
   ident: WD-WCC7K3NRUYL1

So ada1 is WD-WCC7K3NRUYL1.
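Checking one disk at a time gets tedious on a bigger pool. A minimal sketch that maps every disk at once from the full geom disk list output; disk_idents is a hypothetical name, and it assumes the "Geom name:" and "ident:" line format that geom disk list prints:

```shell
# Hypothetical helper: reads `geom disk list` output on stdin and prints
# one "device ident" pair per disk, e.g. "ada1 WD-WCC7K3NRUYL1".
disk_idents() {
    awk '/^Geom name:/ { dev = $3 }
         /^[ \t]*ident:/ {
             sub(/^[ \t]*ident:[ \t]*/, "")   # keep the whole serial, spaces included
             print dev, $0
         }'
}
```

For example: geom disk list | disk_idents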

However, we don't even need to do that. As mentioned above, ZFS keeps track of disks using its own internal identifiers. So we can simply tell ZFS to look for disks only in a specific path with the -d option, and import the pool using those labels instead:

# zpool import -d /dev/diskid/ mypool
# zpool status
  pool: mypool
 state: ONLINE
  scan: scrub repaired 0 in 0 days 02:01:47 with 0 errors on Thu Jan 30 23:49:42 2020
config:

        NAME                             STATE     READ WRITE CKSUM
        mypool                           ONLINE       0     0     0
          raidz1-0                       ONLINE       0     0     0
            diskid/DISK-WD-WCC7K3NRUYL1  ONLINE       0     0     0
            diskid/DISK-WD-WCC7K6CAX8AY  ONLINE       0     0     0
            diskid/DISK-Z30133A1         ONLINE       0     0     0
            diskid/DISK-W300GYTS         ONLINE       0     0     0

errors: No known data errors

This way we have changed the pool from using the adaX device names to using serial numbers instead.
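On a pool with many disks it can be handy to verify that nothing is still referenced by a raw device name. A minimal sketch that scans the config section of zpool status; non_diskid_vdevs is a hypothetical name, and it only recognizes adaN, daN and nvdN device names:

```shell
# Hypothetical helper: reads `zpool status` output on stdin and prints
# every vdev in the config table that is still listed under a raw
# device name (adaN, daN or nvdN, optionally with a pN suffix).
non_diskid_vdevs() {
    awk '/^[ \t]*NAME[ \t]+STATE/ { in_cfg = 1; next }   # start of config table
         in_cfg && NF == 0 { in_cfg = 0 }                 # blank line ends it
         in_cfg && $1 ~ /^(ada|da|nvd)[0-9]+(p[0-9]+)?$/ { print $1 }'
}
```

Once every disk is imported through /dev/diskid/, zpool status mypool | non_diskid_vdevs should print nothing.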

Useful tools

Some of these tools are not unique to FreeBSD, but I'll address them anyway.

top

With top you can get useful information about the memory consumption of the ZFS ARC:

$ top
last pid:   933;  load averages:  0.21,  0.07,  0.02                                                                                                              up 0+00:01:34  04:13:40
28 processes:  1 running, 27 sleeping
CPU:  0.2% user,  0.0% nice,  0.5% system,  0.1% interrupt, 99.2% idle
Mem: 149M Active, 38M Inact, 332M Wired, 7304M Free
ARC: 115M Total, 49M MFU, 64M MRU, 64K Anon, 474K Header, 1965K Other
     30M Compressed, 84M Uncompressed, 2.78:1 Ratio
Swap: 2048M Total, 2048M Free
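In scripts, the current ARC size in bytes can also be read directly with sysctl -n kstat.zfs.misc.arcstats.size. A minimal sketch converting it to MiB, roughly the figure top shows as ARC Total (arc_size_mib is a hypothetical name):

```shell
# Prints the current ARC size in MiB. With no argument it reads the
# kstat.zfs.misc.arcstats.size sysctl (bytes); an explicit byte count
# can be passed instead, e.g. for testing.
arc_size_mib() {
    bytes=${1:-$(sysctl -n kstat.zfs.misc.arcstats.size)}
    echo $(( bytes / 1048576 ))
}
```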

If you press C, top switches from the weighted CPU mode to the raw CPU mode. If you'd like to see all the cores, you can press P, which is like pressing 1 on Linux.

last pid:   950;  load averages:  0.16,  0.11,  0.04                                                                                                              up 0+00:05:48  04:17:54
28 processes:  1 running, 27 sleeping
CPU 0:  0.1% user,  0.0% nice,  0.1% system,  0.1% interrupt, 99.7% idle
CPU 1:  0.1% user,  0.0% nice,  0.1% system,  0.0% interrupt, 99.8% idle
CPU 2:  0.1% user,  0.0% nice,  0.2% system,  0.0% interrupt, 99.7% idle
CPU 3:  0.0% user,  0.0% nice,  0.2% system,  0.0% interrupt, 99.8% idle
Mem: 149M Active, 39M Inact, 335M Wired, 7299M Free
ARC: 118M Total, 49M MFU, 67M MRU, 64K Anon, 485K Header, 2002K Other
     31M Compressed, 86M Uncompressed, 2.78:1 Ratio
Swap: 2048M Total, 2048M Free

  PID USERNAME    THR PRI NICE   SIZE    RES STATE    C   TIME     CPU COMMAND
  754 root          1  30    0   172M   149M select   3   0:00   0.00% smbd
  722 ntpd          2  20    0    18M    18M select   2   0:00   0.00% ntpd
  771 root          1  39    0    14M  5380K nanslp   2   0:00   0.00% smartd
  924 root          1  21    0    20M  9892K select   2   0:00   0.00% sshd
  725 root          1  20    0    11M  2296K select   0   0:00   0.00% powerd

You can also switch to the I/O display by pressing m, which lets you monitor how much each process is reading from and writing to disk.

last pid:   953;  load averages:  0.23,  0.14,  0.05                                                                                                              up 0+00:07:02  04:19:08
28 processes:  1 running, 27 sleeping
CPU 0:  0.1% user,  0.0% nice,  0.1% system,  0.1% interrupt, 99.8% idle
CPU 1:  0.1% user,  0.0% nice,  0.1% system,  0.0% interrupt, 99.8% idle
CPU 2:  0.1% user,  0.0% nice,  0.2% system,  0.0% interrupt, 99.8% idle
CPU 3:  0.0% user,  0.0% nice,  0.1% system,  0.0% interrupt, 99.8% idle
Mem: 149M Active, 39M Inact, 335M Wired, 7298M Free
ARC: 118M Total, 49M MFU, 67M MRU, 64K Anon, 485K Header, 2002K Other
     31M Compressed, 86M Uncompressed, 2.78:1 Ratio
Swap: 2048M Total, 2048M Free

  PID USERNAME     VCSW  IVCSW   READ  WRITE  FAULT  TOTAL PERCENT COMMAND
  754 root          34      1     36     26     63    125  19.69% smbd
  722 ntpd         532     10     26      1     40     67  10.55% ntpd
  771 root           2      1      0      1      0      1   0.16% smartd
  924 root          26      3     14      3      3     20   3.15% sshd
  725 root        1601      3      0      1      0      1   0.16% powerd

camcontrol

You can use camcontrol to see which disk is attached as which device, and on which bus:

# camcontrol devlist
<ST9120821AS 7.24>                 at scbus0 target 0 lun 0 (ada0,pass0)
<WDC WD40EFRX-68N32N0 82.00A82>    at scbus1 target 0 lun 0 (ada1,pass1)
<WDC WD40EFRX-68N32N0 82.00A82>    at scbus2 target 0 lun 0 (ada2,pass2)
<ST4000DX001-1CE168 CC44>          at scbus3 target 0 lun 0 (ada3,pass3)
<ST4000DM000-1F2168 CC52>          at scbus4 target 0 lun 0 (ada4,pass4)
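To go from a device name to a physical location, it helps to see the device, bus and model side by side. A minimal sketch that reformats camcontrol devlist output; cam_devices is a hypothetical name, and it assumes the "<model firmware> at scbusN ... (names)" layout shown above and skips the passN entries:

```shell
# Hypothetical helper: reads `camcontrol devlist` output on stdin and
# prints "device bus model firmware" per line.
cam_devices() {
    awk '{
        model = $0
        sub(/^</, "", model); sub(/>.*/, "", model)    # text between < and >
        names = $NF; gsub(/[()]/, "", names)           # "(ada1,pass1)" -> "ada1,pass1"
        n = split(names, part, ",")
        dev = part[1]
        for (i = 1; i <= n; i++) if (part[i] !~ /^pass/) { dev = part[i]; break }
        bus = ""
        for (i = 1; i <= NF; i++) if ($i ~ /^scbus[0-9]+$/) bus = $i
        print dev, bus, model
    }'
}
```

For example: camcontrol devlist | cam_devices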

You can also get a lot of other useful information:

# camcontrol identify /dev/ada1
pass1: <WDC WD40EFRX-68N32N0 82.00A82> ACS-3 ATA SATA 3.x device
pass1: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)

protocol              ACS-3 ATA SATA 3.x
device model          WDC WD40EFRX-68N32N0
firmware revision     82.00A82
serial number         WD-WCC7K3NRUYL1
WWN                   50014ee2650a51a8
additional product id 
cylinders             16383
heads                 16
sectors/track         63
sector size           logical 512, physical 4096, offset 0
LBA supported         268435455 sectors
LBA48 supported       7814037168 sectors
PIO supported         PIO4
DMA supported         WDMA2 UDMA6 
media RPM             5400
...

GEOM

GEOM is a modular disk transformation framework. It permits access and control to classes, such as Master Boot Records and BSD labels, through the use of providers, or the disk devices in /dev. By supporting various software RAID configurations, GEOM transparently provides access to the operating system and operating system utilities.

You can also use geom to list the serial number and other useful information. The serial number is listed as ident:

# geom disk list
Geom name: ada1
Providers:
1. Name: ada1
   Mediasize: 4000787030016 (3.6T)
   Sectorsize: 512
   Stripesize: 4096
   Stripeoffset: 0
   Mode: r1w1e2
   descr: WDC WD40EFRX-68N32N0
   lunid: 50014ee2650a51a8
   ident: WD-WCC7K3NRUYL1
   rotationrate: 5400
   fwsectors: 63
   fwheads: 16

You can read more about GEOM in the FreeBSD manual.

diskinfo

You can also use diskinfo to get relevant information about a disk:

# diskinfo -v ada1
ada1
        512             # sectorsize
        4000787030016   # mediasize in bytes (3.6T)
        7814037168      # mediasize in sectors
        4096            # stripesize
        0               # stripeoffset
        7752021         # Cylinders according to firmware.
        16              # Heads according to firmware.
        63              # Sectors according to firmware.
        WDC WD40EFRX-68N32N0    # Disk descr.
        WD-WCC7K3NRUYL1 # Disk ident.
        No              # TRIM/UNMAP support
        5400            # Rotation rate in RPM
        Not_Zoned       # Zone Mode
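The output above reports a 512-byte sector size but a 4096-byte stripe size, which matches the logical 512 / physical 4096 sector sizes camcontrol identify showed for the same disk. ZFS records the sector size it uses as ashift, the base-2 logarithm of that size, so these disks want ashift 12, which is the value zdb -C reports for the pool. A small worked sketch of that calculation (ashift_for is a hypothetical name and expects a power of two):

```shell
# Compute the ZFS ashift for a physical sector size in bytes:
# ashift = log2(size), so 512 -> 9 and 4096 -> 12.
ashift_for() {
    size=$1 shift_=0
    while [ "$size" -gt 1 ]; do
        size=$(( size / 2 ))
        shift_=$(( shift_ + 1 ))
    done
    echo "$shift_"
}
```

On FreeBSD 12 you can also set the sysctl vfs.zfs.min_auto_ashift=12 before creating a pool, so that ZFS uses at least 4K sectors even on drives that report 512-byte sectors.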

gpart

On FreeBSD fdisk has been deprecated and replaced with gpart. If you need to see the partition table of a specific disk, you can use gpart:

# gpart show ada0
=>       40  234441568  ada0  GPT  (112G)
         40       1024     1  freebsd-boot  (512K)
       1064        984        - free -  (492K)
       2048    4194304     2  freebsd-swap  (2.0G)
    4196352  230244352     3  freebsd-zfs  (110G)
  234440704        904        - free -  (452K)
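When a disk holds several partitions, as this boot disk does, you sometimes need the provider name of the ZFS partition itself (here ada0p3). A minimal sketch that derives it from the gpart show output; gpart_zfs_parts is a hypothetical name, and it assumes a GPT scheme, where partitions are named DISKpN:

```shell
# Hypothetical helper: reads `gpart show DISK` output on stdin and prints
# the provider name of every freebsd-zfs partition, e.g. "ada0p3".
gpart_zfs_parts() {
    awk '/^=>/ { disk = $4 }                      # "=> start size DISK SCHEME"
         $4 == "freebsd-zfs" { print disk "p" $3 }'
}
```

For example: gpart show ada0 | gpart_zfs_parts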

gstat

You most likely already know iostat, a well-known tool that reports I/O statistics.

Another great tool is gstat, which reports GEOM I/O statistics, refreshed every second by default. Combined with the -p option, it only shows physical providers, that is, the disks themselves. Right now my disks are not doing anything:

# gstat -p
dT: 1.011s  w: 1.000s
 L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
    0      0      0      0    0.0      0      0    0.0    0.0| ada0
    0      0      0      0    0.0      0      0    0.0    0.0| ada1
    0      0      0      0    0.0      0      0    0.0    0.0| ada2
    0      0      0      0    0.0      0      0    0.0    0.0| ada3
    0      0      0      0    0.0      0      0    0.0    0.0| ada4
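gstat normally runs interactively, but with -b (batch mode) it prints one set of numbers and exits, which makes it usable in scripts. A minimal sketch that picks out the busiest disk from that output; gstat_busiest is a hypothetical name, and it assumes the column layout shown above, with %busy and the provider name as the last two fields:

```shell
# Hypothetical helper: reads one screen of `gstat -b -p` output on stdin
# and prints the provider with the highest %busy value and that value.
gstat_busiest() {
    awk 'NF >= 2 && $NF ~ /^[a-z]/ && $(NF-1) ~ /[|]$/ {
            busy = $(NF-1)
            sub(/[|]$/, "", busy)           # "57.3|" -> "57.3"
            if (name == "" || busy + 0 > max) { max = busy + 0; name = $NF }
        }
        END { if (name != "") print name, max }'
}
```

For example: gstat -b -p | gstat_busiest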

zdb

Another really useful tool is the zdb utility.

It has a ton of options, but it is important to note that zdb is not a general-purpose tool and its options may change. The output of zdb reflects the on-disk structure of a ZFS pool and is inherently unstable. The precise output of most invocations is not documented, and a knowledge of ZFS internals is assumed.

We can get a lot of interesting information:

# zdb -C mypool
MOS Configuration:
        version: 5000
        name: 'mypool'
        state: 0
        txg: 32087
        pool_guid: 1918994596645956952
        hostid: 3865062308
        hostname: 'foo'
        com.delphix:has_per_vdev_zaps
        vdev_children: 1
        vdev_tree:
            type: 'root'
            id: 0
            guid: 1918994596645956952
            create_txg: 4
            children[0]:
                type: 'raidz'
                id: 0
                guid: 1652892640172304252
                nparity: 1
                metaslab_array: 70
                metaslab_shift: 37
                ashift: 12
                asize: 16003128885248
                is_log: 0
                create_txg: 4
                com.delphix:vdev_zap_top: 65
                children[0]:
                    type: 'disk'
                    id: 0
                    guid: 14433036235128068445
                    path: '/dev/diskid/DISK-WD-WCC7K3NRUYL1'
                    whole_disk: 1
                    DTL: 170
                    create_txg: 4
                    com.delphix:vdev_zap_leaf: 66
                children[1]:
                    type: 'disk'
                    id: 1
                    guid: 15124204927999292914
                    path: '/dev/diskid/DISK-WD-WCC7K6CAX8AY'
                    whole_disk: 1
                    DTL: 169
                    create_txg: 4
                    com.delphix:vdev_zap_leaf: 67
                children[2]:
                    type: 'disk'
                    id: 2
                    guid: 13432944061349488304
                    path: '/dev/diskid/DISK-Z30133A1'
                    whole_disk: 1
                    DTL: 168
                    create_txg: 4
                    com.delphix:vdev_zap_leaf: 68
                children[3]:
                    type: 'disk'
                    id: 3
                    guid: 4262499248131338058
                    path: '/dev/diskid/DISK-W300GYTS'
                    whole_disk: 1
                    DTL: 167
                    create_txg: 4
                    com.delphix:vdev_zap_leaf: 69
        features_for_read:
            com.delphix:hole_birth
            com.delphix:embedded_data

We can also get a history of what's been done to the pool:

# zdb -h mypool

History:
2020-01-26.03:56:49 zpool create mypool raidz1 /dev/diskid/DISK-WD-WCC7K3NRUYL1 /dev/diskid/DISK-WD-WCC7K6CAX8AY /dev/diskid/DISK-Z30133A1 /dev/diskid/DISK-W300GYTS
2020-01-26.03:58:04 zfs create -o compress=lz4 mypool/pub
2020-01-30.21:48:04 zpool scrub mypool

We can also display some basic dataset information about our pool:

# zdb -d mypool
Dataset mos [META], ID 0, cr_txg 4, 7.87M, 167 objects
Dataset mypool/pub [ZPL], ID 85, cr_txg 19, 2.47T, 4339 objects
Dataset mypool [ZPL], ID 51, cr_txg 1, 128K, 8 objects
Verified large_blocks feature refcount of 0 is correct
Verified large_dnode feature refcount of 0 is correct
Verified sha512 feature refcount of 0 is correct
Verified skein feature refcount of 0 is correct
Verified device_removal feature refcount of 0 is correct
Verified indirect_refcount feature refcount of 0 is correct

Final notes

This short write-up doesn't come anywhere near covering all the great stuff FreeBSD has to offer when you run ZFS on it, but I hope it has at least presented running ZFS on FreeBSD not only as a viable alternative to Linux, but as an advantage.

Running ZFS on FreeBSD is not only less of a hassle, especially if you want to run ZFS on root, but it also provides better tooling, integration and easier administration. It also has good documentation for many of the tunable options, and an experienced and friendly community.