Rawr!!!

Slow ZFS zpool?

Written by
October 6th, 2010


Is a slow ZFS pool giving you the blues? It sure was for me, and it took me forever to figure out the problem. I scoured the internet for solutions, most of which didn’t work, and it wasn’t until I used my brain that I figured out what was happening ;) Hopefully this post can reduce your frustrations and help you fix your slow ZFS pool easily.
I checked so many things, and in the end it was something very simple. But let’s start with all the things that I tried that didn’t work. These are common problems with ZFS when you use cheap consumer hardware. I’m using Western Digital Green drives, and this is the price that I have to pay.

Wrong Direction:

  • TLER – First I thought that Time-Limited Error Recovery (TLER) was the culprit: without it, a consumer drive can spend a long time retrying a bad sector and stall the whole pool. I had heard of this happening & went off to Google to find some solutions. The Wikipedia page (see Sources) explains that WDTLER.EXE can help, so I tried that on my warrior boot USB (I’ll post this later), but it didn’t work for me, it just hung. So I started up another tool on my boot USB (PartedMagic) & went the trusty ol’ open source route with smartmontools. At the time of writing you need to download it from svn & compile it manually to get the new features. Then run the following:
smartctl -l scterc /dev/sda # Query the drive for TLER status
smartctl -l scterc,70,70 /dev/sda # This will set TLER to 7 seconds (default)
# You get this if your drive can't do TLER (bummer)
Warning: device does not support SCT Error Recovery Control command

# This is what you will get if your drive has TLER enabled (hooray)
SCT Error Recovery Control:
           Read:     70 (7.0 seconds)
          Write:     70 (7.0 seconds)
  • Then I thought “You have 100% blocking”:

# iostat -xnz
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0   87.0    0.0 2878.1  0.0  0.0    0.0    0.4   0 100 c4t0d0
    0.0   83.0    0.0 2878.1  0.0  0.1    0.2    0.7   1  50 c4t1d0
    1.0    0.0   28.0    0.0  0.0  0.0    0.0    5.4   0   1 c4t2d0

See, disk c4t0d0 is 100% busy (the %b column) :( Anywho, here is a fix for that:

su - # enter password
echo zfs_vdev_max_pending/W0t1 | mdb -kw
echo "set zfs:zfs_vdev_max_pending=1" >> /etc/system

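This is normally a case of ZFS queueing too many I/Os per disk & overloading cheap SATA drives; zfs_vdev_max_pending caps the number of outstanding requests per device. The mdb line changes the running kernel, and the /etc/system line makes the setting stick after a reboot. Thankfully I didn’t have blocking, but I applied the tweak anyway since I’m using WD Green drives.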

  • “Your drives are idling too much? (clicking sound)”

Fire up smartctl again & run the following:


smartctl -a /dev/rdisk1 | grep ID ; smartctl -a /dev/rdisk1 | grep Start

# This is what I got on my macbook pro - 320GB caviar black drive (it's my secondary)

[jgerold@jgerold-13mbp.oc.cox.net ~]$ smartctl -a /dev/rdisk1 | grep ID ; smartctl -a /dev/rdisk1 | grep Start
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
 4 Start_Stop_Count        0x0032   099   099   000    Old_age   Always       -       1770

Should I think about replacing this soon?

Not necessarily. According to the PDF from Western Digital, I have 600,000 spin up/down cycles to look forward to on this disk, and 1770 is nowhere near that. I’m not sure whether RAW_VALUE equates to actual spin up/downs or whether smartctl is just being funky.

If your count is low then I wouldn’t worry. But if it is high & ever growing then I would recommend you boot into a Windows instance (use BartPE of some sort), grab a copy of WDIDLE.EXE (http://webdiary.com/i/?p=515) and turn off idling with the /d switch or something like that. I’m not sure of the exact syntax, as I haven’t tried it.
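To put a raw counter in perspective, here is a minimal sketch that reads smartctl attribute rows on stdin and prints each cycle counter as a share of a rated limit. The function name cycle_wear is made up for illustration, the 600,000 default is just the datasheet figure mentioned above, and it assumes RAW_VALUE is a plain integer in the 10th column. Load_Cycle_Count is included on the assumption that head parking is what the clicking corresponds to.

```shell
#!/bin/sh
# cycle_wear [RATED]: read smartctl attribute rows on stdin and print
# "<attribute> <raw count> <percent of RATED used>" for the start/stop
# and load-cycle counters. RATED defaults to 600000 (assumed rating).
cycle_wear() {
    rated="${1:-600000}"
    LC_ALL=C awk -v rated="$rated" '
        $2 == "Start_Stop_Count" || $2 == "Load_Cycle_Count" {
            # Column 10 is RAW_VALUE in the smartctl attribute table.
            printf "%s %d %.1f%%\n", $2, $10, ($10 / rated) * 100
        }'
}

# Usage (the device path is only an example):
#   smartctl -A /dev/rdisk1 | cycle_wear 600000
```

Fed the Start_Stop_Count row from my MacBook above, this reports well under 1% of the rated cycles used, which matches the "don't worry" conclusion.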

Right Direction (This is what fixed it for me):

    I removed a horrible disk from my pool!
zpool offline ambry c0d4

I wish I had the readout from my ‘iostat -mX’, but my fourth drive had a 3285 ms timeout and was slowing the whole pool to a crawl because it was failing (but not dead). I was noticing full disk usage whenever I would do an ls on a small dir, copy data, or do anything else that required disk use. The crazy thing is that it took about 3 days for the array to recover once I removed the bad disk from the pool. Being ever curious, I ran the following to monitor how fast the drives were:

while true; do iostat -mX /dev/{ct01,ct02,ct03}; sleep 1; done

I joyfully watched my disks go from about 0.5 MB/s to over 120 MB/s (per disk).
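Spotting the one sick disk by eye gets tedious, so here is a rough sketch of a filter for it. It assumes the Solaris ‘iostat -xn’ column layout shown earlier in this post (asvc_t in column 8, %b in column 10, device name in column 11); the function name slow_disks and both default thresholds are made up for illustration, not recommended values.

```shell
#!/bin/sh
# slow_disks [MAX_ASVC_MS] [MAX_BUSY_PCT]: read iostat -xn output on stdin
# and print every device whose average service time (asvc_t, ms) or busy
# percentage (%b) is above the given limit. Thresholds are assumptions.
slow_disks() {
    max_asvc="${1:-100}"
    max_busy="${2:-90}"
    LC_ALL=C awk -v a="$max_asvc" -v b="$max_busy" '
        # Data rows have 11 fields and start with a number, which skips
        # the banner and the column-header line.
        NF == 11 && $1 ~ /^[0-9.]+$/ {
            if ($8 + 0 > a || $10 + 0 > b)
                printf "%s asvc_t=%s %%b=%s\n", $11, $8, $10
        }'
}

# Usage: two 5-second samples (the first report is the average since boot):
#   iostat -xn 5 2 | slow_disks 100 90
```

A drive taking thousands of milliseconds per request, like my failing one, would trip the asvc_t limit straight away.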

Afterthoughts:

I wish I had run iostat before trying all the things that I did :( But everything looks better in hindsight. I’m just glad that this is fixed and that I was able to make a secondary backup, just in case anything else goes wrong before I get my new server up.

Sources:
- http://en.wikipedia.org/wiki/Time-Limited_Error_Recovery
- http://letsgetdugg.com/2009/10/21/zfs-slow-performance-fix
- http://www.csc.liv.ac.uk/~greg/projects/erc
- http://webdiary.com/i/?p=515

How to create zfs stripe (pool) [NAS]

Written by
December 10th, 2008

I recently (last night, to be exact) installed OpenSolaris as my NAS/backup server. Why, you may ask? Well, I love Arch Linux, yet I see the *need* to start using ZFS. ZFS is freakin-fan-tastic. It puts the S in simple, and gives you a filesystem that does much more than a common file system, such as NFS and SMB sharing and compression (did I mention that this is all built in? :) I’m going to go through the simple process of setting up a ZFS striped pool, creating a few datasets, and applying compression.

OpenSolaris tools I’ll cover:
format
zfs(1M)
zpool(1M)

Let’s start out by finding the disks that we would like to add to the pool:

root@Nom:~# format < /dev/null

Which will look like this:

root@Nom:~# format < /dev/null
Searching for disks...done

AVAILABLE DISK SELECTIONS:
       0. c3d0
          /pci@0,0/pci-ide@1f,1/ide@0/cmdk@0,0
       1. c4d1
          /pci@0,0/pci-ide@1f,2/ide@0/cmdk@1,0
       2. c5d0
          /pci@0,0/pci-ide@1f,2/ide@1/cmdk@0,0
Specify disk (enter its number):

This shows the three disks that are present in my system. I’ll break it down a little. The number at the beginning of each entry is the index that format assigns to the disk. The information in the <>’s displays a disk ID, the size of the disk, and some other little tidbits for the people that care.

What you’re looking for is “c4d1” & “c5d0”, which are the two Western Digital 1TB disks that I’m going to make my pool with.

To create the pool use the zpool command:

root@Nom:~# zpool create nom c4d1 c5d0

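That’s it, you’ve now created your first ZFS pool. Just to sum up what that one command did: it formatted the disks, set the mountpoint, mounted the pool at /nom, and left me with an active ZFS pool. Keep in mind that a plain stripe like this has no redundancy, so losing either disk loses the pool.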

If you would like to see the zpools that you currently have, run the following command:

root@Nom:/nom# zpool list
NAME    SIZE   USED  AVAIL    CAP  HEALTH  ALTROOT
nom    1.81T  82.5K  1.81T     0%  ONLINE  -
rpool   149G  3.36G   146G     2%  ONLINE  -
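If you are wondering why two 1TB disks show up as roughly 1.81T, it comes down to units: drive makers count in powers of ten, while zpool reports in powers of two. Here is a small sketch of the arithmetic, assuming each disk holds exactly 10^12 bytes per advertised TB and ignoring ZFS label and metadata overhead. The function name stripe_tib is made up for illustration.

```shell
#!/bin/sh
# stripe_tib DISKS SIZE_TB: raw size of a plain stripe in TiB,
# assuming SIZE_TB * 10^12 bytes per disk and no ZFS overhead.
stripe_tib() {
    # 1099511627776 is 2^40, the number of bytes in one TiB.
    LC_ALL=C awk -v n="$1" -v tb="$2" \
        'BEGIN { printf "%.2f\n", n * tb * 1e12 / 1099511627776 }'
}

# Usage: two 1TB disks, as in the nom pool
#   stripe_tib 2 1    # prints 1.82
```

That lands within rounding distance of the SIZE column above; the small remainder is presumably overhead.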

I could go ahead and add an NFS share and compression right on the pool, but why not stay organized :) I would rather create individual file systems to store the different data that I have.

To show what I mean, here is what a ZFS file system looks like:

root@Nom:/nom# zfs list
NAME                       USED  AVAIL  REFER  MOUNTPOINT
nom                       70.5K  1.78T    18K  /nom
rpool                     4.36G   142G    72K  /rpool
rpool/ROOT                2.37G   142G    18K  legacy
rpool/ROOT/opensolaris    2.37G   142G  2.24G  /
rpool/dump                1019M   142G  1019M  -
rpool/export                59K   142G    19K  /export
rpool/export/home           40K   142G    19K  /export/home
rpool/export/home/fsk141    21K   142G    21K  /export/home/fsk141
rpool/swap                1019M   143G    16K  -

This command shows pools and file systems. If you look, you can see my two pools (nom, rpool) and the file systems underneath rpool (ROOT, dump, export, swap).

I would like to create file systems like the ones above under my new pool:

root@Nom:/nom# zfs create nom/backup

So now if I do a zfs list I get the following:

root@Nom:/nom# zfs list
NAME                       USED  AVAIL  REFER  MOUNTPOINT
nom                       97.5K  1.78T    18K  /nom
nom/backup                  18K  1.78T    18K  /nom/backup
rpool                     4.36G   142G    72K  /rpool
rpool/ROOT                2.37G   142G    18K  legacy
rpool/ROOT/opensolaris    2.37G   142G  2.24G  /
rpool/dump                1019M   142G  1019M  -
rpool/export                59K   142G    19K  /export
rpool/export/home           40K   142G    19K  /export/home
rpool/export/home/fsk141    21K   142G    21K  /export/home/fsk141
rpool/swap

Since I’m going to be backing up to this file system, I would like to turn on a couple of little things… To get a listing of the properties that zfs set can change, just type ‘zfs set’.

root@Nom:/nom# zfs set compression=on nom/backup
root@Nom:/nom# zfs set sharenfs=rw nom/backup

I’ve just set up automagical compression, along with a read/write NFS share for ‘/nom/backup’. Now all I need to do is set up NFS on my client machine to connect to the NFS server.
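Before moving on to the client, it doesn't hurt to confirm that both properties actually took. Here is a minimal sketch that reads the tab-separated output of ‘zfs get -H -o property,value’ and complains if either setting is still off. The function name check_props is made up, and the client mount line in the comments assumes a Linux client, a server reachable as nom, and a mountpoint of /mnt/backup, none of which come from my setup notes.

```shell
#!/bin/sh
# check_props: read "property<TAB>value" lines on stdin and exit non-zero
# if compression or sharenfs is still set to "off".
check_props() {
    awk -F '\t' '
        $1 == "compression" && $2 == "off" { bad = 1; print "compression is off" }
        $1 == "sharenfs"    && $2 == "off" { bad = 1; print "sharenfs is off" }
        END { exit bad + 0 }'
}

# Usage on the server:
#   zfs get -H -o property,value compression,sharenfs nom/backup | check_props
#
# Then, on a Linux client (hostname and mountpoint are assumptions):
#   mount -t nfs nom:/nom/backup /mnt/backup
```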

Links:
http://www.sun.com/bigadmin/features/articles/zfs_overview.jsp
http://blogs.sun.com/timthomas/entry/creating_zfs_file_systems_from