{"id":109,"date":"2014-06-27T10:03:58","date_gmt":"2014-06-27T14:03:58","guid":{"rendered":"http:\/\/www.hentschels.com\/blog\/?page_id=109"},"modified":"2014-06-27T10:12:16","modified_gmt":"2014-06-27T14:12:16","slug":"zfs-zvols-for-vm-usage","status":"publish","type":"post","link":"https:\/\/www.hentschels.com\/blog\/?p=109","title":{"rendered":"ZFS zvols for VM usage"},"content":{"rendered":"<p>These experiments were performed to determine the best setup for creating a handful of VMs on Xen, utilizing a minimum of disk space, with the realization that much of the filesystem across the multiple VMs will be identical. I may, however, want the size of each VM&#8217;s filesystem to be different, depending on the purpose.<\/p>\n<h1>Setup<\/h1>\n<pre class=\"lang:sh decode:true\" title=\"ZFS zvol experiments 1\"># dd if=\/dev\/zero of=test_pool bs=1G count=10\r\n# for i in 1 2 3 4 5 6; do dd if=\/dev\/urandom of=tiny_file_$i bs=1M count=200; done\r\n# for i in 1 2 3 4; do dd if=\/dev\/urandom of=small_file_$i bs=1G count=1; done\r\n# for i in 1 2; do dd if=\/dev\/urandom of=big_file_$i bs=1G count=2; done\r\n# zpool create test test_pool\r\n\r\n# df -h | grep \"test\\|Size\"\r\nFilesystem                    Size  Used Avail Use% Mounted on\r\ntest                          9.8G     0  9.8G   0% \/test\r\n\r\n# zfs list -r test\r\nNAME   USED  AVAIL  REFER  MOUNTPOINT\r\ntest   106K  9.78G    30K  \/test<\/pre>\n<p>So, I have a ZFS pool with about 9.8G usable space in it.<\/p>\n<p><strong>A note on usage of `sync`<\/strong>: I tend to execute the `sync` command after each copy operation because I have found that taking a snapshot of a zvol that has cached writes results in an incomplete or corrupt snapshot. Yet another thing to consider when using ZFS with zvols.<\/p>\n<h1>Cloning<\/h1>\n<pre class=\"lang:sh decode:true \" title=\"ZFS zvol experiments 2\"># zfs create -V 3G test\/disk1\r\n# mkfs.ext4 \/dev\/zvol\/test\/disk1\r\n# mkdir \/test\/disk1\r\n# mount \/dev\/zvol\/test\/disk1 \/test\/disk1\r\n# cp small_file_1 \/test\/disk1\r\n# sync\r\n# zfs snap test\/disk1@s1\r\n# cp small_file_2 \/test\/disk1\r\n# sync<\/pre>\n<p>At this point, here&#8217;s what my pool looks like:<\/p>\n<pre class=\"lang:sh decode:true \" title=\"ZFS zvol experiments 3\"># df -h | grep \"test\\|Size\"\r\nFilesystem                    Size  Used Avail Use% Mounted on\r\ntest                          4.6G     0  4.6G   0% \/test\r\n\/dev\/zd96                     2.9G  2.1G  738M  74% \/test\/disk1\r\n\r\n# zfs list -r test\r\nNAME         USED  AVAIL  REFER  MOUNTPOINT\r\ntest        5.23G  4.55G    31K  \/test\r\ntest\/disk1  5.23G  7.65G  2.13G  -<\/pre>\n<p>Already<span style=\"line-height: 1.5;\">, size-related numbers are starting to get a bit hazy. According to df, my pool is approximately 7.5G in size (4.6 + 2.9). According to zfs list, it still looks like about 9.8 total (5.23 + 4.55). 
Now, if I clone the first snapshot:

```sh
# zfs clone test/disk1@s1 test/disk2
# mkdir /test/disk2
# mount /dev/zvol/test/disk2 /test/disk2

# df -h | grep "test\|Size"
Filesystem                    Size  Used Avail Use% Mounted on
test                          4.6G     0  4.6G   0% /test
/dev/zd96                     2.9G  2.1G  738M  74% /test/disk1
/dev/zd112                    2.9G  1.1G  1.8G  37% /test/disk2

# zfs list -r test
NAME         USED  AVAIL  REFER  MOUNTPOINT
test        5.23G  4.55G    32K  /test
test/disk1  5.23G  7.65G  2.13G  -
test/disk2   146K  4.55G  1.12G  -
```

The numbers get even murkier. The used and available for the two disks look correct in df, and the available in the pool hasn't changed, as I would have expected, but it's getting hard to figure out exactly how much space is used and available where. Total space across all three items in df: 10.4G. So now let's copy a new file into the clone:

```sh
# cp small_file_3 /test/disk2
# sync
# df -h | grep "test\|Size"
Filesystem                    Size  Used Avail Use% Mounted on
test                          3.6G     0  3.6G   0% /test
/dev/zd96                     2.9G  2.1G  738M  74% /test/disk1
/dev/zd112                    2.9G  2.1G  738M  74% /test/disk2

# zfs list -r test
NAME         USED  AVAIL  REFER  MOUNTPOINT
test        6.24G  3.54G    32K  /test
test/disk1  5.23G  6.64G  2.13G  -
test/disk2  1.01G  3.54G  2.13G  -
```

Okay. I think I see where these values are going. The available in the pool has decreased by 1G, and the used in disk2 has increased by 1G. Makes sense. Total size reported by df across all three items: 9.4G. Now what if I resize the zvols?

```sh
# zfs set volsize=5G test/disk1
# resize2fs /dev/zvol/test/disk1
# zfs set volsize=4G test/disk2
# resize2fs /dev/zvol/test/disk2

# df -h | grep "test\|Size"
Filesystem                    Size  Used Avail Use% Mounted on
test                          1.5G  128K  1.5G   1% /test
/dev/zd96                     4.9G  2.1G  2.7G  44% /test/disk1
/dev/zd112                    3.9G  2.1G  1.7G  55% /test/disk2

# zfs list -r test
NAME         USED  AVAIL  REFER  MOUNTPOINT
test        8.31G  1.48G    32K  /test
test/disk1  7.29G  6.63G  2.13G  -
test/disk2  1.01G  1.48G  2.13G  -
```

Hmmm... What happened there? I increased total zvol size by 3G (+2G disk1, +1G disk2). The available space in the pool went down by about 2G, and the used space in disk1 went up by about 2G, but the used space in disk2 stayed pretty much the same. Total space across all three items in df: 10.3G.
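My best guess at what happened (a guess, not part of the original write-up): growing a zvol's volsize also grows its refreservation, so disk1 gets charged for the extra 2G up front, while the clone (disk2) was created without a refreservation and only pays for the blocks it actually diverges on. If you'd rather keep reservations out of the picture entirely, sparse zvols avoid them:

```sh
# Hypothetical variant: create the zvol sparse (-s), so no refreservation is
# charged up front. The trade-off is that writes inside the volume can fail
# with ENOSPC if the pool fills, much like the errors that show up below.
zfs create -s -V 5G test/disk1
zfs get refreservation test/disk1   # a sparse volume reports 'none'
```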
Let's fill them up and see what we get.

```sh
# cp big_file_1 /test/disk1
# cp tiny_file_1 /test/disk1
# cp tiny_file_2 /test/disk1
# cp tiny_file_3 /test/disk1
# cp small_file_4 /test/disk2
# cp tiny_file_4 /test/disk2
# cp tiny_file_5 /test/disk2
# cp tiny_file_6 /test/disk2
# sync

# df -h | grep "test\|Size"
Filesystem                    Size  Used Avail Use% Mounted on
/dev/zd96                     4.9G  4.6G   23M 100% /test/disk1
/dev/zd112                    3.9G  3.6G   74M  99% /test/disk2

# zfs list -r test
NAME         USED  AVAIL  REFER  MOUNTPOINT
test        9.78G      0    32K  /test
test/disk1  7.29G  2.53G  4.76G  -
test/disk2  2.49G      0  3.61G  -
```

Umm... Ouch. Where did my pool go? The df command doesn't even list it anymore, and according to zfs list, it's completely full. I also have this mess in dmesg:

```
[385389.346555] end_request: critical space allocation error, dev zd112, sector 8067072
[385389.346562] end_request: critical space allocation error, dev zd112, sector 8077312
[385389.346567] end_request: critical space allocation error, dev zd112, sector 8097792
[385389.346573] end_request: critical space allocation error, dev zd112, sector 8081408
[385389.346578] end_request: critical space allocation error, dev zd112, sector 8068096
[385389.346583] end_request: critical space allocation error, dev zd112, sector 8058880
[385389.346588] end_request: critical space allocation error, dev zd112, sector 8050688
[385389.346594] EXT4-fs warning (device zd112): ext4_end_bio:317: I/O error writing to inode 16 (offset 142606336 size 4194304 starting block 1009792)
[385389.346603] EXT4-fs warning (device zd112): ext4_end_bio:317: I/O error writing to inode 15 (offset 142606336 size 8388608 starting block 1012352)
[385389.346611] EXT4-fs warning (device zd112): ext4_end_bio:317: I/O error writing to inode 16 (offset 142606336 size 4194304 starting block 1010304)
[385389.346619] EXT4-fs warning (device zd112): ext4_end_bio:317: I/O error writing to inode 15 (offset 134217728 size 8388608 starting block 1008640)
[385389.346625] EXT4-fs warning (device zd112): ext4_end_bio:317: I/O error writing to inode 16 (offset 138412032 size 4194304 starting block 1007488)
[385389.346637] EXT4-fs warning (device zd112): ext4_end_bio:317: I/O error writing to inode 16 (offset 134217728 size 4194304 starting block 1006464)
[385389.346639] buffer_io_error: 170 callbacks suppressed
[385389.346645] Buffer I/O error on device zd112, logical block 1009664
[385389.346647] Buffer I/O error on device zd112, logical block 1010189
[385389.346650] Buffer I/O error on device zd112, logical block 1012247
[385389.346653] Buffer I/O error on device zd112, logical block 1008521
[385389.346656] Buffer I/O error on device zd112, logical block 1007368
[385389.346661] Buffer I/O error on device zd112, logical block 1006337
[385389.346663] Buffer I/O error on device zd112, logical block 1009665
[385389.346665] Buffer I/O error on device zd112, logical block 1010190
[385389.346668] Buffer I/O error on device zd112, logical block 1012248
[385389.346670] Buffer I/O error on device zd112, logical block 1008522
[385389.346726] EXT4-fs warning (device zd112): ext4_end_bio:317: I/O error writing to inode 15 (offset 134217728 size 8388608 starting block 1008512)
[385389.346946] end_request: critical space allocation error, dev zd112, sector 8056832
[385389.346959] EXT4-fs warning (device zd112): ext4_end_bio:317: I/O error writing to inode 16 (offset 138412032 size 4194304 starting block 1007232)
[385389.346979] end_request: critical space allocation error, dev zd112, sector 8099840
[385389.346995] EXT4-fs warning (device zd112): ext4_end_bio:317: I/O error writing to inode 15 (offset 142606336 size 8388608 starting block 1012608)
[385389.346998] end_request: critical space allocation error, dev zd112, sector 8100864
[385389.347006] EXT4-fs warning (device zd112): ext4_end_bio:317: I/O error writing to inode 15 (offset 142606336 size 8388608 starting block 1012864)
[385389.699816] Aborting journal on device zd112-8.
[385389.699839] EXT4-fs error (device zd112) in ext4_free_blocks:4858: Journal has aborted
[385389.711017] EXT4-fs (zd112): Delayed block allocation failed for inode 15 at logical offset 46080 with max blocks 1024 with error 30
[385389.711025] EXT4-fs (zd112): This should not happen!! Data will be lost
[385389.711046] EXT4-fs error (device zd112) in ext4_writepages:2536: Journal has aborted
[385389.712520] EXT4-fs error (device zd112): ext4_journal_check_start:56: Detected aborted journal
[385389.712530] EXT4-fs (zd112): Remounting filesystem read-only
[385389.712644] EXT4-fs (zd112): ext4_writepages: jbd2_start: 1024 pages, ino 16; err -30
[385389.783776] Buffer I/O error on device zd112, logical block 786433
[385389.783784] lost page write due to I/O error on zd112
[385389.783806] Buffer I/O error on device zd112, logical block 786434
[385389.783809] lost page write due to I/O error on zd112
```

The dmesg output indicates that disk2 is now mounted read-only:

```sh
# touch /test/disk2/foo
touch: cannot touch ‘/test/disk2/foo’: Read-only file system
```

Yup, read-only.
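Before retrying, it's worth noting how overcommitted the pool was: 5G + 4G of zvols, plus a snapshot, on a 10G pool. A quick back-of-the-envelope check along these lines (a sketch; assumes a standard awk) would have flagged it in advance:

```sh
# Sum the volsize of every zvol in the pool and compare against the pool size.
zfs get -Hp -o value volsize -t volume -r test | awk '{s+=$1} END {printf "%.1f GiB of zvols\n", s/2^30}'
zpool list -H -o size test
```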
Let's try this again, but with a much larger overlap between the two disks:

```sh
# umount /test/disk2
# umount /test/disk1
# zpool destroy test
# zpool create test test_pool
# zfs create -V 3.5G test/disk1
# mkfs.ext4 /dev/zvol/test/disk1
# mkdir /test/disk1
# mount /dev/zvol/test/disk1 /test/disk1
# cp big_file_1 /test/disk1
# cp small_file_1 /test/disk1
# sync
# zfs snap test/disk1@s1
# zfs clone test/disk1@s1 test/disk2
# mkdir /test/disk2
# mount /dev/zvol/test/disk2 /test/disk2/
# zfs set volsize=5G test/disk1
# zfs set volsize=4G test/disk2
# resize2fs /dev/zvol/test/disk1
# resize2fs /dev/zvol/test/disk2
# cp small_file_2 /test/disk1
# cp tiny_file_1 /test/disk1
# cp tiny_file_2 /test/disk1
# cp tiny_file_3 /test/disk1
# cp tiny_file_4 /test/disk2
# cp tiny_file_5 /test/disk2
# cp tiny_file_6 /test/disk2
# sync

# df -h | grep "test\|Size"
Filesystem                    Size  Used Avail Use% Mounted on
test                          900M     0  900M   0% /test
/dev/zd96                     4.9G  4.6G   16M 100% /test/disk1
/dev/zd112                    3.9G  3.6G   74M  99% /test/disk2

# zfs list -r test
NAME         USED  AVAIL  REFER  MOUNTPOINT
test        8.90G   899M    32K  /test
test/disk1  8.31G  4.43G  4.76G  -
test/disk2   607M   899M  3.75G  -
```

Well, at least this time I didn't run out of disk space, and there are no errors in dmesg either. I don't see a huge space savings, though: 4.6G on one disk, 3.6G on the second, 3G "shared" in the snapshot... total disk usage: 8.9G? In fact, it seems to take up more total space than if I were to just store those same files in the pool directly. Let's try that. Between my two zvols, I have stored the equivalent of two big files, three small files, and six tiny files:

```sh
# umount /test/disk1
# umount /test/disk2
# zpool destroy test
# zpool create test /storage/tmp/test_pool
# cp big_file_1 /test
# cp big_file_2 /test
# cp small_file_1 /test
# cp small_file_2 /test
# cp small_file_3 /test
# cp tiny_file_1 /test
# cp tiny_file_2 /test
# cp tiny_file_3 /test
# cp tiny_file_4 /test
# cp tiny_file_5 /test
# cp tiny_file_6 /test
# sync

# df -h | grep "test\|Size"
Filesystem                    Size  Used Avail Use% Mounted on
test                          9.8G  8.2G  1.7G  84% /test

# zfs list -r test
NAME   USED  AVAIL  REFER  MOUNTPOINT
test  8.18G  1.60G  8.18G  /test
```

Perhaps resizing the zvols caused some kind of disconnect in the data sharing? Good theory.
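Before rebuilding everything, one could test that theory in place (a sketch; the `written` property needs a reasonably recent ZFS):

```sh
# 'origin' shows which snapshot the clone hangs off; 'written' is the amount of
# new data written to the clone since that snapshot. If resizing had really
# broken the sharing, 'written' would be close to the full referenced size.
zfs get origin,written,referenced test/disk2
zfs get usedbysnapshots,usedbydataset test/disk1
```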
Let's try it without resizing:

```sh
# umount /test/disk1
# umount /test/disk2
# zpool destroy test
# zpool create test test_pool
# zfs create -V 4.5G test/disk1
# mkfs.ext4 /dev/zvol/test/disk1
# mkdir /test/disk1
# mount /dev/zvol/test/disk1 /test/disk1
# cp big_file_1 /test/disk1/
# cp small_file_1 /test/disk1
# zfs snap test/disk1@s1
# zfs clone test/disk1@s1 test/disk2
# zfs destroy test/disk2
# zfs destroy test/disk1@s1
# sync
# zfs snap test/disk1@s1
# zfs clone test/disk1@s1 test/disk2
# mkdir /test/disk2
# mount /dev/zvol/test/disk2 /test/disk2
# df -h | grep "test\|Size"
# cp small_file_2 /test/disk1
# cp small_file_3 /test/disk2
# sync

# df -h | grep "test\|Size"
Filesystem                    Size  Used Avail Use% Mounted on
test                          893M     0  893M   0% /test
/dev/zd96                     4.4G  4.1G   57M  99% /test/disk1
/dev/zd112                    4.4G  4.1G   57M  99% /test/disk2

# zfs list -r test
NAME         USED  AVAIL  REFER  MOUNTPOINT
test        8.91G   892M    32K  /test
test/disk1  7.90G  4.50G  4.27G  -
test/disk2  1.01G   892M  4.27G  -
```

Again, around 8G of data, with 3G of it shared, takes about 8.9G of storage space. I'm not impressed with cloning zvols as a means to save on storage space.

## Dedup

What does it mean to dedup data between zvols? When data written to disk2 duplicates data already on disk1, where does the extra space saved by the dedup logic go? Let's experiment:

```sh
# zfs create -V 5G test/disk1
# zfs set dedup=on test
# zfs create -V 4G test/disk2
# mkdir /test/disk1
# mkdir /test/disk2
# mkfs.ext4 /dev/zvol/test/disk1
# mkfs.ext4 /dev/zvol/test/disk2
# mount /dev/zvol/test/disk1 /test/disk1
# mount /dev/zvol/test/disk2 /test/disk2
# cp big_file_1 /test/disk1
# sync

# df -h | grep "test\|Size"
Filesystem                    Size  Used Avail Use% Mounted on
test                          790M  128K  790M   1% /test
/dev/zd96                     4.8G  2.1G  2.6G  45% /test/disk1
/dev/zd112                    3.9G  8.0M  3.6G   1% /test/disk2

# zfs list -r test
NAME         USED  AVAIL  REFER  MOUNTPOINT
test        9.35G   798M    32K  /test
test/disk1  5.16G  3.75G  2.19G  -
test/disk2  4.13G  4.72G   194M  -

# zpool list test
NAME   SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
test  9.94G  2.09G  7.84G    21%  1.17x  ONLINE  -

# cp big_file_1 /test/disk2
# sync

# df -h | grep "test\|Size"
Filesystem                    Size  Used Avail Use% Mounted on
test                          2.8G  128K  2.8G   1% /test
/dev/zd96                     4.8G  2.1G  2.6G  45% /test/disk1
/dev/zd112                    3.9G  2.1G  1.6G  56% /test/disk2

# zfs list -r test
NAME         USED  AVAIL  REFER  MOUNTPOINT
test        9.38G  2.76G    32K  /test
test/disk1  5.16G  5.69G  2.23G  -
test/disk2  4.13G  4.67G  2.21G  -

# zpool list test
NAME   SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
test  9.94G  2.15G  7.79G    21%  2.19x  ONLINE  -

# cp small_file_1 /test/disk1
# cp small_file_1 /test/disk2
# sync

# df -h | grep "test\|Size"
Filesystem                    Size  Used Avail Use% Mounted on
test                          3.8G     0  3.8G   0% /test
/dev/zd96                     4.8G  3.1G  1.6G  67% /test/disk1
/dev/zd112                    3.9G  3.1G  603M  84% /test/disk2

# zfs list -r test
NAME         USED  AVAIL  REFER  MOUNTPOINT
test        9.42G  3.70G    32K  /test
test/disk1  5.16G  5.62G  3.24G  -
test/disk2  4.13G  4.60G  3.22G  -

# zpool list test
NAME   SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
test  9.94G  3.26G  6.68G    32%  2.12x  ONLINE  -

# cp small_file_2 /test/disk1
# cp tiny_file_1 /test/disk1
# cp tiny_file_2 /test/disk1
# cp tiny_file_3 /test/disk1
# cp tiny_file_4 /test/disk2
# cp tiny_file_5 /test/disk2
# cp tiny_file_6 /test/disk2
# sync

# df -h | grep "test\|Size"
Filesystem                    Size  Used Avail Use% Mounted on
test                          3.7G     0  3.7G   0% /test
/dev/zd96                     4.8G  4.6G     0 100% /test/disk1
/dev/zd112                    3.9G  3.6G  2.9M 100% /test/disk2

# zfs list -r test
NAME         USED  AVAIL  REFER  MOUNTPOINT
test        9.46G  3.66G    32K  /test
test/disk1  5.16G  3.97G  4.85G  -
test/disk2  4.13G  3.96G  3.82G  -

# zpool list test
NAME   SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
test  9.94G  5.64G  4.30G    56%  1.65x  ONLINE  -
```

Now this shows some promise! This is what I was hoping to see with my cloning tests!
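One caveat worth keeping in mind before committing to pool-wide dedup (my addition, not part of the original tests): every unique block needs an entry in the dedup table, which has to live in memory to perform well, so the savings come with a RAM cost. If your tools support it, you can get a rough look at that cost:

```sh
# Dedup-table statistics for the pool: entry counts and in-core/on-disk sizes.
zpool status -D test

# Histogram of block duplication; also handy for estimating what dedup would
# buy on a pool before actually enabling it.
zdb -S test
```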
## Compression

Is compression going to work the same as dedup? Will I see the space savings as a return of storage to the pool? Let's see.

```sh
# zfs create -V 9G test/disk1
# zfs set compression=on test/disk1
# mkfs.ext4 /dev/zvol/test/disk1
# mkdir /test/disk1
# mount /dev/zvol/test/disk1 /test/disk1
# dd if=/dev/zero of=/test/disk1/file1 bs=1G count=1
# sync

# df -h | grep "test\|Size"
Filesystem                    Size  Used Avail Use% Mounted on
test                          511M     0  511M   0% /test
/dev/zd96                     8.8G  1.1G  7.3G  13% /test/disk1

# dd if=/dev/zero of=/test/disk1/file2 bs=1G count=5
# sync

# df -h | grep "test\|Size"
Filesystem                    Size  Used Avail Use% Mounted on
test                          511M  128K  511M   1% /test
/dev/zd96                     8.8G  6.1G  2.3G  73% /test/disk1

# dd if=/dev/zero of=/test/disk1/file3 bs=1G count=2
# sync

# df -h | grep "test\|Size"
Filesystem                    Size  Used Avail Use% Mounted on
test                          511M     0  511M   0% /test
/dev/zd96                     8.8G  8.1G  255M  97% /test/disk1

# dd if=/dev/zero of=/test/disk1/file4 bs=1M count=250
# sync

# df -h | grep "test\|Size"
Filesystem                    Size  Used Avail Use% Mounted on
test                          511M     0  511M   0% /test
/dev/zd96                     8.8G  8.3G  4.4M 100% /test/disk1

# zfs get compressratio test/disk1
NAME        PROPERTY       VALUE  SOURCE
test/disk1  compressratio  11.65x  -

# resize2fs /dev/zvol/test/disk1
resize2fs 1.42.9 (4-Feb-2014)
The filesystem is already 2359296 blocks long.  Nothing to do!

# zfs list -r test
NAME         USED  AVAIL  REFER  MOUNTPOINT
test        9.28G   510M    31K  /test
test/disk1  9.28G  9.78G   334K  -
```

As far as I can tell, the data really is compressed on disk. The compressratio is large (though significantly smaller than I would expect for pure zeros), and the dd commands each completed very quickly, indicating that little data was actually written to disk. But I can't seem to figure out how to take advantage of the space savings that the compression provided. I'm somewhat disappointed.
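If I had to guess at where the savings went (my reading, not verified in the original post): the zvol's REFER is tiny (334K), so the compression clearly works, but the volume's 9G refreservation still pins that much space at the pool level, and the ext4 filesystem inside the zvol cannot see ZFS-level compression, so it reports itself full regardless. A sparse zvol sidesteps the reservation half of that:

```sh
# Hypothetical variant: a sparse, compressed zvol. Pool-level USED should then
# track the compressed size rather than the full 9G volsize. The ext4
# filesystem inside will still fill up at its advertised size, since it knows
# nothing about the compression underneath it.
zfs create -s -V 9G -o compression=on test/disk1
zfs get used,referenced,compressratio,refreservation test/disk1
```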
## Dedup combined with cloning

Well, dedup will give me the space savings that I was hoping to achieve by cloning zvols. But can I turn on dedup and use cloning to create multiple copies of a VM that all start in a common state? Let's try that.

```sh
# zfs set dedup=on test
# zfs create -V 3G test/disk1
# mkdir /test/disk1
# mkfs.ext4 /dev/zvol/test/disk1
# mount /dev/zvol/test/disk1 /test/disk1/
# cp big_file_1 /test/disk1
# sync
# zfs snap test/disk1@s1
# zfs clone test/disk1@s1 test/disk2
# mkdir /test/disk2
# mount /dev/zvol/test/disk2 /test/disk2

# df -h | grep "test\|Size"
Filesystem                    Size  Used Avail Use% Mounted on
test                          4.6G     0  4.6G   0% /test
/dev/zd96                     2.9G  2.1G  738M  74% /test/disk1
/dev/zd112                    2.9G  2.1G  738M  74% /test/disk2

# zfs list -r test
NAME         USED  AVAIL  REFER  MOUNTPOINT
test        5.29G  4.60G    32K  /test
test/disk1  5.23G  7.69G  2.13G  -
test/disk2   185K  4.60G  2.13G  -

# zpool list test
NAME   SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
test  9.94G  2.09G  7.85G    21%  1.05x  ONLINE  -

# zfs set volsize=5G test/disk1
# zfs set volsize=4G test/disk2
# resize2fs /dev/zvol/test/disk1
# resize2fs /dev/zvol/test/disk2
# cp small_file_1 /test/disk1
# cp small_file_1 /test/disk2
# sync

# df -h | grep "test\|Size"
Filesystem                    Size  Used Avail Use% Mounted on
test                          2.5G     0  2.5G   0% /test
/dev/zd96                     4.9G  3.1G  1.7G  66% /test/disk1
/dev/zd112                    3.9G  3.1G  680M  82% /test/disk2

# zfs list -r test
NAME         USED  AVAIL  REFER  MOUNTPOINT
test        8.42G  2.46G    32K  /test
test/disk1  7.29G  6.60G  3.15G  -
test/disk2  1.02G  2.46G  3.15G  -

# zpool list test
NAME   SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
test  9.94G  3.17G  6.77G    31%  1.37x  ONLINE  -
```

Remember, in my dedup tests above, when I had two filesystems with both big_file_1 and small_file_1 on each, I had 3.8G free space, and zpool list reported a 2.12x dedup ratio. Seeding my new filesystem with a clone of a snapshot isn't going too well. Maybe if I overwrite the cloned copy of big_file_1? Perhaps it will notice then that it is a duplicate?

```sh
# cp big_file_1 /test/disk2
# sync

# df -h | grep "test\|Size"
Filesystem                    Size  Used Avail Use% Mounted on
test                          2.4G  128K  2.4G   1% /test
/dev/zd96                     4.9G  3.1G  1.7G  66% /test/disk1
/dev/zd112                    3.9G  3.1G  680M  82% /test/disk2

# zfs list -r test
NAME         USED  AVAIL  REFER  MOUNTPOINT
test        10.5G  2.34G    32K  /test
test/disk1  7.29G  6.48G  3.15G  -
test/disk2  3.04G  2.34G  3.76G  -

# zpool list test
NAME   SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
test  9.94G  3.25G  6.69G    32%  2.03x  ONLINE  -
```

Hmmm... The dedup ratio went up (though not quite to 2.12x) but the available space actually went down! What's with that?
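My working theory here (mine, not the author's follow-up): `zfs list` charges each dataset for its blocks as if they were unique, so dedup savings never show up in dataset-level USED/AVAIL; they only appear in `zpool list`. Overwriting the cloned file also breaks the block sharing with the snapshot, which is why dataset USED jumped even though pool ALLOC barely moved. Comparing the two views side by side makes this easier to see:

```sh
# Pool-level view: reflects deduplication (ALLOC stays small, dedupratio rises).
zpool list -o name,size,allocated,free,dedupratio test

# Dataset-level view: charges logical, pre-dedup sizes, so USED keeps growing.
zfs list -o name,used,available,referenced -r test
```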
I'm going to try to replace the copy of big_file_1 on disk1.

```sh
# cp big_file_1 /test/disk1
# df -h | grep "test\|Size"
Filesystem                    Size  Used Avail Use% Mounted on
test                          4.1G  128K  4.1G   1% /test
/dev/zd96                     4.9G  3.1G  1.7G  66% /test/disk1
/dev/zd112                    3.9G  3.1G  680M  82% /test/disk2

# zfs list -r test
NAME         USED  AVAIL  REFER  MOUNTPOINT
test        10.5G  4.25G    32K  /test
test/disk1  7.29G  6.42G  4.65G  -
test/disk2  3.04G  4.25G  3.76G  -

# zpool list test
NAME   SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
test  9.94G  3.40G  6.53G    34%  2.70x  ONLINE  -
```

There we go. Now I'm seeing results similar to what I saw in the original dedup tests. My conclusion: deduplication doesn't do anything for the blocks a clone inherits; apparently only data that is actually rewritten gets deduplicated.

## Dedup with Send / Receive

I expect this should work, but I've been surprised by some results so far, so it's worth experimenting just to be sure.

```sh
# zpool create test test_pool
# df -h | grep "test\|Size"
Filesystem      Size  Used Avail Use% Mounted on
test            9.8G     0  9.8G   0% /test

# zfs list -r test
NAME   USED  AVAIL  REFER  MOUNTPOINT
test   108K  9.78G    30K  /test

# zfs set dedup="on" test
# zfs create -V 3G test/disk1
# mkfs.ext4 /dev/zvol/test/disk1
# mkdir /test/disk1
# mount /dev/zvol/test/disk1 /test/disk1
# cp small_file_1 /test/disk1
# sync
# zfs snap test/disk1@s1
# cp small_file_2 /test/disk1
# sync
# zfs snap test/disk1@s2
# df -h | grep "test\|Size"
Filesystem      Size  Used Avail Use% Mounted on
test            5.7G     0  5.7G   0% /test
/dev/zd0        2.9G  2.1G  738M  74% /test/disk1

# zfs list -r test
NAME         USED  AVAIL  REFER  MOUNTPOINT
test        4.28G  5.61G    31K  /test
test/disk1  4.22G  7.69G  2.13G  -

# zfs create -V 3G test/disk2
# zfs send test/disk1@s1 | zfs receive -F test/disk2
# mkdir /test/disk2
# mount /dev/zvol/test/disk2 /test/disk2
# df -h | grep "test\|Size"
Filesystem      Size  Used Avail Use% Mounted on
test            2.5G     0  2.5G   0% /test
/dev/zd0        2.9G  2.1G  738M  74% /test/disk1
/dev/zd16       2.9G  1.1G  1.8G  37% /test/disk2

# zfs list -r test
NAME         USED  AVAIL  REFER  MOUNTPOINT
test        8.51G  2.47G    32K  /test
test/disk1  4.22G  4.55G  2.13G  -
test/disk2  4.22G  5.56G  1.12G  -

# zpool list test
NAME   SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
test  9.94G  2.13G  7.80G    21%  1.60x  ONLINE  -

# umount /test/disk2
# zfs send -i s1 test/disk1@s2 | zfs receive -F test/disk2
# mount /dev/zvol/test/disk2 /test/disk2
# df -h | grep "test\|Size"
Filesystem      Size  Used Avail Use% Mounted on
test            1.5G  128K  1.5G   1% /test
/dev/zd16       2.9G  2.1G  738M  74% /test/disk1
/dev/zd0        2.9G  2.1G  738M  74% /test/disk2

# zfs list -r test
NAME         USED  AVAIL  REFER  MOUNTPOINT
test        10.6G  1.41G    32K  /test
test/disk1  5.23G  4.50G  2.13G  -
test/disk2  5.23G  4.50G  2.13G  -

# zpool list test
NAME   SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
test  9.94G  2.15G  7.79G    21%  2.10x  ONLINE  -

# zfs create -V 3G test/disk3
cannot create 'test/disk3': out of space
```

I don't really get this one. The `df` and `zfs list` results indicate that it's not working: available space stays at about 1.5G, regardless of the amount of overlap. However, `zpool list` shows about 7.8G free, and it does indicate that the dedup ratio is increasing as I increase the amount of overlap between the zvols.

In the end, though, the last command is telling. Even though the two zvols are 66% overlapping (2G out of 3G identical), there isn't enough room in the pool to create a third zvol. I would expect this to work. The first zvol uses 3G, right? The second zvol uses 1G (3G minus 2G of duplicate data). That means out of my 10G pool, only 4G should be in use, leaving plenty of space for another 3G zvol. If I can't use the space savings, what's the point?

## Conclusion

When working with VMs on ZFS zvols, if space savings is a major concern for you, I recommend the following (a minimal version of this setup is sketched below):

- Create a ZFS pool to hold all of your VM root filesystems
- Turn on deduplication across the entire pool
- Do not use compression
- Do not clone a new machine from a snapshot of an old machine
- Do not use `zfs send` / `zfs receive` to create a new machine from a snapshot of an old machine

Or maybe better yet, don't try to use dedup across zvols at all. The results are pretty confusing, and I'm not sure it's worth it.
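For reference, a minimal sketch of that recommended layout. The pool name, device, volume names, and sizes are placeholders; adjust them for your own VMs.

```sh
# Sketch of the setup recommended in the conclusion. 'tank', '/dev/sdb', and
# the volume names are examples only.
zpool create tank /dev/sdb             # or a file/partition of your choosing
zfs set dedup=on tank                  # dedup across the whole pool
zfs create -V 10G tank/vm1-root        # one zvol per VM root filesystem
zfs create -V 10G tank/vm2-root
mkfs.ext4 /dev/zvol/tank/vm1-root      # each VM gets its own fresh filesystem,
mkfs.ext4 /dev/zvol/tank/vm2-root      # populated by the installer rather than
                                       # cloned or zfs-sent from another VM
```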