# lvcreate -m 1 --corelog -n menas cleopatra

Here we've created a mirrored logical volume that keeps its logs in core (rather than on a separate physical device). Note that this step takes a group name rather than a device node. Also, the mirror is purely for illustrative purposes; it's not required if you're using some sort of redundant device, such as hardware RAID or MD. Finally, it's an administrative convenience to give LVs human-readable names using the -n option. It's not required but quite recommended.

Figure 4-3: lvcreate creates a logical volume, /dev/vg/lvol, by chopping some space out of the VG, which is transparently mapped to possibly discontinuous physical extents on PVs.
Create a filesystem using your favorite filesystem-creation tool:

# mkfs /dev/cleopatra/menas

At this point, the LV is ready to mount and access, just as if it were a normal disk.
# mount /dev/cleopatra/menas /mnt/hd

To make the new device a suitable root for a Xen domain, copy a filesystem into it. We used a prebuilt image: We just mounted its root filesystem and copied it over to our new volume.
# mount -o loop gentoo.img /mnt/tmp/
# cp -a /mnt/tmp/* /mnt/hd

Finally, to use it with Xen, we can specify the logical volume to the guest domain just as we would any physical device. (Note that here we're back to the same example we started the chapter with.)

disk = ["phy:/dev/cleopatra/menas,sda1,w"]
At this point, start the machine. Cross your fingers, wave a dead chicken, perform the accustomed ritual. In this case our deity is propitiated by an xm create. Standards have come down in the past few millennia.
# xm create menas
[27] This example is not purely academic.
[28] This is unlikely to be a problem unless you are using Slackware.
Enlarge Your Disk

Both file-backed images and LVM disks can be expanded transparently from the dom0. We're going to assume that disk space is so plentiful that you will never need to shrink an image.
Be sure to stop the domain before attempting to resize its underlying filesystem. For one thing, all of the user-space resize tools that we know of won't attempt to resize a mounted filesystem. For another, the Xen hypervisor won't pass along changes to the underlying block device's size without restarting the domain. Most important, even if you were able to resize the backing store with the domain running, data corruption would almost certainly result.
File-Backed Images

The principle behind augmenting file-backed images is simple: We append more bits to the file, then expand the filesystem.
First, make sure that nothing is using the file. Stop any domUs that have it mounted. Detach it from the dom0. Failure to do this will likely result in filesystem corruption.
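In practice, using the anthony.img example from the next step, that might look something like this (the mount point is illustrative, assuming you had the image loop-mounted in the dom0):

# xm shutdown anthony
# umount /mnt/anthony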
Next, use dd to add some bits to the end. In this case we're directing 1GB from our /dev/zero bit hose to anthony.img. (Note that not specifying an output file causes dd to write to stdout.)

# dd if=/dev/zero bs=1M count=1024 >> /opt/xen/anthony.img

Use resize2fs to extend the filesystem (or the equivalent tool for your choice of filesystem).
# e2fsck -f /opt/xen/anthony.img
# resize2fs /opt/xen/anthony.img

resize2fs will default to making the filesystem the size of the underlying device if there's no partition table.
If the image contains partitions, you'll need to rearrange those before resizing the filesystem. Use fdisk to delete the partition that you wish to resize and recreate it, making sure that the starting cylinder remains the same.
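If you'd rather not run fdisk against the raw file, one hedged approach (the loop device is illustrative, and it assumes kpartx is installed) is to attach the image to a loop device and let kpartx map the partitions for you:

# losetup /dev/loop0 /opt/xen/anthony.img
# fdisk /dev/loop0 (delete and recreate the partition, keeping the same starting cylinder)
# kpartx -a /dev/loop0
# e2fsck -f /dev/mapper/loop0p1
# resize2fs /dev/mapper/loop0p1
# kpartx -d /dev/loop0
# losetup -d /dev/loop0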
LVM It"s just as easy, or perhaps even easier, to use LVM to expand storage. LVM was designed from the beginning to increase the flexibility of storage devices, so it includes an easy mechanism to extend a volume (as well as shrink and move).
If there"s free s.p.a.ce in the volume group, simply issue the command: #lvextend-L+1G/dev/cleopatra/charmian If the volume group is full, you"ll need to expand it. Just add a disk to the machine and extend the vg: #vgextend/dev/cleopatra/dev/sdc1 Finally, just as in the previous example, handle the filesystem-level expansion-we"ll present this one using ReiserFS.
# resize_reiserfs -s +1G /dev/cleopatra/charmian
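If charmian held an ext2 or ext3 filesystem instead, the equivalent filesystem-level step (a sketch, assuming the same volume) would be:

# e2fsck -f /dev/cleopatra/charmian
# resize2fs /dev/cleopatra/charmian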
Copy-on-Write and Snapshots

One of the other niceties that a real storage option gives you is copy-on-write, which means that, rather than the domU overwriting a file when it's changed, the backend instead transparently writes a copy elsewhere.[29] As a corollary, the original filesystem remains as a snapshot, with all modifications directed to the copy-on-write clone.
This snapshot provides the ability to save a filesystem's state, taking a snapshot of it at a given time or at set intervals. There are two useful things about snapshots: For one, they allow for easy recovery from user error.[30] For another, they give you a checkpoint that's known to be consistent; it's something that you can conveniently back up and move elsewhere. This eliminates the need to take servers offline for backups, such as we had to do in the dark ages.
CoW likewise has a bunch of uses. Of these, the most fundamental implication for Xen is that it can dramatically reduce the on-disk overhead of each virtual machine: Rather than using a simple file as a block device or a logical volume, many machines can share a single base filesystem image, requiring disk space only to write their changes to that filesystem.
CoW also comes with its own disadvantages. First, there"s a speed penalty. The CoW infrastructure slows disk access down quite a bit compared with writing directly to the device, for both reading and writing.
If you"re using spa.r.s.e allocation for CoW volumes, the speed penalty becomes greater due to the overhead of allocating and remapping blocks. This leads to fragmentation, which carries its own set of performance penalties. CoW can also lead to the administrative problem of oversubscription; by making it possible to oversubscribe disk s.p.a.ce, it makes life much harder if you accidentally run out. You can avoid all of this by simply allocating s.p.a.ce in advance.
There"s also a trade-off in terms of administrative complexity, as with most interesting features. Ultimately, you, the Xen administrator, have to decide how much complexity is worth having.
We"ll discuss device mapper snapshots, as used by LVM because they"re the implementation that we"re most familiar with. For shared storage, we"ll focus on NFS and go into more detail on shared storage systems in Chapter9 Chapter9. We also outline a CoW solution with UnionFS in Chapter7 Chapter7. Finally, you might want to try QCOW block devices-although we haven"t had much luck with them, your mileage may vary.
[29] This is traditionally abbreviated CoW, partly because it's shorter, but mostly because "cow" is an inherently funny word. Just ask Wikipedia.
[30] It"s not as hard you might suppose to It"s not as hard you might suppose to rm rm your home directory. your home directory.
LVM and Snapshots

LVM snapshots are designed more to back up and checkpoint a filesystem than as a means of long-term storage. It's important to keep LVM snapshots relatively fresh; in other words, make sure to drop them when your backup is done.[31]
Snapshot volumes can also be used as read-write backing store for domains, especially in situations where you just want to generate a quick domU for testing, based on some preexisting disk image. The LVM documentation notes that you can create a basic image, snapshot it multiple times, and modify each snapshot slightly for another domain. In this case, LVM snapshots would act like a block-level UnionFS. However, note that when a snapshot fills up, it's immediately dropped by the kernel. This may lead to data loss.
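For instance, a hedged sketch of that cloning approach (the volume names and sizes are illustrative, assuming a base image at /dev/cleopatra/base):

# lvcreate -s -L 500M -n guest1.snap /dev/cleopatra/base
# lvcreate -s -L 500M -n guest2.snap /dev/cleopatra/base

Each domU then gets its own writable view of the shared base image:

disk = ["phy:/dev/cleopatra/guest1.snap,sda1,w"]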
THE XEN LIVECD REVISITED: COPY-ON-WRITE IN ACTION

The Xen LiveCD actually is a pretty nifty release. One of its neatest features is the ability to automatically create copy-on-write block devices when a Xen domain starts, based on read-only images on the CD. The implementation uses the device mapper to set up block devices and snapshots based on flat files, and it is surprisingly simple. First, the basic storage is defined with a line like this in the domain config file:

disk = ["cow:/mnt/cdrom/rootfs.img,30,sda1,w"]

Note the use of the cow: prefix, which we haven't mentioned yet. This is actually a custom prefix rather than part of the normal Xen package. We can add custom prefixes like cow: because /etc/xen/scripts/create_block_device falls through to a script with a name of the form block-[type] if it finds an unknown device type (in this case, cow). The block-cow script expects one argument, either create or destroy, which the domain builder provides when it calls the script. block-cow then calls either the create_cow or destroy_cow script, as appropriate. The real setup takes place in a script, /usr/sbin/create_cow. This script essentially uses the device mapper to create a copy-on-write device based on an LVM snapshot,[32] which it presents to the domain. We won't reproduce it here, but it's a good example of how standard Linux features can form the basis for complex, abstracted functions. In other words, a good hack.
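To make the dispatch concrete, here is a minimal sketch of what such a block-cow wrapper could look like, based purely on the behavior described above (the real LiveCD script may differ):

#!/bin/sh
# block-cow: called by the domain builder with "create" or "destroy";
# hand off to the matching helper script.
case "$1" in
    create)  exec /usr/sbin/create_cow ;;
    destroy) exec /usr/sbin/destroy_cow ;;
    *)       echo "usage: $0 create|destroy" >&2; exit 1 ;;
esac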
The basic procedure for adding an LVM snapshot is simple: Make sure that you have some unused space in your volume group, and create a snapshot volume for it. First, check to see whether you have the dm_snapshot driver. Most modern distros ship with this driver built as a loadable module. (If it's not built, go to your Linux kernel source tree and compile it.)

# locate dm_snapshot.ko

Manually load it if necessary.
# modprobe dm_snapshot

Create the snapshot using the lvcreate command with the -s option to indicate "snapshot." The other parameters specify a length and name as in an ordinary logical volume. The final parameter specifies the origin, or volume being snapshotted.
# lvcreate -s -L 100M -n pompei.snap /dev/cleopatra/pompei

This snapshot then appears to be a frozen image of the filesystem: Writes will happen as normal on the original volume, but the snapshot will retain changed files as they were when the snapshot was taken, up to the maximum capacity of the snapshot.
When making a snapshot, the length indicates the maximum amount of changed data that the snapshot will be able to store. If the snapshot fills up, it'll be dropped automatically by the kernel driver and will become unusable.
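You can keep an eye on how full a snapshot is getting, and grow it before that happens, with the standard LVM tools; a quick sketch, using the snapshot we just made:

# lvs /dev/cleopatra/pompei.snap
# lvextend -L +100M /dev/cleopatra/pompei.snap

(lvs reports how much of the snapshot is allocated in its Snap% column, or Data% on newer LVM versions.)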
For a sample script that uses an LVM snapshot to back up a Xen instance, see Chapter 7.
[31] Even if you add no data to the snapshot itself, it can run out of space (and corrupt itself) just keeping up with changes in the main LV.
[32] More properly, a device mapper snapshot, which LVM snapshots are based on. LVM snapshots are device mapper snapshots, but device mapper snapshots can be based on any pair of block devices, LVM or not. The LVM tools provide a convenient frontend to the arcane commands used by dmsetup.
Storage and Migration

These two storage techniques, flat files and LVM, lend themselves well to easy and automated cold migration, in which the administrator halts the domain, copies the domain's config file and backing storage to another physical machine, and restarts the domain.
Copying over a file-based backend is as simple as copying any file over the network. Just drop it onto the new box in its corresponding place in the filesystem, and start the machine.
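A sketch of the whole dance for our file-backed example (the hostname newbox is illustrative, and it assumes the domain's config file lives at /etc/xen/anthony):

# xm shutdown anthony
# scp /opt/xen/anthony.img newbox:/opt/xen/
# scp /etc/xen/anthony newbox:/etc/xen/
# ssh newbox xm create anthony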
Copying an LVM is a bit more involved, but it is still straightforward: Make the target device, mount it, and move the files in whatever fashion you care to.
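For example, first make and mount the target device on the new machine (the size and the use of rsync here are assumptions, not requirements):

# lvcreate -L 2G -n charmian cleopatra
# mkfs /dev/cleopatra/charmian
# mount /dev/cleopatra/charmian /mnt/hd

Then, on the source dom0, mount the volume read-only and copy the files across:

# mount -o ro /dev/cleopatra/charmian /mnt/hd
# rsync -a /mnt/hd/ newbox:/mnt/hd/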
Check Chapter 9 for more details on this sort of migration.
Network Storage

These two storage methods only apply to locally accessible storage. Live migration, in which a domain is moved from one machine to another without being halted, requires one other piece of this puzzle: The filesystem must be accessible over the network to multiple machines. This is an area of active development, with several competing solutions. Here we'll discuss NFS-based storage. We will address other solutions, including ATA over Ethernet and iSCSI, in Chapter 9.
NFS

NFS is older than we are, and it is used by organizations of all sizes. It's easy to set up and relatively easy to administer. Most operating systems can interact with it. For these reasons, it's probably the easiest, cheapest, and fastest way to set up a live migration-capable Xen domain.
The idea is to marshal Xen's networking metaphor: The domains are connected (in the default setup) to a virtual network switch. Because the dom0 is also attached to this switch, it can act as an NFS server for the domUs.
In this case we"re exporting a directory tree-neither a physical device nor a file. NFS server setup is quite simple, and it"s cross platform, so you can use any NFS device you like. (We prefer FreeBSD-based NFS servers, but NetApp and several other companies produce fine NFS appliances. As we might have mentioned, we"ve had poor luck using Linux as an NFS server.) Simply export your OS image. In our example, on the FreeBSD NFS server at 192.0.2.7, we have a full Slackware image at /usr/xen/images/slack /usr/xen/images/slack. Our /etc/exports /etc/exports looks a bit like this: looks a bit like this: /usr/xen/images/slack-maproot=0192.0.2.222 We leave further server-side setup to your doubtless extensive experience with NFS. One easy refinement would be to make / read-only and shared, then export read-write VM-specific /var /var and and /home /home part.i.tions-but in the simplest case, just export a full image. part.i.tions-but in the simplest case, just export a full image.
Note: Although NFS does imply a performance hit, it's important to recall that Xen's network buffers and disk buffers are provided by the same paravirtualized device infrastructure, so the actual network hardware is not involved. There is increased overhead in traversing the networking stack, but performance is usually better than gigabit Ethernet, so it is not as bad as you might think.
Now configure the client. First, you'll need to make some changes to the domU's kernel to enable root on NFS, starting with kernel-level IP autoconfiguration (CONFIG_IP_PNP=y):

Networking -> Networking options -> IP: kernel level autoconfiguration

If you want to do everything via DHCP (although you should probably still specify a MAC address in your domain config file), add DHCP support under that tree: CONFIG_IP_PNP_DHCP, or CONFIG_IP_PNP_BOOTP if you're old school. If you are okay specifying the IP in your domU config file, skip that step.
Now you need to enable support for root on NFS. Make sure NFS support is Y and not M; that is, CONFIG_NFS_FS=y. Next, enable root over NFS: CONFIG_ROOT_NFS=y. In menuconfig, you can find that option under:

File systems -> Network File Systems -> NFS filesystem support -> Root over NFS

Note that menuconfig won't give you the option of selecting root over NFS until you select kernel-level IP autoconfiguration.
Build the kernel as normal and install it somewhere Xen can load it. Most likely this isn't what you want for a dom0 kernel, so make sure to avoid overwriting the boot kernel.
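A sketch of that step (the destination directory is illustrative; point the kernel= line in the domain's config file, discussed next, at wherever you put the result):

# make
# cp vmlinux /opt/xen/kernels/vmlinux-domU-nfs

kernel = "/opt/xen/kernels/vmlinux-domU-nfs"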
Now configure the domain that you"re going to boot over NFS. Edit the domain"s config file: #Rootdevicefornfs.
root="/dev/nfs"
#Thenfsserver.
nfs_server="38.99.2.7"
#Rootdirectoryonthenfsserver.
nfs_root="/usr/xen/images/slack"
netmask="255.255.255.0"
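If you built the kernel without DHCP autoconfiguration, also give the domU its address directly in the config file; for example, matching the client we authorized in /etc/exports:

ip = "192.0.2.222"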