Proxmox VE is based on the famous Debian Linux distribution. That means that you have access to the whole world of Debian packages, and the base system is well documented. The Debian Administrator's Handbook is available online, and provides a comprehensive introduction to the Debian operating system (see [Hertzog13]).
A standard Proxmox VE installation uses the default repositories from Debian, so you get bug fixes and security updates through that channel. In addition, we provide our own package repository to roll out all Proxmox VE related packages. This includes updates to some Debian packages when necessary.
We also deliver a specially optimized Linux kernel, where we enable all required virtualization and container features. That kernel includes drivers for ZFS, and several hardware drivers. For example, we ship Intel network card drivers to support their newest hardware.
The following sections will concentrate on virtualization related topics. They either explain things which are different on Proxmox VE, or tasks which are commonly used on Proxmox VE. For other topics, please refer to the standard Debian documentation.
All Debian based systems use APT as package management tool. The list of repositories is defined in /etc/apt/sources.list and .list files found inside /etc/apt/sources.d/. Updates can be installed directly using apt-get, or via the GUI.
Apt sources.list files list one package repository per line, with the most preferred source listed first. Empty lines are ignored, and a # character anywhere on a line marks the remainder of that line as a comment. The information available from the configured sources is acquired by apt-get update.
deb http://ftp.debian.org/debian stretch main contrib # security updates deb http://security.debian.org stretch/updates main contrib
In addition, Proxmox VE provides three different package repositories.
Proxmox VE Enterprise Repository
This is the default, stable and recommended repository, available for all Proxmox VE subscription users. It contains the most stable packages, and is suitable for production use. The pve-enterprise repository is enabled by default:
deb https://enterprise.proxmox.com/debian/pve stretch pve-enterprise
As soon as updates are available, the root@pam user is notified via email about the available new packages. On the GUI, the change-log of each package can be viewed (if available), showing all details of the update. So you will never miss important security fixes.
Please note that and you need a valid subscription key to access this repository. We offer different support levels, and you can find further details at https://www.proxmox.com/en/proxmox-ve/pricing.
|You can disable this repository by commenting out the above line using a # (at the start of the line). This prevents error messages if you do not have a subscription key. Please configure the pve-no-subscription repository in that case.|
Proxmox VE No-Subscription Repository
As the name suggests, you do not need a subscription key to access this repository. It can be used for testing and non-production use. Its not recommended to run on production servers, as these packages are not always heavily tested and validated.
We recommend to configure this repository in /etc/apt/sources.list.
deb http://ftp.debian.org/debian stretch main contrib # PVE pve-no-subscription repository provided by proxmox.com, # NOT recommended for production use deb http://download.proxmox.com/debian/pve stretch pve-no-subscription # security updates deb http://security.debian.org stretch/updates main contrib
Proxmox VE Test Repository
Finally, there is a repository called pvetest. This one contains the latest packages and is heavily used by developers to test new features. As usual, you can configure this using /etc/apt/sources.list by adding the following line:
deb http://download.proxmox.com/debian/pve stretch pvetest
|the pvetest repository should (as the name implies) only be used for testing new features or bug fixes.|
Proxmox VE Ceph Repository
This is Proxmox VE’s main Ceph repository and holds the Ceph packages for production use. You can also use this repository to update only the Ceph client.
deb http://download.proxmox.com/debian/ceph-luminous stretch main
Proxmox VE Ceph Testing Repository
This Ceph repository contains the Ceph packages before they are moved into the main repository and is used to test new Ceph release on Proxmox VE.
deb http://download.proxmox.com/debian/ceph-luminous stretch test
We use GnuPG to sign the Release files inside those repositories, and APT uses that signatures to verify that all packages are from a trusted source.
The key used for verification is already installed if you install from our installation CD. If you install by other means, you can manually download the key with:
# wget http://download.proxmox.com/debian/proxmox-ve-release-5.x.gpg -O /etc/apt/trusted.gpg.d/proxmox-ve-release-5.x.gpg
Please verify the checksum afterwards:
# sha512sum /etc/apt/trusted.gpg.d/proxmox-ve-release-5.x.gpg ffb95f0f4be68d2e753c8875ea2f8465864a58431d5361e88789568673551501ae574283a4e0492f17d79dc67edfb173a56a6304dea39e01f249ebdabc9f074a /etc/apt/trusted.gpg.d/proxmox-ve-release-5.x.gpg
# md5sum /etc/apt/trusted.gpg.d/proxmox-ve-release-5.x.gpg 511d36d0f1350c01c42a3dc9f3c27939 /etc/apt/trusted.gpg.d/proxmox-ve-release-5.x.gpg
System Software Updates
We provide regular package updates on all repositories. You can install those update using the GUI, or you can directly run the CLI command apt-get:
apt-get update apt-get dist-upgrade
|The apt package management system is extremely flexible and provides countless of feature - see man apt-get or [Hertzog13] for additional information.|
You should do such updates at regular intervals, or when we release versions with security related fixes. Major system upgrades are announced at the Proxmox VE Community Forum. Those announcement also contain detailed upgrade instructions.
|We recommend to run regular upgrades, because it is important to get the latest security updates.|
Network configuration can be done either via the GUI, or by manually editing the file /etc/network/interfaces, which contains the whole network configuration. The interfaces(5) manual page contains the complete format description. All Proxmox VE tools try hard to keep direct user modifications, but using the GUI is still preferable, because it protects you from errors.
Once the network is configured, you can use the Debian traditional tools ifup and ifdown commands to bring interfaces up and down.
|Proxmox VE does not write changes directly to /etc/network/interfaces. Instead, we write into a temporary file called /etc/network/interfaces.new, and commit those changes when you reboot the node.|
We currently use the following naming conventions for device names:
Ethernet devices: en*, systemd network interface names. This naming scheme is used for new Proxmox VE installations since version 5.0.
Ethernet devices: eth[N], where 0 ≤ N (eth0, eth1, …) This naming scheme is used for Proxmox VE hosts which were installed before the 5.0 release. When upgrading to 5.0, the names are kept as-is.
Bridge names: vmbr[N], where 0 ≤ N ≤ 4094 (vmbr0 - vmbr4094)
Bonds: bond[N], where 0 ≤ N (bond0, bond1, …)
VLANs: Simply add the VLAN number to the device name, separated by a period (eno1.50, bond1.30)
This makes it easier to debug networks problems, because the device name implies the device type.
Systemd Network Interface Names
Systemd uses the two character prefix en for Ethernet network devices. The next characters depends on the device driver and the fact which schema matches first.
o<index>[n<phys_port_name>|d<dev_port>] — devices on board
s<slot>[f<function>][n<phys_port_name>|d<dev_port>] — device by hotplug id
[P<domain>]p<bus>s<slot>[f<function>][n<phys_port_name>|d<dev_port>] — devices by bus id
x<MAC> — device by MAC address
The most common patterns are:
eno1 — is the first on board NIC
enp3s0f1 — is the NIC on pcibus 3 slot 0 and use the NIC function 1.
For more information see Predictable Network Interface Names.
Choosing a network configuration
Depending on your current network organization and your resources you can choose either a bridged, routed, or masquerading networking setup.
Proxmox VE server in a private LAN, using an external gateway to reach the internet
The Bridged model makes the most sense in this case, and this is also the default mode on new Proxmox VE installations. Each of your Guest system will have a virtual interface attached to the Proxmox VE bridge. This is similar in effect to having the Guest network card directly connected to a new switch on your LAN, the Proxmox VE host playing the role of the switch.
Proxmox VE server at hosting provider, with public IP ranges for Guests
For this setup, you can use either a Bridged or Routed model, depending on what your provider allows.
Proxmox VE server at hosting provider, with a single public IP address
In that case the only way to get outgoing network accesses for your guest systems is to use Masquerading. For incoming network access to your guests, you will need to configure Port Forwarding.
For further flexibility, you can configure VLANs (IEEE 802.1q) and network bonding, also known as "link aggregation". That way it is possible to build complex and flexible virtual networks.
Default Configuration using a Bridge
Bridges are like physical network switches implemented in software. All VMs can share a single bridge, or you can create multiple bridges to separate network domains. Each host can have up to 4094 bridges.
The installation program creates a single bridge named vmbr0, which is connected to the first Ethernet card. The corresponding configuration in /etc/network/interfaces might look like this:
auto lo iface lo inet loopback iface eno1 inet manual auto vmbr0 iface vmbr0 inet static address 192.168.10.2 netmask 255.255.255.0 gateway 192.168.10.1 bridge_ports eno1 bridge_stp off bridge_fd 0
Virtual machines behave as if they were directly connected to the physical network. The network, in turn, sees each virtual machine as having its own MAC, even though there is only one network cable connecting all of these VMs to the network.
Most hosting providers do not support the above setup. For security reasons, they disable networking as soon as they detect multiple MAC addresses on a single interface.
|Some providers allows you to register additional MACs on their management interface. This avoids the problem, but is clumsy to configure because you need to register a MAC for each of your VMs.|
You can avoid the problem by “routing” all traffic via a single interface. This makes sure that all network packets use the same MAC address.
A common scenario is that you have a public IP (assume 198.51.100.5 for this example), and an additional IP block for your VMs (203.0.113.16/29). We recommend the following setup for such situations:
auto lo iface lo inet loopback auto eno1 iface eno1 inet static address 198.51.100.5 netmask 255.255.255.0 gateway 198.51.100.1 post-up echo 1 > /proc/sys/net/ipv4/ip_forward post-up echo 1 > /proc/sys/net/ipv4/conf/eno1/proxy_arp auto vmbr0 iface vmbr0 inet static address 203.0.113.17 netmask 255.255.255.248 bridge_ports none bridge_stp off bridge_fd 0
Masquerading (NAT) with iptables
Masquerading allows guests having only a private IP address to access the network by using the host IP address for outgoing traffic. Each outgoing packet is rewritten by iptables to appear as originating from the host, and responses are rewritten accordingly to be routed to the original sender.
auto lo iface lo inet loopback auto eno1 #real IP address iface eno1 inet static address 198.51.100.5 netmask 255.255.255.0 gateway 198.51.100.1 auto vmbr0 #private sub network iface vmbr0 inet static address 10.10.10.1 netmask 255.255.255.0 bridge_ports none bridge_stp off bridge_fd 0 post-up echo 1 > /proc/sys/net/ipv4/ip_forward post-up iptables -t nat -A POSTROUTING -s '10.10.10.0/24' -o eno1 -j MASQUERADE post-down iptables -t nat -D POSTROUTING -s '10.10.10.0/24' -o eno1 -j MASQUERADE
Bonding (also called NIC teaming or Link Aggregation) is a technique for binding multiple NIC’s to a single network device. It is possible to achieve different goals, like make the network fault-tolerant, increase the performance or both together.
High-speed hardware like Fibre Channel and the associated switching hardware can be quite expensive. By doing link aggregation, two NICs can appear as one logical interface, resulting in double speed. This is a native Linux kernel feature that is supported by most switches. If your nodes have multiple Ethernet ports, you can distribute your points of failure by running network cables to different switches and the bonded connection will failover to one cable or the other in case of network trouble.
Aggregated links can improve live-migration delays and improve the speed of replication of data between Proxmox VE Cluster nodes.
There are 7 modes for bonding:
Round-robin (balance-rr): Transmit network packets in sequential order from the first available network interface (NIC) slave through the last. This mode provides load balancing and fault tolerance.
Active-backup (active-backup): Only one NIC slave in the bond is active. A different slave becomes active if, and only if, the active slave fails. The single logical bonded interface’s MAC address is externally visible on only one NIC (port) to avoid distortion in the network switch. This mode provides fault tolerance.
XOR (balance-xor): Transmit network packets based on [(source MAC address XOR’d with destination MAC address) modulo NIC slave count]. This selects the same NIC slave for each destination MAC address. This mode provides load balancing and fault tolerance.
Broadcast (broadcast): Transmit network packets on all slave network interfaces. This mode provides fault tolerance.
IEEE 802.3ad Dynamic link aggregation (802.3ad)(LACP): Creates aggregation groups that share the same speed and duplex settings. Utilizes all slave network interfaces in the active aggregator group according to the 802.3ad specification.
Adaptive transmit load balancing (balance-tlb): Linux bonding driver mode that does not require any special network-switch support. The outgoing network packet traffic is distributed according to the current load (computed relative to the speed) on each network interface slave. Incoming traffic is received by one currently designated slave network interface. If this receiving slave fails, another slave takes over the MAC address of the failed receiving slave.
Adaptive load balancing (balance-alb): Includes balance-tlb plus receive load balancing (rlb) for IPV4 traffic, and does not require any special network switch support. The receive load balancing is achieved by ARP negotiation. The bonding driver intercepts the ARP Replies sent by the local system on their way out and overwrites the source hardware address with the unique hardware address of one of the NIC slaves in the single logical bonded interface such that different network-peers use different MAC addresses for their network packet traffic.
If your switch support the LACP (IEEE 802.3ad) protocol then we recommend using
the corresponding bonding mode (802.3ad). Otherwise you should generally use the
If you intend to run your cluster network on the bonding interfaces, then you have to use active-passive mode on the bonding interfaces, other modes are unsupported.
The following bond configuration can be used as distributed/shared storage network. The benefit would be that you get more speed and the network will be fault-tolerant.
auto lo iface lo inet loopback iface eno1 inet manual iface eno2 inet manual auto bond0 iface bond0 inet static slaves eno1 eno2 address 192.168.1.2 netmask 255.255.255.0 bond_miimon 100 bond_mode 802.3ad bond_xmit_hash_policy layer2+3 auto vmbr0 iface vmbr0 inet static address 10.10.10.2 netmask 255.255.255.0 gateway 10.10.10.1 bridge_ports eno1 bridge_stp off bridge_fd 0
Another possibility it to use the bond directly as bridge port. This can be used to make the guest network fault-tolerant.
auto lo iface lo inet loopback iface eno1 inet manual iface eno2 inet manual auto bond0 iface bond0 inet manual slaves eno1 eno2 bond_miimon 100 bond_mode 802.3ad bond_xmit_hash_policy layer2+3 auto vmbr0 iface vmbr0 inet static address 10.10.10.2 netmask 255.255.255.0 gateway 10.10.10.1 bridge_ports bond0 bridge_stp off bridge_fd 0
A virtual LAN (VLAN) is a broadcast domain that is partitioned and isolated in the network at layer two. So it is possible to have multiple networks (4096) in a physical network, each independent of the other ones.
Each VLAN network is identified by a number often called tag. Network packages are then tagged to identify which virtual network they belong to.
VLAN for Guest Networks
Proxmox VE supports this setup out of the box. You can specify the VLAN tag when you create a VM. The VLAN tag is part of the guest network confinuration. The networking layer supports differnet modes to implement VLANs, depending on the bridge configuration:
VLAN awareness on the Linux bridge: In this case, each guest’s virtual network card is assigned to a VLAN tag, which is transparently supported by the Linux bridge. Trunk mode is also possible, but that makes the configuration in the guest necessary.
"traditional" VLAN on the Linux bridge: In contrast to the VLAN awareness method, this method is not transparent and creates a VLAN device with associated bridge for each VLAN. That is, if e.g. in our default network, a guest VLAN 5 is used to create eno1.5 and vmbr0v5, which remains until rebooting.
Open vSwitch VLAN: This mode uses the OVS VLAN feature.
Guest configured VLAN: VLANs are assigned inside the guest. In this case, the setup is completely done inside the guest and can not be influenced from the outside. The benefit is that you can use more than one VLAN on a single virtual NIC.
VLAN on the Host
To allow host communication with an isolated network. It is possible to apply VLAN tags to any network device (NIC, Bond, Bridge). In general, you should configure the VLAN on the interface with the least abstraction layers between itself and the physical NIC.
For example, in a default configuration where you want to place the host management address on a separate VLAN.
auto lo iface lo inet loopback iface eno1 inet manual iface eno1.5 inet manual auto vmbr0v5 iface vmbr0v5 inet static address 10.10.10.2 netmask 255.255.255.0 gateway 10.10.10.1 bridge_ports eno1.5 bridge_stp off bridge_fd 0 auto vmbr0 iface vmbr0 inet manual bridge_ports eno1 bridge_stp off bridge_fd 0
auto lo iface lo inet loopback iface eno1 inet manual auto vmbr0.5 iface vmbr0.5 inet static address 10.10.10.2 netmask 255.255.255.0 gateway 10.10.10.1 auto vmbr0 iface vmbr0 inet manual bridge_ports eno1 bridge_stp off bridge_fd 0 bridge_vlan_aware yes
The next example is the same setup but a bond is used to make this network fail-safe.
auto lo iface lo inet loopback iface eno1 inet manual iface eno2 inet manual auto bond0 iface bond0 inet manual slaves eno1 eno2 bond_miimon 100 bond_mode 802.3ad bond_xmit_hash_policy layer2+3 iface bond0.5 inet manual auto vmbr0v5 iface vmbr0v5 inet static address 10.10.10.2 netmask 255.255.255.0 gateway 10.10.10.1 bridge_ports bond0.5 bridge_stp off bridge_fd 0 auto vmbr0 iface vmbr0 inet manual bridge_ports bond0 bridge_stp off bridge_fd 0
The Proxmox VE cluster stack itself relies heavily on the fact that all the nodes have precisely synchronized time. Some other components, like Ceph, also refuse to work properly if the local time on nodes is not in sync.
Time synchronization between nodes can be achieved with the “Network Time Protocol” (NTP). Proxmox VE uses systemd-timesyncd as NTP client by default, preconfigured to use a set of public servers. This setup works out of the box in most cases.
Using Custom NTP Servers
In some cases, it might be desired to not use the default NTP servers. For example, if your Proxmox VE nodes do not have access to the public internet (e.g., because of restrictive firewall rules), you need to setup local NTP servers and tell systemd-timesyncd to use them:
[Time] NTP=ntp1.example.com ntp2.example.com ntp3.example.com ntp4.example.com
After restarting the synchronization service (systemctl restart systemd-timesyncd) you should verify that your newly configured NTP servers are used by checking the journal (journalctl --since -1h -u systemd-timesyncd):
... Oct 07 14:58:36 node1 systemd: Stopping Network Time Synchronization... Oct 07 14:58:36 node1 systemd: Starting Network Time Synchronization... Oct 07 14:58:36 node1 systemd: Started Network Time Synchronization. Oct 07 14:58:36 node1 systemd-timesyncd: Using NTP server 10.0.0.1:123 (ntp1.example.com). Oct 07 14:58:36 nora systemd-timesyncd: interval/delta/delay/jitter/drift 64s/-0.002s/0.020s/0.000s/-31ppm ...
External Metric Server
Starting with Proxmox VE 4.0, you can define external metric servers, which will be sent various stats about your hosts, virtual machines and storages.
Currently supported are:
graphite (see http://graphiteapp.org )
influxdb (see https://www.influxdata.com/time-series-platform/influxdb/ )
The server definitions are saved in /etc/pve/status.cfg
Graphite server configuration
The definition of a server is:
graphite: server your-server port your-port path your-path
where your-port defaults to 2003 and your-path defaults to proxmox
Proxmox VE sends the data over udp, so the graphite server has to be configured for this
Influxdb plugin configuration
The definition is:
influxdb: server your-server port your-port
Proxmox VE sends the data over udp, so the influxdb server has to be configured for this
Here is an example configuration for influxdb (on your influxdb server):
[[udp]] enabled = true bind-address = "0.0.0.0:8089" database = "proxmox" batch-size = 1000 batch-timeout = "1s"
With this configuration, your server listens on all IP addresses on port 8089, and writes the data in the proxmox database
Disk Health Monitoring
Although a robust and redundant storage is recommended, it can be very helpful to monitor the health of your local disks.
Starting with Proxmox VE 4.3, the package smartmontools
[smartmontools homepage https://www.smartmontools.org]
is installed and required. This is a set of tools to monitor and control the S.M.A.R.T. system for local hard disks.
You can get the status of a disk by issuing the following command:
# smartctl -a /dev/sdX
where /dev/sdX is the path to one of your local disks.
If the output says:
SMART support is: Disabled
you can enable it with the command:
# smartctl -s on /dev/sdX
For more information on how to use smartctl, please see man smartctl.
By default, smartmontools daemon smartd is active and enabled, and scans the disks under /dev/sdX and /dev/hdX every 30 minutes for errors and warnings, and sends an e-mail to root if it detects a problem.
For more information about how to configure smartd, please see man smartd and man smartd.conf.
If you use your hard disks with a hardware raid controller, there are most likely tools to monitor the disks in the raid array and the array itself. For more information about this, please refer to the vendor of your raid controller.
Logical Volume Manager (LVM)
Most people install Proxmox VE directly on a local disk. The Proxmox VE installation CD offers several options for local disk management, and the current default setup uses LVM. The installer let you select a single disk for such setup, and uses that disk as physical volume for the Volume Group (VG) pve. The following output is from a test installation using a small 8GB disk:
# pvs PV VG Fmt Attr PSize PFree /dev/sda3 pve lvm2 a-- 7.87g 876.00m # vgs VG #PV #LV #SN Attr VSize VFree pve 1 3 0 wz--n- 7.87g 876.00m
The installer allocates three Logical Volumes (LV) inside this VG:
# lvs LV VG Attr LSize Pool Origin Data% Meta% data pve twi-a-tz-- 4.38g 0.00 0.63 root pve -wi-ao---- 1.75g swap pve -wi-ao---- 896.00m
Formatted as ext4, and contains the operation system.
This volume uses LVM-thin, and is used to store VM images. LVM-thin is preferable for this task, because it offers efficient support for snapshots and clones.
For Proxmox VE versions up to 4.1, the installer creates a standard logical volume called “data”, which is mounted at /var/lib/vz.
Starting from version 4.2, the logical volume “data” is a LVM-thin pool, used to store block based guest images, and /var/lib/vz is simply a directory on the root file system.
We highly recommend to use a hardware RAID controller (with BBU) for such setups. This increases performance, provides redundancy, and make disk replacements easier (hot-pluggable).
LVM itself does not need any special hardware, and memory requirements are very low.
We install two boot loaders by default. The first partition contains the standard GRUB boot loader. The second partition is an EFI System Partition (ESP), which makes it possible to boot on EFI systems.
Creating a Volume Group
Let’s assume we have an empty disk /dev/sdb, onto which we want to create a volume group named “vmdata”.
|Please note that the following commands will destroy all existing data on /dev/sdb.|
First create a partition.
# sgdisk -N 1 /dev/sdb
Create a Physical Volume (PV) without confirmation and 250K metadatasize.
# pvcreate --metadatasize 250k -y -ff /dev/sdb1
Create a volume group named “vmdata” on /dev/sdb1
# vgcreate vmdata /dev/sdb1
Creating an extra LV for /var/lib/vz
This can be easily done by creating a new thin LV.
# lvcreate -n <Name> -V <Size[M,G,T]> <VG>/<LVThin_pool>
A real world example:
# lvcreate -n vz -V 10G pve/data
Now a filesystem must be created on the LV.
# mkfs.ext4 /dev/pve/vz
At last this has to be mounted.
|be sure that /var/lib/vz is empty. On a default installation it’s not.|
To make it always accessible add the following line in /etc/fstab.
# echo '/dev/pve/vz /var/lib/vz ext4 defaults 0 2' >> /etc/fstab
Resizing the thin pool
Resize the LV and the metadata pool can be achieved with the following command.
# lvresize --size +<size[\M,G,T]> --poolmetadatasize +<size[\M,G]> <VG>/<LVThin_pool>
|When extending the data pool, the metadata pool must also be extended.|
Create a LVM-thin pool
A thin pool has to be created on top of a volume group. How to create a volume group see Section LVM.
# lvcreate -L 80G -T -n vmstore vmdata
ZFS on Linux
ZFS is a combined file system and logical volume manager designed by Sun Microsystems. Starting with Proxmox VE 3.4, the native Linux kernel port of the ZFS file system is introduced as optional file system and also as an additional selection for the root file system. There is no need for manually compile ZFS modules - all packages are included.
By using ZFS, its possible to achieve maximum enterprise features with low budget hardware, but also high performance systems by leveraging SSD caching or even SSD only setups. ZFS can replace cost intense hardware raid cards by moderate CPU and memory load combined with easy management.
Easy configuration and management with Proxmox VE GUI and CLI.
Protection against data corruption
Data compression on file system level
Various raid levels: RAID0, RAID1, RAID10, RAIDZ-1, RAIDZ-2 and RAIDZ-3
Can use SSD for cache
Continuous integrity checking
Designed for high storage capacities
Protection against data corruption
Asynchronous replication over network
ZFS depends heavily on memory, so you need at least 8GB to start. In practice, use as much you can get for your hardware/budget. To prevent data corruption, we recommend the use of high quality ECC RAM.
If you use a dedicated cache and/or log disk, you should use an enterprise class SSD (e.g. Intel SSD DC S3700 Series). This can increase the overall performance significantly.
|Do not use ZFS on top of hardware controller which has its own cache management. ZFS needs to directly communicate with disks. An HBA adapter is the way to go, or something like LSI controller flashed in “IT” mode.|
If you are experimenting with an installation of Proxmox VE inside a VM (Nested Virtualization), don’t use virtio for disks of that VM, since they are not supported by ZFS. Use IDE or SCSI instead (works also with virtio SCSI controller type).
Installation as Root File System
When you install using the Proxmox VE installer, you can choose ZFS for the root file system. You need to select the RAID type at installation time:
Also called “striping”. The capacity of such volume is the sum of the capacities of all disks. But RAID0 does not add any redundancy, so the failure of a single drive makes the volume unusable.
Also called “mirroring”. Data is written identically to all disks. This mode requires at least 2 disks with the same size. The resulting capacity is that of a single disk.
A combination of RAID0 and RAID1. Requires at least 4 disks.
A variation on RAID-5, single parity. Requires at least 3 disks.
A variation on RAID-5, double parity. Requires at least 4 disks.
A variation on RAID-5, triple parity. Requires at least 5 disks.
The installer automatically partitions the disks, creates a ZFS pool called rpool, and installs the root file system on the ZFS subvolume rpool/ROOT/pve-1.
Another subvolume called rpool/data is created to store VM images. In order to use that with the Proxmox VE tools, the installer creates the following configuration entry in /etc/pve/storage.cfg:
zfspool: local-zfs pool rpool/data sparse content images,rootdir
After installation, you can view your ZFS pool status using the zpool command:
# zpool status pool: rpool state: ONLINE scan: none requested config: NAME STATE READ WRITE CKSUM rpool ONLINE 0 0 0 mirror-0 ONLINE 0 0 0 sda2 ONLINE 0 0 0 sdb2 ONLINE 0 0 0 mirror-1 ONLINE 0 0 0 sdc ONLINE 0 0 0 sdd ONLINE 0 0 0 errors: No known data errors
The zfs command is used configure and manage your ZFS file systems. The following command lists all file systems after installation:
# zfs list NAME USED AVAIL REFER MOUNTPOINT rpool 4.94G 7.68T 96K /rpool rpool/ROOT 702M 7.68T 96K /rpool/ROOT rpool/ROOT/pve-1 702M 7.68T 702M / rpool/data 96K 7.68T 96K /rpool/data rpool/swap 4.25G 7.69T 64K -
The default ZFS disk partitioning scheme does not use the first 2048 sectors. This gives enough room to install a GRUB boot partition. The Proxmox VE installer automatically allocates that space, and installs the GRUB boot loader there. If you use a redundant RAID setup, it installs the boot loader on all disk required for booting. So you can boot even if some disks fail.
|It is not possible to use ZFS as root file system with UEFI boot.|
This section gives you some usage examples for common tasks. ZFS itself is really powerful and provides many options. The main commands to manage ZFS are zfs and zpool. Both commands come with great manual pages, which can be read with:
# man zpool # man zfs
To create a new pool, at least one disk is needed. The ashift should have the same sector-size (2 power of ashift) or larger as the underlying disk.
zpool create -f -o ashift=12 <pool> <device>
To activate compression
zfs set compression=lz4 <pool>
Minimum 1 Disk
zpool create -f -o ashift=12 <pool> <device1> <device2>
Minimum 2 Disks
zpool create -f -o ashift=12 <pool> mirror <device1> <device2>
Minimum 4 Disks
zpool create -f -o ashift=12 <pool> mirror <device1> <device2> mirror <device3> <device4>
Minimum 3 Disks
zpool create -f -o ashift=12 <pool> raidz1 <device1> <device2> <device3>
Minimum 4 Disks
zpool create -f -o ashift=12 <pool> raidz2 <device1> <device2> <device3> <device4>
It is possible to use a dedicated cache drive partition to increase the performance (use SSD).
As <device> it is possible to use more devices, like it’s shown in "Create a new pool with RAID*".
zpool create -f -o ashift=12 <pool> <device> cache <cache_device>
It is possible to use a dedicated cache drive partition to increase the performance(SSD).
As <device> it is possible to use more devices, like it’s shown in "Create a new pool with RAID*".
zpool create -f -o ashift=12 <pool> <device> log <log_device>
If you have an pool without cache and log. First partition the SSD in 2 partition with parted or gdisk
|Always use GPT partition tables.|
The maximum size of a log device should be about half the size of physical memory, so this is usually quite small. The rest of the SSD can be used as cache.
zpool add -f <pool> log <device-part1> cache <device-part2>
zpool replace -f <pool> <old device> <new-device>
Activate E-Mail Notification
ZFS comes with an event daemon, which monitors events generated by the ZFS kernel module. The daemon can also send emails on ZFS events like pool errors. Newer ZFS packages ships the daemon in a separate package, and you can install it using apt-get:
# apt-get install zfs-zed
To activate the daemon it is necessary to edit /etc/zfs/zed.d/zed.rc with your favourite editor, and uncomment the ZED_EMAIL_ADDR setting:
Please note Proxmox VE forwards mails to root to the email address configured for the root user.
|The only setting that is required is ZED_EMAIL_ADDR. All other settings are optional.|
Limit ZFS Memory Usage
It is good to use at most 50 percent (which is the default) of the system memory for ZFS ARC to prevent performance shortage of the host. Use your preferred editor to change the configuration in /etc/modprobe.d/zfs.conf and insert:
options zfs zfs_arc_max=8589934592
This example setting limits the usage to 8GB.
If your root file system is ZFS you must update your initramfs every time this value changes:
Swap-space created on a zvol may generate some troubles, like blocking the server or generating a high IO load, often seen when starting a Backup to an external Storage.
We strongly recommend to use enough memory, so that you normally do not run into low memory situations. Should you need or want to add swap, it is preferred to create a partition on a physical disk and use it as swapdevice. You can leave some space free for this purpose in the advanced options of the installer. Additionally, you can lower the “swappiness” value. A good value for servers is 10:
sysctl -w vm.swappiness=10
To make the swappiness persistent, open /etc/sysctl.conf with an editor of your choice and add the following line:
vm.swappiness = 10
vm.swappiness = 0
The kernel will swap only to avoid an out of memory condition
vm.swappiness = 1
Minimum amount of swapping without disabling it entirely.
vm.swappiness = 10
This value is sometimes recommended to improve performance when sufficient memory exists in a system.
vm.swappiness = 60
The default value.
vm.swappiness = 100
The kernel will swap aggressively.
Certificates for communication within the cluster
Each Proxmox VE cluster creates its own (self-signed) Certificate Authority (CA) and generates a certificate for each node which gets signed by the aforementioned CA. These certificates are used for encrypted communication with the cluster’s pveproxy service and the Shell/Console feature if SPICE is used.
The CA certificate and key are stored in the Proxmox Cluster File System (pmxcfs).
Certificates for API and web GUI
The REST API and web GUI are provided by the pveproxy service, which runs on each node.
You have the following options for the certificate used by pveproxy:
By default the node-specific certificate in /etc/pve/nodes/NODENAME/pve-ssl.pem is used. This certificate is signed by the cluster CA and therefore not trusted by browsers and operating systems by default.
use an externally provided certificate (e.g. signed by a commercial CA).
use ACME (e.g., Let’s Encrypt) to get a trusted certificate with automatic renewal.
For options 2 and 3 the file /etc/pve/local/pveproxy-ssl.pem (and /etc/pve/local/pveproxy-ssl.key, which needs to be without password) is used.
Certificates are managed with the Proxmox VE Node management command (see the pvenode(1) manpage).
|Do not replace or manually modify the automatically generated node certificate files in /etc/pve/local/pve-ssl.pem and /etc/pve/local/pve-ssl.key or the cluster CA files in /etc/pve/pve-root-ca.pem and /etc/pve/priv/pve-root-ca.key.|
Getting trusted certificates via ACME
Proxmox VE includes an implementation of the Automatic Certificate Management Environment ACME protocol, allowing Proxmox VE admins to interface with Let’s Encrypt for easy setup of trusted TLS certificates which are accepted out of the box on most modern operating systems and browsers.
Currently the two ACME endpoints implemented are Let’s Encrypt (LE) and its staging environment (see https://letsencrypt.org), both using the standalone HTTP challenge.
Because of rate-limits you should use LE staging for experiments.
There are a few prerequisites to use Let’s Encrypt:
Port 80 of the node needs to be reachable from the internet.
There must be no other listener on port 80.
The requested (sub)domain needs to resolve to a public IP of the Node.
You have to accept the ToS of Let’s Encrypt.
At the moment the GUI uses only the default ACME account.
root@proxmox:~# pvenode acme account register default email@example.com Directory endpoints: 0) Let's Encrypt V2 (https://acme-v02.api.letsencrypt.org/directory) 1) Let's Encrypt V2 Staging (https://acme-staging-v02.api.letsencrypt.org/directory) 2) Custom Enter selection: 1 Attempting to fetch Terms of Service from 'https://acme-staging-v02.api.letsencrypt.org/directory'.. Terms of Service: https://letsencrypt.org/documents/LE-SA-v1.2-November-15-2017.pdf Do you agree to the above terms? [y|N]y Attempting to register account with 'https://acme-staging-v02.api.letsencrypt.org/directory'.. Generating ACME account key.. Registering ACME account.. Registration successful, account URL: 'https://acme-staging-v02.api.letsencrypt.org/acme/acct/xxxxxxx' Task OK root@proxmox:~# pvenode acme account list default root@proxmox:~# pvenode config set --acme domains=example.invalid root@proxmox:~# pvenode acme cert order Loading ACME account details Placing ACME order Order URL: https://acme-staging-v02.api.letsencrypt.org/acme/order/xxxxxxxxxxxxxx Getting authorization details from 'https://acme-staging-v02.api.letsencrypt.org/acme/authz/xxxxxxxxxxxxxxxxxxxxx-xxxxxxxxxxxxx-xxxxxxx' ... pending! Setting up webserver Triggering validation Sleeping for 5 seconds Status is 'valid'! All domains validated! Creating CSR Finalizing order Checking order status valid! Downloading certificate Setting pveproxy certificate and key Restarting pveproxy Task OK
Switching from the staging to the regular ACME directory
Changing the ACME directory for an account is unsupported. If you want to switch an account from the staging ACME directory to the regular, trusted, one you need to deactivate it and recreate it.
This procedure is also needed to change the default ACME account used in the GUI.
root@proxmox:~# pvenode acme account info default Directory URL: https://acme-staging-v02.api.letsencrypt.org/directory Account URL: https://acme-staging-v02.api.letsencrypt.org/acme/acct/6332194 Terms Of Service: https://letsencrypt.org/documents/LE-SA-v1.2-November-15-2017.pdf Account information: ID: xxxxxxx Contact: - mailto:firstname.lastname@example.org Creation date: 2018-07-31T08:41:44.54196435Z Initial IP: 192.0.2.1 Status: valid root@proxmox:~# pvenode acme account deactivate default Renaming account file from '/etc/pve/priv/acme/default' to '/etc/pve/priv/acme/_deactivated_default_4' Task OK root@proxmox:~# pvenode acme account register default email@example.com Directory endpoints: 0) Let's Encrypt V2 (https://acme-v02.api.letsencrypt.org/directory) 1) Let's Encrypt V2 Staging (https://acme-staging-v02.api.letsencrypt.org/directory) 2) Custom Enter selection: 0 Attempting to fetch Terms of Service from 'https://acme-v02.api.letsencrypt.org/directory'.. Terms of Service: https://letsencrypt.org/documents/LE-SA-v1.2-November-15-2017.pdf Do you agree to the above terms? [y|N]y Attempting to register account with 'https://acme-v02.api.letsencrypt.org/directory'.. Generating ACME account key.. Registering ACME account.. Registration successful, account URL: 'https://acme-v02.api.letsencrypt.org/acme/acct/39335247' Task OK
Automatic renewal of ACME certificates
If a node has been successfully configured with an ACME-provided certificate (either via pvenode or via the GUI), the certificate will be automatically renewed by the pve-daily-update.service. Currently, renewal will be attempted if the certificate has expired or will expire in the next 30 days.