Backup and Recover VM Operations Fail Using NBD Transport Mode
Valid on all Windows platforms running on backup proxy systems.
Symptom
Backup and recover VM operations fail.
The following errors appear in the VDDK error logs:
Failed to open NBD extent
NBD_ERR_GENERIC
NFC connection errors also appear in the error logs. For example:
NfcFssrvrRecv
NfcFssrvr_DiskOpen
NfcNetTcpWrite
NfcNet_Send
NfcSendMessage
Note: Debugging must be enabled to view the above error logs. For more information, see Enable Debugging for VDDK Jobs.
Solution
Network Block Device (NBD) transport mode, also referred to as LAN transport mode, uses the Network File Copy (NFC) protocol to communicate. When using NBD, VDDK operations use one connection for each virtual disk that they access on each ESX Server and ESXi Server host. Connections cannot be shared across disks. The VI Client and periodic communication between the host systems, vpxd, and the ESX Server and ESXi Server systems also count toward the number of concurrent connections.
The following table describes the maximum number of NFC connections:
| Host Platform | Connecting to | Limits to |
| vSphere 5 and 6 | an ESXi host | Limited by a transfer buffer for all NFC connections enforced by the host: the sum of all NFC connection buffers to an ESXi host cannot exceed 32MB. 52 connections through vCenter server, including the above per-host limit. |
Be aware of the following:
- The Maximum Connections values represent host limits.
- The Maximum Connections values do not represent process limits.
- The Maximum Connections values do not apply to SAN and hotadd connections.
- The error messages described under Symptom occur when the number of NFC connections to the host systems exceeds the maximum number of connections described in the above table. When failures occur, the number of connections to the ESX Server or ESXi Server increases, which causes the communication sessions to the host systems to exceed the maximum number of connections.
- If the NFC client does not shut down properly, ESX Server and ESXi Server allow the communication sessions to remain open for an additional ten minutes. This behavior can increase the number of open connections.
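The two limits in the table can be expressed as a simple check. The following Python sketch is illustrative only: the function name and the per-connection buffer sizes passed to it are hypothetical (this article does not state how large each NFC connection buffer is), and it assumes that 32MB means 32 × 1024 × 1024 bytes.

```python
from typing import Sequence

# Limits taken from the table above.
VCENTER_MAX_NFC_CONNECTIONS = 52
# Assumption: "32MB" is interpreted as 32 * 1024 * 1024 bytes.
HOST_NFC_BUFFER_LIMIT_BYTES = 32 * 1024 * 1024


def within_nfc_limits(connection_buffer_bytes: Sequence[int],
                      via_vcenter: bool = True) -> bool:
    """Return True if a set of NFC connections to one ESXi host fits the limits.

    connection_buffer_bytes holds one hypothetical buffer size per open
    connection; the real sizes are not given in this article.
    """
    # Per-host limit: the sum of all connection buffers must not exceed 32MB.
    if sum(connection_buffer_bytes) > HOST_NFC_BUFFER_LIMIT_BYTES:
        return False
    # Through vCenter Server, at most 52 connections are allowed.
    if via_vcenter and len(connection_buffer_bytes) > VCENTER_MAX_NFC_CONNECTIONS:
        return False
    return True


# Twelve connections with a hypothetical 1 MB buffer each.
print(within_nfc_limits([1024 * 1024] * 12))
```

Because connections that are not shut down properly can stay open for an additional ten minutes, a real check would also need to count those lingering sessions.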
Best Practices:
To help ensure that backup and recovery operations do not fail when using NBD transport mode, use the following best practices:
- Ensure that open connections to ESX Server systems and ESXi Server systems are closed properly.
- Use the following best practices when submitting backup and restore jobs:
  - If you expect to need a high number of connections to the host systems, populate the VMs in your Arcserve Backup environment using VMware vCenter Server.
  - When backing up data using the VDDK approach, optimize the number of streams specified for multistreaming backups and the number of concurrent read operations of the VM disks. This approach helps to minimize the number of communication sessions to the host system. You can estimate the number of connections using the following calculations:
    - Mixed Mode backups and Raw (full VM) backups (with or without the Allow file level restore option specified) using VDDK--The number of connections equals the lesser of the number of streams in a multistreaming job or the number of VMs specified in a multistreaming job, multiplied by the value of vmdkReaderCount.
    - Raw (full VM) backups (with or without the Allow file level restore option specified) and File mode backups using VDDK--The number of connections equals the total number of disks for all VMs backed up concurrently, limited by the number of streams specified for a multiplexing job.
Note: For backups of VMs that use VDDK, Arcserve Backup backs up one disk at a time, and there are multiple connections to each disk as indicated by the value of vmdkReaderCount.
Example: A job consists of 4 VMs. VM1 contains 5 disks. VM2, VM3, and VM4 contain 4 disks each. There are 3 streams specified for the job.
The number of connections equals 3 (the number of streams, which is less than the number of VMs) multiplied by 4 (the value of vmdkReaderCount).
The number of connections required is 12.
Note: By default, VDDK backups use a vmdkReaderCount value of 4. For information about how to change the value of VDDK vmdkReaderCount, see Configure the Number of Concurrent Read Operations Using VDDK.
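The first calculation (the lesser of streams or VMs, multiplied by vmdkReaderCount) can be sketched in Python as follows. This is only an illustration of the arithmetic; the function name is hypothetical and is not part of Arcserve Backup or VDDK.

```python
def estimate_connections_by_streams(num_streams: int,
                                    num_vms: int,
                                    vmdk_reader_count: int = 4) -> int:
    """Estimate NFC connections as min(streams, VMs) * vmdkReaderCount.

    The default of 4 mirrors the default vmdkReaderCount value stated above.
    """
    if num_streams < 0 or num_vms < 0 or vmdk_reader_count < 0:
        raise ValueError("all inputs must be non-negative")
    # Only as many VMs as there are streams are backed up at the same time.
    concurrent_vms = min(num_streams, num_vms)
    # Each VM is read one disk at a time, with vmdkReaderCount connections.
    return concurrent_vms * vmdk_reader_count


# The example job: 3 streams, 4 VMs, default vmdkReaderCount of 4.
print(estimate_connections_by_streams(num_streams=3, num_vms=4))  # 12
```

Lowering either the number of streams or the vmdkReaderCount value reduces the estimate proportionally.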
Example: A job consists of 4 VMs. VM1 contains 5 disks. VM2, VM3, and VM4 contain 4 disks each. There are 3 streams specified for the job.
The number of connections equals 5 (VM1) plus 4 (VM2) plus 4 (VM3).
The number of connections required is 13. Arcserve Backup will back up VM4 when the backup pertaining to VM1, VM2, or VM3 is complete.
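The second calculation (total disks for all VMs backed up concurrently) can be sketched the same way. This is a minimal illustration, assuming that VMs are processed in the order listed and that each stream handles one VM at a time; the function name is hypothetical.

```python
from typing import Sequence


def estimate_connections_by_disks(disks_per_vm: Sequence[int],
                                  num_streams: int) -> int:
    """Estimate NFC connections as the total disks of the concurrent VMs.

    disks_per_vm lists the disk count of each VM in job order (an assumption
    about scheduling); only the first num_streams VMs run concurrently.
    """
    if num_streams < 0:
        raise ValueError("num_streams must be non-negative")
    # The number of streams limits how many VMs are backed up at once.
    concurrent_vms = disks_per_vm[:num_streams]
    # One connection per disk of every VM that is being backed up.
    return sum(concurrent_vms)


# The example job: VM1 has 5 disks, VM2 to VM4 have 4 disks each, 3 streams.
print(estimate_connections_by_disks([5, 4, 4, 4], num_streams=3))
```

The remaining VM is not counted because, as noted above, its backup starts only after one of the first three completes.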