

Configuring Deduplication Data Stores in Arcserve UDP

The following are the important parameters to configure for a deduplication data store:

Data destination

The data destination is used to store the protected data. It is better to use a larger disk for the data destination because it contains the original data blocks from the source.

Index destination

The index destination is used to store the index files. It is better to place it on a separate disk to improve deduplication processing throughput.

Hash destination

The hash destination is used to store the hash files. If the hash destination is configured on a high-speed SSD, it can be used to enlarge the deduplication capacity while keeping the memory allocation requirement low.

Backup destination folder

The destination folder where .D2D files and catalog files reside.

Block size

The deduplication block size also affects the deduplication capacity estimation. The default deduplication block size is 16 KB; if you set it to 32 KB, the deduplication capacity estimation is doubled. Increasing the deduplication block size can decrease the deduplication percentage, but it also decreases the memory requirement.
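To see how the block size interacts with the capacity estimation, consider the simplified sketch below. It is not Arcserve's actual estimation formula; it merely assumes, hypothetically, that every indexed data block costs one fixed-size hash entry in memory, so the capacity a fixed hash memory can cover scales linearly with the block size.

    # Simplified model (not Arcserve's actual formula): assume every indexed
    # data block costs one fixed-size hash entry in hash memory.
    HASH_ENTRY_BYTES = 50          # hypothetical per-entry overhead
    GIB = 1024 ** 3
    TIB = 1024 ** 4

    def estimated_capacity_bytes(hash_memory_bytes, block_size_bytes):
        """Unique data a given hash memory can index at a given block size."""
        entries = hash_memory_bytes // HASH_ENTRY_BYTES
        return entries * block_size_bytes

    memory = 4 * GIB
    for block_kb in (16, 32):
        cap = estimated_capacity_bytes(memory, block_kb * 1024)
        print(f"{block_kb} KB blocks -> ~{cap / TIB:.1f} TiB of unique data indexed")
    # Doubling the block size from 16 KB to 32 KB doubles the estimate,
    # at the cost of a lower deduplication percentage.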

Memory allocation

To estimate the memory requirement, use the “Estimate Memory and Storage Requirements” tool. If the allocated memory is not enough, then once the memory is fully used, new hash keys can no longer be inserted into the hash database. Any data backed up after that point cannot be deduplicated, causing the deduplication ratio to go down. If you cannot increase the memory for some reason, try increasing the deduplication block size, as it decreases the memory requirement.
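The same simplified model, turned around, illustrates why a larger block size lowers the memory requirement. The numbers below are hypothetical and are not what the “Estimate Memory and Storage Requirements” tool computes; they only show the direction of the effect.

    # Inverse view of the same simplified model: hash memory needed to index
    # a given amount of unique source data (hypothetical 50-byte entries).
    HASH_ENTRY_BYTES = 50
    GIB = 1024 ** 3
    TIB = 1024 ** 4

    def required_hash_memory_bytes(unique_data_bytes, block_size_bytes):
        entries = unique_data_bytes // block_size_bytes   # one entry per block
        return entries * HASH_ENTRY_BYTES

    source = 20 * TIB
    for block_kb in (16, 32):
        mem = required_hash_memory_bytes(source, block_kb * 1024)
        print(f"{block_kb} KB blocks -> ~{mem / GIB:.1f} GiB of hash memory for 20 TiB")
    # Moving from 16 KB to 32 KB blocks roughly halves the memory requirement.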

Note: Block Size cannot be changed for an existing data store.

Be aware that a new backup job is not allowed to launch once the hash memory is full. However, an ongoing backup job (launched before the hash memory became full) is allowed to continue and complete. In this case, it does not insert new hash keys into the hash database, which lowers the deduplication percentage.

The reason is that all data blocks in the ongoing backup job are still compared against the existing hash keys in the hash database:

• If a block is a duplicate of an existing hash key, it is not written to disk again.

• If a block is not a duplicate of an existing hash key, it is written to disk, but the new hash key is not inserted into the hash database because the hash database is full. As a result, subsequent data blocks cannot be compared against these new hash keys.
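This decision flow can be summarized in a short sketch. It is not Arcserve's implementation; the hash function, the in-memory set, and the max_entries limit are stand-ins used only to show why blocks that first appear after the hash memory fills up can no longer be deduplicated against.

    import hashlib

    class DedupStoreSketch:
        """Illustrative dedupe write path, not Arcserve's implementation."""

        def __init__(self, max_entries):
            self.max_entries = max_entries   # stand-in for the hash-memory limit
            self.hash_db = set()             # stand-in for the hash database
            self.blocks_written = 0
            self.blocks_deduped = 0

        def write_block(self, block: bytes):
            key = hashlib.sha1(block).digest()
            if key in self.hash_db:
                # Duplicate of an existing hash key: not written to disk again.
                self.blocks_deduped += 1
                return
            # Unknown block: it is written to disk either way...
            self.blocks_written += 1
            # ...but once the hash DB is full, the new key is NOT inserted,
            # so later copies of this block cannot be matched against it.
            if len(self.hash_db) < self.max_entries:
                self.hash_db.add(key)

    store = DedupStoreSketch(max_entries=2)
    for block in [b"A", b"B", b"C", b"C", b"A"]:
        store.write_block(block)
    print(store.blocks_written, "written,", store.blocks_deduped, "deduplicated")
    # -> 4 written, 1 deduplicated: the second "C" is written again because its
    #    key arrived after the hash DB was full; "A" still deduplicates.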