Data deduplication on Windows Server 2022

Double Trouble

Deduplication Process

Technically speaking, the deduplication function analyzes the data blocks on a volume and searches for duplicates. When it finds identical blocks, the system keeps only one copy and replaces every other occurrence with a reference to that copy. This process is handled by a background service that runs regularly to check for new and modified files.
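The "one copy plus references" idea can be illustrated with a minimal in-memory model. This is a conceptual sketch, not Microsoft's implementation: it assumes fixed 4 KiB chunks identified by SHA-256 digests, and the class and file names are hypothetical.

```python
import hashlib

CHUNK_SIZE = 4096  # illustrative fixed chunk size; an assumption, not the Windows value


class ChunkStore:
    """Keeps exactly one copy of each distinct chunk, keyed by its SHA-256 digest."""

    def __init__(self) -> None:
        self.chunks: dict[str, bytes] = {}     # digest -> chunk data (stored once)
        self.files: dict[str, list[str]] = {}  # file name -> ordered chunk references

    def add_file(self, name: str, data: bytes) -> None:
        refs = []
        for offset in range(0, len(data), CHUNK_SIZE):
            chunk = data[offset:offset + CHUNK_SIZE]
            digest = hashlib.sha256(chunk).hexdigest()
            # Store the chunk only if an identical one is not already present
            self.chunks.setdefault(digest, chunk)
            refs.append(digest)
        self.files[name] = refs

    def read_file(self, name: str) -> bytes:
        # Rebuild the file by following its references into the chunk store
        return b"".join(self.chunks[d] for d in self.files[name])

    def stored_bytes(self) -> int:
        return sum(len(c) for c in self.chunks.values())


if __name__ == "__main__":
    store = ChunkStore()
    block = b"A" * CHUNK_SIZE
    store.add_file("report1.docx", block * 3 + b"tail-1")  # hypothetical file names
    store.add_file("report2.docx", block * 3 + b"tail-2")
    logical = sum(len(store.read_file(n)) for n in store.files)
    print(logical, store.stored_bytes())  # prints: 24588 4108
```

Both files still read back in full, but the repeated 4 KiB block is physically stored only once.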

Deduplication on Windows Server 2022 uses a post-processing approach: Data is first saved in its original form and deduplicated later. Because no deduplication work happens while data is being written, this approach minimizes the effect on the performance of primary storage operations. To process data efficiently, deduplication relies on a chunking algorithm that breaks files down into smaller units (chunks) and analyzes each one individually.
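Chunk boundaries matter: If files were cut at fixed offsets, inserting a few bytes near the start would shift every later chunk and defeat duplicate detection. The sketch below shows content-defined chunking, where boundaries depend on the data itself. It uses a simple "gear" rolling hash as a stand-in technique; the hash, the size limits, and the function name are assumptions for illustration and do not describe Windows' internal algorithm.

```python
import hashlib

# Gear table: 256 pseudo-random 32-bit values, derived deterministically from SHA-256
GEAR = [int.from_bytes(hashlib.sha256(bytes([i])).digest()[:4], "big") for i in range(256)]

MIN_SIZE = 256        # illustrative limits chosen for this sketch only
MAX_SIZE = 4096
MASK = (1 << 10) - 1  # cut when the low 10 bits are zero (about 1 in 1,024 positions)


def content_defined_chunks(data: bytes) -> list[bytes]:
    """Split data at content-dependent boundaries using a gear rolling hash."""
    chunks: list[bytes] = []
    start = 0
    h = 0
    for i, byte in enumerate(data):
        # Shift-and-add keeps the hash dependent only on the most recent bytes
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFF
        size = i - start + 1
        if (size >= MIN_SIZE and (h & MASK) == 0) or size >= MAX_SIZE:
            chunks.append(data[start:i + 1])
            start = i + 1
            h = 0
    if start < len(data):
        chunks.append(data[start:])  # trailing remainder
    return chunks
```

If you insert a few bytes near the beginning of a file and chunk it again, only the chunks around the edit change; later boundaries fall at the same content positions, so most chunks are still recognized as duplicates.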

Data integrity is a key aspect of deduplication. Windows Server 2022 uses several mechanisms, including checksums and integrity checks, to ensure that deduplicated data is not corrupted. Deduplication also relies on metadata that records how each original file maps to the stored chunks. Because this metadata is needed to reconstruct the original data correctly, backup and restore operations require additional care.
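Why checksums and metadata matter so much can be shown in a few lines: Each stored chunk is verified against its recorded checksum, and a file can only be rebuilt through its list of chunk references. This is a conceptual sketch with hypothetical function names, assuming SHA-256 digests and fixed 4 KiB chunks; it is not how Windows Server stores or verifies its chunk store.

```python
import hashlib


def store_chunks(data: bytes, chunk_size: int = 4096):
    """Split data into chunks and return (chunk store, file metadata).

    The metadata is the ordered list of chunk digests; without it the
    file cannot be reassembled from the chunk store.
    """
    store: dict[str, bytes] = {}
    metadata: list[str] = []
    for off in range(0, len(data), chunk_size):
        chunk = data[off:off + chunk_size]
        digest = hashlib.sha256(chunk).hexdigest()
        store.setdefault(digest, chunk)
        metadata.append(digest)
    return store, metadata


def scrub(store: dict[str, bytes]) -> list[str]:
    """Return digests of chunks whose content no longer matches their checksum."""
    return [d for d, chunk in store.items() if hashlib.sha256(chunk).hexdigest() != d]


def restore(store: dict[str, bytes], metadata: list[str]) -> bytes:
    """Reassemble a file, refusing to return data built from corrupted chunks."""
    corrupt = set(scrub(store))
    bad = [d for d in metadata if d in corrupt]
    if bad:
        raise ValueError(f"{len(bad)} chunk reference(s) point at corrupted data")
    return b"".join(store[d] for d in metadata)
```

Because one stored chunk can back many references, a single corrupted chunk damages every place it is used, which is why deduplicated volumes need this kind of verification.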

Installation Two Ways

In Server Manager, you can add data deduplication by installing the Data Deduplication role service under File and Storage Services | File and iSCSI Services (Figure 1). Alternatively, you can run the following command in PowerShell:

Install-WindowsFeature -Name FS-Data-Deduplication
Figure 1: Setting up data deduplication in Server Manager and with PowerShell on Windows Server 2019/2022.

Installing the feature does not start deduplication; it only installs the required system files. You still need to enable and configure deduplication for each volume in Server Manager or PowerShell.

Testing Volumes

When you install the Data Deduplication role service, the ddpeval.exe command-line tool is installed along with it. You can use this tool at the command line to analyze drives for duplicate data (Figure 2) and find out whether deduplication would pay off on the individual drives in the server. Data deduplication cannot be enabled on boot drives, and ddpeval cannot be used to evaluate them either.

Figure 2: Use the command line to discover whether data deduplication makes sense for individual drives.

The tool resides in the \Windows\System32 directory and supports both local drives and network shares. The syntax is ddpeval <path>, as in:

ddpeval E:\
ddpeval \\nas\data

The ddpeval tool does not deduplicate or otherwise change any files; it only tells you whether data deduplication makes sense for the drive in question and previews the possible savings. To analyze a specific directory, pass its path instead:

ddpeval.exe D:\Data\Projects

The output from ddpeval includes the total size of the analyzed data, the estimated size after deduplication, and the savings as a percentage. This information helps you weigh the potential benefits of deduplication and decide which volumes or directories are best suited for it. The following command writes verbose results (/v) to a file (/o:):

DDPEval.exe d:\wsus /v /o:C:\temp\dedup.txt

The resulting file contains a detailed report of the storage space you could save by introducing data deduplication.
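The percentage ddpeval reports follows the usual savings formula: (total size minus size after deduplication) divided by total size. The sketch below computes a rough, read-only estimate of the same three figures for a directory tree. It is an assumption-laden illustration (fixed 4 KiB chunks, SHA-256, no compression, hypothetical function names), not a reimplementation of ddpeval, whose estimates will differ.

```python
import hashlib
import os


def estimate_savings(blobs, chunk_size: int = 4096) -> tuple[int, int, float]:
    """Return (total bytes, bytes after dedup, savings in percent) without modifying anything."""
    total = 0
    unique = 0
    seen: set[str] = set()
    for data in blobs:
        total += len(data)
        for off in range(0, len(data), chunk_size):
            chunk = data[off:off + chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            if digest not in seen:  # count each distinct chunk only once
                seen.add(digest)
                unique += len(chunk)
    percent = 100.0 * (total - unique) / total if total else 0.0
    return total, unique, percent


def read_tree(root: str):
    """Yield the contents of every regular file below root (read-only).

    Each file is read fully into memory, which is acceptable for a sketch.
    """
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path):
                with open(path, "rb") as fh:
                    yield fh.read()
```

For example, estimate_savings(read_tree(r"D:\Data\Projects")) returns the analyzed size, the estimated size after deduplication, and the savings percentage for that directory, similar in spirit to the ddpeval output.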
