Data deduplication on Windows Server 2022

Double Trouble

Data Deduplication for Volumes

After installing the server role for data deduplication and testing the individual drives, the next step is to enable the feature for the target drives on the target server. You can do this either in Server Manager, by selecting File and Storage Services | Volumes and then Configure Data Deduplication in the target volume's context menu, or in PowerShell. I will be looking at both options later. Selecting the option in Server Manager opens a window where you can configure all the settings required for the target volume.

Start by selecting the server type, which depends on the data to be deduplicated. General purpose file server, Virtual Desktop Infrastructure (VDI) server, and Virtualized Backup Server are available as options. Then specify the minimum age in days that files must reach before they are deduplicated (Figure 3). A period of three days is preconfigured by default. You can also exclude individual file types, individual files, or entire folders from deduplication.

Figure 3: Setting up data deduplication for individual drives in Server Manager.

Click the Set Deduplication Schedule button to define in detail when the deduplication service processes the volume. You will generally want to check the Enable background optimization option, which means that the deduplication service runs in the background at low priority and generates as little load as possible on the server. Windows can even pause the job if the server needs the resources. In the same window, you can also define two additional schedules for days on which deduplication runs with normal priority at specific times. Of course, you will want to select times when the server is not very busy. As a general rule, avoid running other activities on the server at the same time as deduplication, including maintenance, data backups, and malware scans.

Deduplication of VDI Servers

Data deduplication in VDI environments offers substantial benefits, but some key aspects differ from deduplication on conventional file servers. VDI scenarios often have many desktop instances with similar or identical data, and deduplication can achieve significant storage space savings by eliminating redundant data across multiple virtual desktops. This process not only reduces the storage capacity you need, but can also improve performance, because less data has to be physically stored and read.

One significant difference from deduplication on file servers relates to the type of data stored. Whereas file servers usually store a variety of file types and data structures, the files in a VDI environment are often more homogeneous, because many virtual machines use similar operating systems and applications. This homogeneity increases the potential for deduplication because more redundant data exists.

Additionally, deduplication in VDI environments often requires a customized configuration to meet the specific requirements of these environments. For example, it can be important to configure deduplication such that it does not affect performance at peak times; after all, response times and availability are critical factors in VDI environments.

Another difference lies in maintenance and administration. VDI environments can be dynamic by nature, with frequent changes to virtual desktops, which requires regular reviews and adjustments of the deduplication settings. In contrast, the content on file servers is often more static, which means the deduplication settings do not need to be modified as frequently.

PowerShell

You can also control data deduplication on Windows Server 2022 with PowerShell. The following commands enable data deduplication for a target volume, the second one with an explicit usage type:

Enable-DedupVolume -Volume F:
Enable-DedupVolume -Volume d: -UsageType Default

The Default usage type corresponds to the General purpose file server option in Server Manager. If you do not want to wait for the next scheduled run, you can immediately start deduplication with the command

Start-DedupJob -Volume <drive letter> -Type Optimization
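
The three server types offered in Server Manager correspond to the values Default, HyperV, and Backup of the UsageType parameter. As a minimal sketch, assuming E: holds the virtual disks of a VDI deployment and G: is a virtualized backup target (both drive letters are placeholders), the other two types could be enabled and checked like this:

```powershell
# Assumption: E: and G: are placeholder drive letters; adjust them to your server.
# HyperV is the usage type that matches "Virtual Desktop Infrastructure (VDI) server".
Enable-DedupVolume -Volume "E:" -UsageType HyperV

# Backup is the usage type that matches "Virtualized Backup Server".
Enable-DedupVolume -Volume "G:" -UsageType Backup

# List all deduplication-enabled volumes with their usage type.
Get-DedupVolume | Format-Table Volume, Enabled, UsageType, SavingsRate -AutoSize
```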

Set-DedupSchedule modifies the settings of an existing deduplication schedule, such as the start time and duration of garbage collection or optimization jobs. The schedule named in the command must already exist:

Set-DedupSchedule -Name "DailyOptimization" -Type Optimization -Start 01:00 -DurationHours 3
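
If no schedule with that name exists yet, it has to be created first with New-DedupSchedule. The following is a sketch only: the schedule name is taken from the command above, whereas the weekday selection and the priority are assumptions chosen for illustration.

```powershell
# Assumption: weekdays and Normal priority are illustrative values, not recommendations.
New-DedupSchedule -Name "DailyOptimization" -Type Optimization `
    -Start 01:00 -DurationHours 3 `
    -Days Monday,Tuesday,Wednesday,Thursday,Friday `
    -Priority Normal

# Show all deduplication schedules, including the built-in ones.
Get-DedupSchedule
```

Once the schedule exists, Set-DedupSchedule can change its start time or duration without recreating it.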

You can use the following PowerShell command to discover the scheduled tasks:

Get-ScheduledTask -TaskPath \Microsoft\Windows\Deduplication\

Get-DedupStatus lets you monitor the deduplication rate and savings achieved, whereas

Start-DedupJob -Volume "D:" -Type Scrubbing

checks the integrity of the deduplicated data. These commands give you comprehensive options for controlling and monitoring data deduplication without Server Manager. If you want the prompt to return only after the deduplication job has finished, add the Wait parameter:

Start-DedupJob -Volume <drive letter> -Type Optimization -Wait

You can also display the current status of a job and retrieve further information by typing

Get-DedupJob
Get-DedupVolume
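
The two status cmdlets can be combined into a small monitoring loop. The sketch below assumes that D: is a deduplication-enabled volume and polls every 30 seconds (an arbitrary interval) until no job is left, then prints a summary from Get-DedupStatus with sizes converted to gigabytes.

```powershell
# Assumption: D: is a placeholder volume; the polling interval is arbitrary.
$volume = "D:"

# Get-DedupJob lists only queued or running jobs, so the loop ends when none remain.
while ($jobs = Get-DedupJob -Volume $volume -ErrorAction SilentlyContinue) {
    $jobs | Format-Table Volume, Type, State, Progress -AutoSize | Out-Host
    Start-Sleep -Seconds 30
}

# Summarize the result once no job is active on the volume.
Get-DedupStatus -Volume $volume |
    Select-Object Volume, OptimizedFilesCount, InPolicyFilesCount,
        @{ Name = 'SavedGB'; Expression = { [math]::Round($_.SavedSpace / 1GB, 2) } },
        @{ Name = 'FreeGB';  Expression = { [math]::Round($_.FreeSpace / 1GB, 2) } }
```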

For more detailed information, you can pipe the output to the Format-List cmdlet (e.g., Get-DedupVolume | fl). Careful monitoring of deduplication success is also important. You can create reports with commands such as

Get-DedupStatus -Volume "D:" | Select-Object SavingsRate,OptimizedFilesCount

to output deduplication success metrics and make adjustments, if necessary. PowerShell also lets you configure the various deduplication options. For example, you can adjust the minimum file size for deduplication to improve efficiency:

Set-DedupVolume -Volume "D:" -MinimumFileSize 128KB

If the data is already compressed, you can also disable additional compression with the NoCompress parameter. Excluding certain file types from deduplication, to optimize performance for these files, is possible not only in Server Manager, but also in PowerShell, where you specify the extensions without the leading period:

Set-DedupVolume -Volume "D:" -ExcludeFileType "log","tmp"
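
As a short sketch of the NoCompress option mentioned above, assuming D: holds data that is already compressed, the setting can be applied and all volume settings read back afterward to confirm what is in effect:

```powershell
# Assumption: D: is a placeholder volume that stores already compressed data.
Set-DedupVolume -Volume "D:" -NoCompress $true

# Read the current configuration back to verify the changes.
Get-DedupVolume -Volume "D:" |
    Format-List Volume, Enabled, UsageType, MinimumFileAgeDays,
        MinimumFileSize, NoCompress, ExcludeFileType, ExcludeFolder
```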

If you want to disable data deduplication for a drive again, you can use Server Manager and the same window you used to enable deduplication. To do so, set the Disabled option in Configure Data Deduplication. If you want to use PowerShell instead, run the command

Disable-DedupVolume -Volume F:

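Disabling deduplication only stops further optimization; files that have already been deduplicated remain in that state. In some circumstances, you might need to restore the files on a deduplicated volume to their original, non-deduplicated form, for which you can use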

Start-DedupJob -Volume "D:" -Type Unoptimization
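
Because an unoptimization job writes the files back in full, the volume needs enough free space to hold the data that deduplication previously saved. The following sketch uses a rough rule of thumb, not an official requirement: it compares FreeSpace with SavedSpace from Get-DedupStatus before starting the job. The drive letter is a placeholder.

```powershell
# Assumption: D: is a placeholder volume; the free-space comparison is only a heuristic.
$volume = "D:"
$status = Get-DedupStatus -Volume $volume

if ($status.FreeSpace -gt $status.SavedSpace) {
    # Enough room to write the files back in full.
    Start-DedupJob -Volume $volume -Type Unoptimization
}
else {
    $missingGB = [math]::Round(($status.SavedSpace - $status.FreeSpace) / 1GB, 2)
    Write-Warning "Roughly $missingGB GB more free space is needed on $volume."
}
```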

This kind of flexibility is particularly useful in complex IT environments. Windows Server 2022 also offers optimization options for specific workloads such as VDI environments. Adjusting the settings with commands such as

Set-DedupVolume -Volume "D:" -OptimizeInUseFiles -OptimizePartialFiles

lets deduplication also process files that are in use or only partially unchanged, which is what makes it effective in these environments. It is also advisable to carry out regular checks and maintenance to ensure that your data deduplication setup is running efficiently and without interruptions. When planning deduplication tasks, take the server load into account: Scheduling them outside of peak times minimizes the impact on the server and on overall performance.
