Data deduplication on Windows Server 2022
Double Trouble
Data Deduplication for Volumes
After installing the server role for data deduplication and testing the individual drives, the next step is to enable the feature for the target drives on the target server. To do this, you can either use Server Manager and enable the feature by selecting File and Storage Services | Volumes followed by Configure Data Deduplication in the target volume's context menu, or you can use PowerShell if you prefer. I will be looking at both options later. Selecting the option in Server Manager pops up a window where you can configure all the settings required for the target volume.
Start by selecting the server type, which determines the kind of data to be deduplicated. General purpose file server, Virtual Desktop Infrastructure (VDI) server, and Virtualized Backup Server are available as options. Next, specify the minimum age in days that a file must reach before it is deduplicated (Figure 3). A period of three days is preconfigured by default. You can also exclude individual file types, individual files, or entire folders from deduplication.
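The same choices can be expressed in PowerShell. The following is a minimal sketch, assuming a data volume E: and a placeholder folder E:\Scratch, neither of which comes from the article. The three server types correspond to the -UsageType values Default, HyperV, and Backup.

```powershell
# Assumption: E: is the data volume and E:\Scratch is a placeholder folder.
# UsageType values: Default (general purpose file server),
# HyperV (VDI server), Backup (virtualized backup server).
Enable-DedupVolume -Volume "E:" -UsageType Default

# Only optimize files that are at least three days old and skip one folder.
Set-DedupVolume -Volume "E:" -MinimumFileAgeDays 3 -ExcludeFolder "E:\Scratch"

# Confirm the resulting policy for the volume.
Get-DedupVolume -Volume "E:" |
    Format-List Volume, UsageType, MinimumFileAgeDays, ExcludeFolder
```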
Click the Set Deduplication Schedule button to set up in detail when you want the background service to clean up the server. You will generally want to check the Enable background optimization option, which means that the deduplication service will run in the background and generate as little load as possible on the server. Windows can even stop the service if required. In the window, you can also define two additional schedules for days on which deduplication will run with normal priority at specific times. Of course, you will want to select times when the server is not very busy. As a general rule, you should avoid other activities taking place on the server at the same time as deduplication, including maintenance, data backups, and malware scans.
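An additional schedule of this kind can also be created from the command line. The sketch below uses New-DedupSchedule. The schedule name, days, and time window are hypothetical and need to be matched to the quiet hours of your own server.

```powershell
# Hypothetical schedule name and time window; adjust to your quiet hours.
New-DedupSchedule -Name "WeekendOptimization" -Type Optimization `
    -Days Saturday, Sunday -Start 23:00 -DurationHours 6 -Priority Normal

# List all schedules, including the built-in background optimization.
Get-DedupSchedule
```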
Deduplication of VDI Servers
Data deduplication in VDI environments offers substantial benefits, but some key aspects differ from deduplication on conventional file servers. VDI scenarios often have many desktop instances with similar or identical data, and deduplication can achieve significant storage space savings by eliminating redundant data across multiple virtual desktops. This process does not just reduce the storage capacity you need; it can also improve read performance, because less physical data has to be stored and read.
One significant difference from deduplication on file servers relates to the type of data stored. Whereas file servers usually store a variety of file types and data structures, the files in a VDI environment are often more homogeneous, because many virtual machines use similar operating systems and applications. This homogeneity increases the potential for deduplication because more redundant data exists.
Additionally, deduplication in VDI environments often requires a customized configuration to meet the specific requirements of these environments. For example, it can be important to configure deduplication such that it does not affect performance at peak times; after all, response times and availability are critical factors in VDI environments.
Another difference lies in maintenance and administration. VDI environments can be dynamic by nature, with frequent changes to virtual desktops, requiring regular reviews and adjustments of the deduplication settings. In contrast, the content on the file servers is often more static, which means the deduplication settings do not need to be modified as frequently.
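In PowerShell, the VDI server type corresponds to the HyperV usage type. The following sketch assumes a hypothetical volume V: that holds the virtual disks. It enables deduplication with that usage type and then displays the settings the profile has applied, which is a convenient starting point for the regular reviews mentioned above.

```powershell
# Assumption: V: is the volume that stores the VDI virtual disks.
Enable-DedupVolume -Volume "V:" -UsageType HyperV

# Review the policy that the HyperV usage type has put in place.
Get-DedupVolume -Volume "V:" |
    Format-List UsageType, MinimumFileAgeDays, OptimizeInUseFiles, OptimizePartialFiles
```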
PowerShell
To use PowerShell to control data deduplication on Windows Server 2022, the following commands enable data deduplication for a target volume and configure the settings:
Enable-DedupVolume -Volume F:
Enable-DedupVolume -Volume D: -UsageType Default
The Default usage type corresponds to the General purpose file server option in Server Manager. You can immediately start deduplication with the command
Start-DedupJob -Volume <drive letter> -Type Optimization
Set-DedupSchedule
modifies the settings of an existing deduplication schedule, such as the time window for garbage collection or optimization (if the named schedule does not exist yet, create it first with New-DedupSchedule):
Set-DedupSchedule -Name "DailyOptimization" -Type Optimization -Start 01:00 -DurationHours 3
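Before changing a schedule, it helps to check which schedules are actually defined and what they are called. A short sketch with Get-DedupSchedule follows. The names returned depend on your server, so none are assumed here.

```powershell
# Show every deduplication schedule defined on this server.
Get-DedupSchedule

# Show all properties of the optimization schedules only.
Get-DedupSchedule -Type Optimization | Format-List
```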
You can use the following PowerShell command to discover the scheduled tasks:
Get-ScheduledTask -TaskPath \Microsoft\Windows\Deduplication\
Get-DedupStatus
lets you monitor the deduplication rate and savings achieved, whereas
Start-DedupJob -Volume "D:" -Type Scrubbing
checks the integrity of the deduplicated data. These commands give you comprehensive options for controlling and monitoring data deduplication without Server Manager. If you want the command to return only after the job has finished, add the -Wait parameter:
Start-DedupJob -Volume <drive letter> -Type Optimization -Wait
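As an alternative to blocking the console, you can start the job and poll its progress. The following is only a sketch: The volume letter and the 30-second interval are assumptions, and the loop simply ends once Get-DedupJob no longer reports a queued or running job for the volume.

```powershell
$volume = "D:"   # Assumption: adjust to your deduplicated volume.
Start-DedupJob -Volume $volume -Type Optimization | Out-Null

# Poll until no queued or running job remains for the volume.
while ($jobs = Get-DedupJob -Volume $volume -ErrorAction SilentlyContinue) {
    $jobs | ForEach-Object {
        "{0} {1}: {2}% ({3})" -f $_.Volume, $_.Type, $_.Progress, $_.State
    }
    Start-Sleep -Seconds 30
}

# Show when the last optimization ran and how it ended.
Get-DedupStatus -Volume $volume |
    Format-List LastOptimizationTime, LastOptimizationResult
```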
You can also display the current status of a job and retrieve further information by typing
Get-DedupJob
Get-DedupVolume
For more detailed information, you can pipe the output to the Format-List cmdlet (e.g., Get-DedupVolume | fl). Careful monitoring of deduplication success is also important. You can create reports with commands such as
Get-DedupStatus -Volume "D:" | Select-Object SavingsRate,OptimizedFilesCount
to output deduplication success metrics and make adjustments, if necessary. PowerShell also lets you configure the various deduplication options. For example, you can adjust the minimum file size for deduplication to improve efficiency:
Set-DedupVolume -Volume "D:" -MinimumFileSize 128KB
You can also disable additional compression with the NoCompress parameter if the data is already compressed. Certain file types can be excluded from deduplication to optimize performance for these files. This exclusion is possible not only in Server Manager, but also in PowerShell:
Set-DedupVolume -Volume "D:" -ExcludeFileType "log","tmp"
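The NoCompress setting mentioned above is set in the same way. A brief sketch, again assuming volume D:, turns compression off and then reads back the relevant settings to verify the change.

```powershell
# Skip compression of deduplicated chunks, for example on volumes
# that mainly hold already compressed data.
Set-DedupVolume -Volume "D:" -NoCompress $true

# Read back the settings that were changed.
Get-DedupVolume -Volume "D:" |
    Format-List NoCompress, ExcludeFileType, MinimumFileSize
```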
If you want to disable data deduplication for a drive again, you can do so in Server Manager in the same window you used to enable it: Select the Disabled option in Configure Data Deduplication. If you want to use PowerShell instead, run the command
Disable-DedupVolume -Volume F:
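Disabling deduplication only stops further optimization; files that are already deduplicated stay that way. In some circumstances, you might need to reverse deduplication and restore the files on a volume to their original state, for which you can use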
Start-DedupJob -Volume "D:" -Type Unoptimization
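Because unoptimization writes the files back in full, the volume needs enough free space to hold the data that deduplication had saved. The following sketch adds a rough pre-check that compares the FreeSpace and SavedSpace values reported by Get-DedupStatus. The volume letter is an assumption, and the comparison is a simple heuristic, not an exact calculation.

```powershell
$volume = "D:"   # Assumption: adjust to your deduplicated volume.
$status = Get-DedupStatus -Volume $volume

# Rough heuristic: rehydrating needs about as much space as was saved.
if ($status.FreeSpace -gt $status.SavedSpace) {
    Start-DedupJob -Volume $volume -Type Unoptimization
}
else {
    Write-Warning ("{0}: {1:N1} GB free, but about {2:N1} GB needed." -f `
        $volume, ($status.FreeSpace / 1GB), ($status.SavedSpace / 1GB))
}
```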
This kind of flexibility is particularly useful in complex IT environments. Windows Server 2022 also offers special optimization options for specific workloads such as VDI environments. Adjusting the settings with commands such as
Set-DedupVolume -Volume "D:" -OptimizeInUseFiles -OptimizePartialFiles
maximizes deduplication performance in these environments. It is also advisable to carry out regular checks and maintenance to ensure that your data deduplication setup is running efficiently and without interruptions. When planning deduplication tasks, you also need to take the server load into account. Scheduling deduplication tasks outside of peak times helps you minimize the server load and optimize the overall performance.
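If a job does have to run while the server is in use, its footprint can be limited when you start it. The sketch below is one possible combination of Start-DedupJob parameters. The volume letter and the 25 percent memory limit are illustrative assumptions, not recommendations from the article.

```powershell
# Run with low priority, use at most 25 percent of system memory,
# and let Windows stop the job when the system becomes busy.
Start-DedupJob -Volume "D:" -Type Optimization `
    -Priority Low -Memory 25 -StopWhenSystemBusy
```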