Arkeia Software

Progressive Deduplication

Arkeia acquired its Progressive Deduplication™ technology when Arkeia purchased Kadena Systems in November 2009. Arkeia Software delivers deduplication functionality that is block-grain, source-side, in-line, and content-aware. Progressive Deduplication technology is distinct from fixed-block or variable-block technology.

Dedupe Technologies Comparison

Fixed-block “block-grain” deduplication (where a block or sub-unit of a file is deduplicated) is an improvement over “file-grain” deduplication (where an entire file is found to be redundant). However, fixed-block deduplication fails to tolerate the insertion of data at the beginning or in the middle of a file. When data is inserted in a file, fixed-block deduplication will see all subsequent blocks as new blocks, resulting in a lower deduplication compression ratio.
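The insert problem is easy to reproduce. The sketch below is illustrative only and is not Arkeia's implementation: the 8-byte block size, the SHA-256 fingerprint, and the helper names `fixed_blocks` and `count_known` are all assumptions chosen for the demonstration.

```python
import hashlib

def fixed_blocks(data: bytes, size: int) -> list[bytes]:
    """Cut data into blocks at fixed offsets 0, size, 2*size, ..."""
    return [data[i:i + size] for i in range(0, len(data), size)]

def count_known(data: bytes, pool: set, size: int) -> int:
    """Count blocks of data whose fingerprint is already in the pool."""
    return sum(1 for block in fixed_blocks(data, size)
               if hashlib.sha256(block).digest() in pool)

BLOCK = 8  # hypothetical block size, kept tiny for the demo
original = bytes(range(64))  # eight distinct 8-byte blocks
pool = {hashlib.sha256(b).digest() for b in fixed_blocks(original, BLOCK)}

modified = b"X" + original  # one byte inserted at the start of the file

unchanged_hits = count_known(original, pool, BLOCK)  # every block is known
shifted_hits = count_known(modified, pool, BLOCK)    # every block looks new
print(unchanged_hits, shifted_hits)
```

A single inserted byte shifts every later block boundary, so none of the previously stored blocks is recognized.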

Variable-block deduplication addresses the problem of data inserts, but at the cost of additional processing. Variable-block dedupe sets block boundaries by identifying markers (so-called “magic numbers”) within the file’s data. While the compression ratio improves, performance slows.
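A minimal sketch of marker-based boundaries follows. It assumes a marker rule of our own choosing (a block ends wherever the sum of the last 16 bytes is divisible by 64); real variable-block products use their own marker functions, and the name `content_defined_chunks` is hypothetical.

```python
import hashlib

def content_defined_chunks(data: bytes, window: int = 16,
                           divisor: int = 64) -> list[bytes]:
    """Cut a block wherever the rolling sum of the last `window` bytes
    is divisible by `divisor` (an assumed, simplistic marker rule)."""
    chunks, start, rolling = [], 0, 0
    for i, byte in enumerate(data):
        rolling += byte
        if i >= window:
            rolling -= data[i - window]  # drop the byte leaving the window
        if i >= window - 1 and rolling % divisor == 0:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])  # trailing bytes after the last marker
    return chunks

# Reproducible pseudo-random sample data (4096 bytes).
original = b"".join(hashlib.sha256(bytes([i])).digest() for i in range(128))
modified = b"X" + original  # one byte inserted at the start

orig_chunks = content_defined_chunks(original)
mod_chunks = set(content_defined_chunks(modified))
reused = sum(1 for chunk in orig_chunks if chunk in mod_chunks)
print(f"{reused} of {len(orig_chunks)} blocks survive the insert")
```

Because boundaries depend only on nearby content, every block after the first realigns despite the insert. The price is that the marker test runs at every byte position of every file, including files that have not changed.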

Kadena’s Progressive Deduplication offers the performance benefits of fixed-block deduplication and the tolerance of data inserts offered by variable-block deduplication. The “sliding window” used by Progressive Deduplication has been used in many compression algorithms. Kadena’s key innovation is a strategy called “progressive matching,” described below.

Faster, for Shorter Backup Windows

Progressive-matching algorithm

Progressive Deduplication is fast, reducing the length of backup windows. Arkeia’s Progressive Deduplication eliminates variable-block’s need to scan for block boundaries. First, all files previously encountered by Arkeia are deduplicated at fixed-block speeds. Second, new data is surveyed with a sliding window. A speedy, light-weight algorithm determines if data under the window is a probable match to blocks in the known-block-pool.

Probable matches are scrutinized with a heavy-weight hash algorithm. Because over 99% of probable matches prove to be exact matches, progressive matching is extremely efficient. Arkeia’s patented “progressive matching” technology inspired the name “Progressive Deduplication.”
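The two-stage idea can be sketched as follows. This is not Arkeia's patented algorithm, whose details are not given here; it is a generic illustration under stated assumptions: Adler-32 stands in for the light-weight check, SHA-256 for the heavy-weight hash, and `build_pool` and `progressive_scan` are hypothetical names.

```python
import hashlib
import zlib

def build_pool(data: bytes, size: int) -> dict:
    """Index known blocks: cheap checksum -> set of strong hashes."""
    pool = {}
    for i in range(0, len(data) - size + 1, size):
        block = data[i:i + size]
        pool.setdefault(zlib.adler32(block), set()).add(
            hashlib.sha256(block).digest())
    return pool

def progressive_scan(data: bytes, pool: dict, size: int) -> list:
    """Slide a window over data; return ('known', block) and ('new', bytes)."""
    out, literal, i = [], bytearray(), 0
    while i + size <= len(data):
        window = data[i:i + size]
        # Stage 1: cheap test for a probable match. For clarity the checksum
        # is recomputed here; a real implementation would update it
        # incrementally as the window slides.
        candidates = pool.get(zlib.adler32(window))
        # Stage 2: the expensive hash runs only on probable matches.
        if candidates and hashlib.sha256(window).digest() in candidates:
            if literal:
                out.append(("new", bytes(literal)))
                literal.clear()
            out.append(("known", window))
            i += size  # jump a whole block ahead
        else:
            literal.append(data[i])
            i += 1     # slide the window by one byte
    literal.extend(data[i:])
    if literal:
        out.append(("new", bytes(literal)))
    return out

BLOCK = 8  # hypothetical block size
original = bytes(range(64))
pool = build_pool(original, BLOCK)
result = progressive_scan(b"X" + original, pool, BLOCK)
kinds = [kind for kind, _ in result]
print(kinds)
```

In this sketch, only the inserted byte is reported as new; the eight original blocks are all recognized even though they no longer sit on fixed offsets.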

Higher Compression Ratios, for Reduced Storage and Network Traffic

Progressive Deduplication delivers high compression ratios, which save money by reducing storage volume and network bandwidth requirements. Moving less data over the network also accelerates backups.

Variable-block deduplication results are very sensitive to the placement of block boundaries. Progressive Deduplication evaluates all possible block boundaries, ensuring the best possible deduplication for any block size.

Data Deduplication Compression Ratios

Further, because the size of the sliding window is adjustable, block sizes can be tailored to file types. This permits Progressive Deduplication to be “content aware” (also known as “application aware”). To achieve maximum compression rates Arkeia uses different block sizes with different types of data—such as executable files, text files, and database records. Arkeia has analyzed hundreds of file types, produced by hundreds of popular applications, to determine each one’s optimal block size. Systems administrators can override default block sizes and can specify block sizes for new file types.
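Per-type block sizes with administrator overrides could be modeled as a simple lookup. Everything in this sketch is hypothetical: the file suffixes, the byte values, and the function name `block_size_for` are placeholders, not Arkeia's actual defaults or configuration interface.

```python
from pathlib import PurePath

# Placeholder values for illustration only; not Arkeia's real defaults.
DEFAULT_BLOCK_SIZE = 4096
BLOCK_SIZE_BY_SUFFIX = {".txt": 1024, ".exe": 4096, ".db": 8192}

def block_size_for(filename: str, overrides: dict = None) -> int:
    """Pick a block size by file suffix; admin overrides win over defaults."""
    suffix = PurePath(filename).suffix.lower()
    table = {**BLOCK_SIZE_BY_SUFFIX, **(overrides or {})}
    return table.get(suffix, DEFAULT_BLOCK_SIZE)

print(block_size_for("notes.TXT"))                       # built-in entry
print(block_size_for("disk.vmdk"))                       # falls back to default
print(block_size_for("disk.vmdk", {".vmdk": 16384}))     # admin-defined type
```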

Dedupe ratios are highly data-dependent, but deduplication can reduce data volume by as much as 95% when the same files are repeatedly backed up (e.g. nightly backups for a month) or many similar volumes are backed up (e.g. dozens of VMware virtual machines across multiple physical hosts).
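A back-of-the-envelope calculation shows how a figure in that range can arise from a month of nightly backups. The 2% daily change rate and the 100 GB volume below are assumptions picked for the example, not measured results.

```python
def reduction(full_size: float, backups: int, daily_change: float) -> float:
    """Fraction of data volume saved when only changed data is stored
    after the first full backup (simplified model)."""
    logical = full_size * backups  # what would be stored without dedupe
    stored = full_size + full_size * daily_change * (backups - 1)
    return 1 - stored / logical

# Assumed scenario: 100 GB volume, 30 nightly backups, 2% of data changes daily.
saving = reduction(100, 30, 0.02)
print(f"{saving:.1%}")  # roughly 94.7%
```

Under these assumptions, 3,000 GB of logical backups shrink to about 158 GB of stored blocks. A higher change rate or fewer backups lowers the saving accordingly.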

Replication of Deduplicated Data

By combining deduplication with Arkeia's backup replication technology, Arkeia customers will be better equipped to protect distributed environments using WAN connections. Data is replicated to a remote disaster recovery site in deduplicated form, minimizing the network bandwidth needed for the transfer. Because only new blocks not yet known to the disaster recovery site are sent over the WAN, backups complete in minimum time. Backup environments that previously required tapes for off-site data protection can now enjoy the efficiency and cost savings of WAN transfers.
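The principle of sending only unknown blocks can be sketched in a few lines. This is a simplified model, not Arkeia's replication protocol: the remote site is represented by an in-memory dictionary, and the `replicate` function and SHA-256 block keys are assumptions.

```python
import hashlib

def replicate(blocks: list, remote_store: dict) -> int:
    """Send only blocks the remote site lacks; return bytes 'sent'.
    remote_store is a stand-in for the disaster recovery site."""
    sent = 0
    for block in blocks:
        key = hashlib.sha256(block).hexdigest()
        if key not in remote_store:  # block is new to the remote site
            remote_store[key] = block
            sent += len(block)
    return sent

remote = {}
night_1 = [b"A" * 8, b"B" * 8, b"A" * 8]  # block A appears twice
night_2 = [b"A" * 8, b"B" * 8, b"C" * 8]  # only block C is new

sent_1 = replicate(night_1, remote)  # A and B cross the WAN once each
sent_2 = replicate(night_2, remote)  # only C crosses the WAN
print(sent_1, sent_2)
```

In practice the sender would exchange block fingerprints with the remote site before transferring any block data, so that known blocks never cross the WAN.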

