
# Prediction by Partial Matching ( PPMIICoder )

Definition

The PPMIICoder is based on the compression scheme "Prediction by Partial Matching with Information Inheritance" by D. Shkarin [81].
This coder works as follows: Suppose we have processed the first $n - 1$ symbols $x_1 \dots x_{n-1}$ of the stream. Before reading the next symbol $x_n$ we try to guess it, i.e. for every symbol $s$ we estimate the probability $p(s)$ of the event "$x_n = s$". This probability distribution determines how the next symbol is encoded: the higher $p(s)$, the fewer bits are used for encoding $s$. If our estimate is good, which means that $p(x_n)$ is high, then we obtain a good compression rate.
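The relation between probability and code length can be made concrete with a small sketch. It assumes an idealized entropy coder, for which a symbol predicted with probability $p$ costs $-\log_2 p$ bits; the function name `ideal_code_length_bits` is hypothetical and not part of LEDA.

```cpp
#include <cassert>
#include <cmath>

// Ideal code length in bits for a symbol that the model predicted with
// probability p. This is the information-theoretic bound, not a LEDA call.
double ideal_code_length_bits(double p) {
    assert(p > 0.0 && p <= 1.0);  // a code length is only defined for p in (0, 1]
    return -std::log2(p);
}
```

Under this assumption a symbol predicted with probability 0.5 costs one bit, while a symbol predicted with probability 0.25 costs two bits, which is why accurate predictions translate directly into better compression.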
In order to predict the probability distribution for the $n$-th symbol, the PPM approach considers the preceding $k$ symbols $x_{n-k} \dots x_{n-1}$. We call these symbols the context of $x_n$ and $k$ the order of the model. (For $k = 0$ we obtain the order-0 model from the previous section.) For example, if the current context is "req", then we should predict the letter "u" as the next symbol with high probability.
PPMII is a variant of PPM which usually achieves very accurate estimations.
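The role of the context can be illustrated with a toy order-$k$ model that simply counts which symbol followed each context. This is only a sketch of the general idea: the class `ContextModel` and its members are hypothetical, and PPMII's actual estimator is considerably more refined than the relative frequencies used here.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Toy order-k context model (illustration only, not the PPMII estimator).
class ContextModel {
public:
    explicit ContextModel(std::size_t k) : k_(k) {}

    // Record every (context, next symbol) pair that occurs in text.
    void train(const std::string& text) {
        for (std::size_t i = k_; i < text.size(); ++i) {
            ++counts_[text.substr(i - k_, k_)][text[i]];
        }
    }

    // Relative frequency of symbol s after context ctx.
    // Returns 0 if ctx was never seen during training.
    double probability(const std::string& ctx, char s) const {
        assert(ctx.size() == k_);
        auto it = counts_.find(ctx);
        if (it == counts_.end()) return 0.0;
        std::size_t total = 0, hits = 0;
        for (const auto& entry : it->second) {
            total += entry.second;
            if (entry.first == s) hits = entry.second;
        }
        return static_cast<double>(hits) / static_cast<double>(total);
    }

private:
    std::size_t k_;  // order of the model
    std::map<std::string, std::map<char, std::size_t>> counts_;
};
```

After training an order-3 model on a text such as `"request requires frequent requests"`, the call `probability("req", 'u')` yields 1, matching the intuition from the example above. A real PPM coder does not return 0 for an unseen context; it falls back to shorter contexts instead.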

The PPMIICoder combines very good compression rates with acceptable speed. (Shkarin [81] reports that his coder outperforms ZIP and BZIP2 with respect to compression rates and speed.) The only disadvantage of this coder is that it needs a fair amount of main memory to store the model. However, the user can set an upper bound on the memory usage and can specify which model restoration method the coder applies when it runs out of memory:

• mr_restart (default):
The model is deleted completely and rebuilt from scratch. This method is fast.
• mr_cut_off:
Parts of the model are freed to gain memory. This method is optimal for so-called quasistationary sources. It usually gives better compression but it is slower.
• mr_freeze:
The model is not extended any more. This method is optimal for so-called stationary sources. (We want to point out that data streams arising in practical applications usually do not behave like a stationary source.)
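The difference between restarting and freezing can be sketched on a toy model whose memory bound is expressed as a maximum number of stored contexts. Everything in this sketch is hypothetical (`OnFull`, `BoundedModel`, and the context-count limit); it does not show how the PPMIICoder is configured. In particular, the text does not say whether counts of existing contexts keep changing under mr_freeze; the toy keeps updating them and only refuses to add new contexts. The mr_cut_off mode is not modelled.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical policies mirroring two of the modes described above.
enum class OnFull { Restart, Freeze };

// Toy order-k model holding at most max_contexts distinct contexts.
class BoundedModel {
public:
    BoundedModel(std::size_t k, std::size_t max_contexts, OnFull policy)
        : k_(k), max_contexts_(max_contexts), policy_(policy) {
        assert(max_contexts_ > 0);
    }

    // Update the model with the symbol next observed after context ctx.
    void update(const std::string& ctx, char next) {
        assert(ctx.size() == k_);
        auto it = counts_.find(ctx);
        if (it == counts_.end()) {
            if (counts_.size() >= max_contexts_) {
                if (policy_ == OnFull::Freeze) return;  // do not extend the model
                counts_.clear();                        // restart: rebuild from scratch
            }
            it = counts_.emplace(ctx, std::map<char, std::size_t>()).first;
        }
        ++it->second[next];
    }

    std::size_t context_count() const { return counts_.size(); }
    bool knows(const std::string& ctx) const { return counts_.count(ctx) != 0; }

private:
    std::size_t k_;
    std::size_t max_contexts_;
    OnFull policy_;
    std::map<std::string, std::map<char, std::size_t>> counts_;
};
```

With a limit of two contexts, feeding the contexts "a", "b", "c" in that order leaves a frozen model knowing only "a" and "b", whereas a restarting model ends up knowing only "c". This illustrates why freezing suits sources whose statistics do not change, while restarting adapts quickly at the cost of forgetting everything learned so far.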

#include <LEDA/coding/PPMII.h>

Types

• PPMIICoder::mr_method { mr_restart, mr_cut_off, mr_freeze }
the different model restoration modes.

Creation

• PPMIICoder C(streambuf* src_stream = 0, streambuf* tgt_stream = 0, bool own_streams = false)
creates an instance C which uses the given source and target streams. If own_streams is set, then C is responsible for the destruction of the streams; otherwise the pointers src_stream and tgt_stream must be valid during the lifetime of C.
• PPMIICoder C(const char* src_file_name, const char* tgt_file_name)
creates an instance C which uses file streams for input and output.

Operations

Standard Operations

• void C.encode()
encodes the source stream and writes the output to the target stream.
• void C.decode()
decodes the source stream and writes the output to the target stream.
• uint32 C.encode_memory_chunk(const char* in_buf, uint32 in_len, char* out_buf, uint32 out_len)
encodes the memory chunk starting at in_buf with size in_len into the buffer starting at out_buf with size out_len. The function returns the actual length of the encoded chunk, which may be smaller than out_len. If the output buffer is too small for the encoded data, the failure flag is set (see failed() below).
• uint32 C.decode_memory_chunk(const char* in_buf, uint32 in_len, char* out_buf, uint32 out_len)
decodes a memory chunk. The meaning of the parameters and the return value is the same as in the previous function.
• streambuf* C.get_src_stream()
returns the current source stream.
• void C.set_src_stream(streambuf* src_stream, bool own_stream = false)
sets the source stream (cf. constructor).
• void C.set_src_file(const char* file_name)
sets a file as source stream.
• streambuf* C.get_tgt_stream()
returns the current target stream.
• void C.set_tgt_stream(streambuf* tgt_stream, bool own_Stream = false)
sets the target stream (cf. constructor).
• void C.set_tgt_file(const char* file_name)
sets a file as target stream.
• void C.reset(bool keep_parameters = true)
puts C in the same state as the default constructor. If keep_parameters is false, the parameters are set to their default values.
• bool C.failed()
returns true if an error occurred.
• bool C.finished()
returns true if the coding is finished.
• string C.get_description()
provides a description for C.