Definition
The PPMIICoder is based on the compression scheme
``Prediction by Partial Matching with Information Inheritance''
by D. Shkarin [81].
This coder works as follows: Suppose we have processed the first n - 1 symbols
x1...xn-1 of the stream. Before reading the next symbol xn we try
to guess it, i.e. for every symbol s we estimate the probability p(s) for
the event ``xn = s''. This probability distribution determines how the next
symbol is encoded: The higher p(s), the fewer bits are used for encoding s.
If our estimation is good, which means that p(xn) is high, then we obtain
a good compression rate.
In order to predict the probality distribution for the nth symbol the PPM
approach considers the preceding k symbols
xn-k...xn-1.
We call these symbols the context of xn and k the order of the
model. (For k = 0 we obtain the order-0 model from the previous section.)
E.g., if the current context is ``req'', then we should
predict the letter ``u'' as next symbol with high probability.
PPMII is a variant of PPM which usually achieves very accurate estimations.
The PPMIICoder combines very good compression rates with acceptable speed. (Shkarin [81] reports that his coder outperforms ZIP and BZIP2 with respect to compression rates and speed.) The only disadvantage of this coder is that it needs a fair amount of main memory to store the model. However, the user can set an upper bound on the memory usage. And he can specify which model restoration method the coder shall apply when it runs out of memory:
#include < LEDA/coding/PPMII.h >
Types
PPMIICoder::mr_method { mr_restart, mr_cut_off, mr_freeze } | |
the different model restoration modes. |
Creation
PPMIICoder | C(streambuf* src_stream = 0, streambuf* tgt_stream = 0, bool own_streams = false) | |
creates an instance C which uses the given source and target streams. If own_streams is set, then C is responsible for the destruction of the streams, otherwise the pointers src_stream and tgt_stream must be valid during the life-time of C. | ||
PPMIICoder | C(const char* src_file_name, const char* tgt_file_name) | |
creates an instance C which uses file-streams for input and output. |
Operations
Standard Operations
void | C.encode() | encodes the source stream and writes the output to the target stream. |
void | C.decode() | decodes the source stream and writes the output to the target stream. |
uint32 | C.encode_memory_chunk(const char* in_buf, uint32 in_len, char* out_buf, uint32 out_len) | |
encodes the memory chunk starting at in_buf with size in_len into the buffer starting at out_buf with size out_len. The function returns actual length of the encoded chunk which may be smaller than out_len. If the output buffer is too small for the encoded data the failure flag will be set (see below). | ||
uint32 | C.decode_memory_chunk(const char* in_buf, uint32 in_len, char* out_buf, uint32 out_len) | |
decodes a memory chunk. The meaning of the parameters and the return value is the same as in the previous function. | ||
streambuf* | C.get_src_stream() | returns the current source stream. |
void | C.set_src_stream(streambuf* src_stream, bool own_stream = false) | |
sets the source stream (cf. constructor). | ||
void | C.set_src_file(const char* file_name) | |
sets a file as source stream. | ||
streambuf* | C.get_tgt_stream() | returns the current target stream. |
void | C.set_tgt_stream(streambuf* tgt_stream, bool own_Stream = false) | |
sets the target stream (cf. constructor). | ||
void | C.set_tgt_file(const char* file_name) | |
sets a file as target stream. | ||
void | C.reset(bool keep_parameters = true) | |
puts C in the same state as the default constructor. If keep_parameters is false the parameters are set to their default values. | ||
bool | C.failed() | returns true if an error occured. |
bool | C.finished() | returns true if the coding is finished. |
string | C.get_description() | provides a description for C. |
Additional Operations
Implementation
Our implementation encapsulates the code by D. Shkarin (the implementation of the PPMII model) and by D. Subbotin (the implementation of the range coder).