Next: Move-To-Front Coder ( MTFCoder Up: Lossless Compression Previous: Run-Length Coder ( RLECoder Contents Index

Burrows-Wheeler Transform ( BWTCoder )

Definition

The class BWTCoder applies the Burrows-Wheeler transform [18] (in encoding mode) or its reversal (in decoding mode) to the input stream. To be more precise, in encoding mode the input stream is partitioned in blocks and the transformation is applied to each block. The size of the blocks, i.e. the number of characters per block, can be specified by the user.

This transformation does not compress the input but it can be used as a preprocessing step for other coders. Applying BWTCoder first often improves the overall compression rates. The following combinations yield very good compression results:

CoderPipe3<BWTCoder, MTFCoder, A0Coder>
CoderPipe4<BWTCoder, MTFCoder, RLE0Coder, A0Coder>

#include < LEDA/coding/BWT.h >

Creation

BWTCoder C(streambuf* src_stream = 0, streambuf* tgt_stream = 0, bool own_streams = false, uint32 block_size = DefaultBlockSize)

creates an instance C which uses the given source and target streams. If own_streams is set, then C is responsible for the destruction of the streams, otherwise the pointers src_stream and tgt_stream must be valid during the life-time of C. The block size can be controlled via block_size.

BWTCoder C(const char* src_file_name, const char* tgt_file_name, uint32 block_size = DefaultBlockSize)

creates an instance C which uses file-streams for input and output. The block size can be controlled via block_size.

Operations

Standard Operations

void C.encode() encodes the source stream and writes the output to the target stream.

void C.decode() decodes the source stream and writes the output to the target stream.

uint32 C.encode_memory_chunk(const char* in_buf, uint32 in_len, char* out_buf, uint32 out_len)

encodes the memory chunk starting at in_buf with size in_len into the buffer starting at out_buf with size out_len. The function returns actual length of the encoded chunk which may be smaller than out_len. If the output buffer is too small for the encoded data the failure flag will be set (see below).

uint32 C.decode_memory_chunk(const char* in_buf, uint32 in_len, char* out_buf, uint32 out_len)

decodes a memory chunk. The meaning of the parameters and the return value is the same as in the previous function.

streambuf* C.get_src_stream() returns the current source stream.

void C.set_src_stream(streambuf* src_stream, bool own_stream = false)

sets the source stream (cf. constructor).

void C.set_src_file(const char* file_name)

sets a file as source stream.

streambuf* C.get_tgt_stream() returns the current target stream.

void C.set_tgt_stream(streambuf* tgt_stream, bool own_Stream = false)

sets the target stream (cf. constructor).

void C.set_tgt_file(const char* file_name)

sets a file as target stream.

void C.reset(bool keep_block_size = true)

puts C in the same state as the default constructor. If keep_block_size is false the block size is set to the default value.

bool C.failed() returns true if an error occured.

bool C.finished() returns true if the coding is finished.

string C.get_description() provides a description for C.

Additional Operations

uint32 C.get_block_size() returns the current block size (for encoding).

void C.set_block_size(uint32 block_size)

sets the block size.

Implementation

Our implementation is based on code for suffix sorting by P. Ferragina and G. Manzini.

Next: Move-To-Front Coder ( MTFCoder Up: Lossless Compression Previous: Run-Length Coder ( RLECoder Contents Index

BWTCoder	C(streambuf* src_stream = 0, streambuf* tgt_stream = 0, bool own_streams = false, uint32 block_size = DefaultBlockSize)
		creates an instance C which uses the given source and target streams. If own_streams is set, then C is responsible for the destruction of the streams, otherwise the pointers src_stream and tgt_stream must be valid during the life-time of C. The block size can be controlled via block_size.
BWTCoder	C(const char* src_file_name, const char* tgt_file_name, uint32 block_size = DefaultBlockSize)
		creates an instance C which uses file-streams for input and output. The block size can be controlled via block_size.

void	C.encode()	encodes the source stream and writes the output to the target stream.
void	C.decode()	decodes the source stream and writes the output to the target stream.
uint32	C.encode_memory_chunk(const char* in_buf, uint32 in_len, char* out_buf, uint32 out_len)
		encodes the memory chunk starting at in_buf with size in_len into the buffer starting at out_buf with size out_len. The function returns actual length of the encoded chunk which may be smaller than out_len. If the output buffer is too small for the encoded data the failure flag will be set (see below).
uint32	C.decode_memory_chunk(const char* in_buf, uint32 in_len, char* out_buf, uint32 out_len)
		decodes a memory chunk. The meaning of the parameters and the return value is the same as in the previous function.
streambuf*	C.get_src_stream()	returns the current source stream.
void	C.set_src_stream(streambuf* src_stream, bool own_stream = false)
		sets the source stream (cf. constructor).
void	C.set_src_file(const char* file_name)
		sets a file as source stream.
streambuf*	C.get_tgt_stream()	returns the current target stream.
void	C.set_tgt_stream(streambuf* tgt_stream, bool own_Stream = false)
		sets the target stream (cf. constructor).
void	C.set_tgt_file(const char* file_name)
		sets a file as target stream.
void	C.reset(bool keep_block_size = true)
		puts C in the same state as the default constructor. If keep_block_size is false the block size is set to the default value.
bool	C.failed()	returns true if an error occured.
bool	C.finished()	returns true if the coding is finished.
string	C.get_description()	provides a description for C.

uint32	C.get_block_size()	returns the current block size (for encoding).
void	C.set_block_size(uint32 block_size)
		sets the block size.