LZW Compression and Decompression

by Yi-Jou Chen

INTRODUCTION

Distributing concurrent computations provides parallelism and enhanced resource utilization. Parallelism is achieved by the simultaneous execution of different portions of the visualization on each of the workstations. Enhanced resource utilization is achieved by assigning computationally intensive portions of the visualization to the more powerful workstations.

In Data Explorer, distributed processing is accomplished through "execution groups", sets of tools that can be assigned to workstations. Once groups are created, they can be assigned to the workstations over which the visualization is to be distributed.

However, data transferred over the network generally contains significant redundancy, and on networks without high bandwidth this transfer cost often erodes the benefit of parallelism for complicated visualization computations.

The purpose of this project is to provide compression/decompression modules that reduce the amount of data sent over the network, using the LZW compression algorithm [1]. The main reasons for choosing the LZW algorithm are its speed and its compression ratio.

GOALS

The goal of this project is to analyze the data structures of Data Explorer and to apply LZW compression and decompression, implemented as C modules, to reduce network transmission time. To measure the performance of the compression module, two testing modules, "Time" and "Irregular", must also be created.

Another goal of the project is to measure the compression ratio achieved on Data Explorer's data structures (its data flow model) after applying the LZW algorithm. In addition, the compression ratio and the network time consumed before and after applying LZW must be compared.

APPROACH

This project consists of three main stages: analysis of the data model, creation of the C modules, and performance testing.

Since the LZW algorithm is usually applied to text files or directories in a file system, we must first analyze the whole data model of Data Explorer and find the relations among its different data structures. The Data Explorer data model supports various types of scientific simulation and observational data; in general, these fall into two categories, regular and irregular data structures. Other concerns of the data model include the data types, object types, standard components, the relations between components, and the shared data flow model.

In fact, most data structures in Data Explorer are compact: they do not actually occupy the amount of memory implied by their declaration. The compression module must ignore such structures; otherwise a simple regular data structure would be expanded into a complex irregular one, wasting both compression and transmission time.

Another important consideration is the attributes and relations among different data components, because the data model allows object sharing in the data flow network. Compression should not undermine the efficient implementation of the data flow programming model; therefore, the relations and attributes of the components must be preserved after the compression module is applied.

Based on the above analysis, these concerns can be integrated into the LZW compression program. The main advantage of the LZW algorithm is its "greedy" parsing: the input is examined character-serially in a single pass, and at each step the longest string recognized in the string table is parsed off. Another important advantage is that no extra space is needed for the table in the compressed data; the decompression module reverses the process and rebuilds its own table to restore the original data, in one pass, using no information from the compression module except the code stream itself.

The compression module has two inputs: the input object, and an integer that selects which components should be compressed. Currently four kinds of components can be compressed, "position", "connection", "color", and "data", because these four components often require large arrays. The input integer is formed by combining (bitwise "or", i.e. summing) four flag values, 1, 2, 4, and 8, which represent "position", "data", "connection", and "color" respectively. For example, an input integer of 15 means that all four components will be compressed.
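
As an illustration, here is a minimal sketch of how the module might decode this selection integer. The macro names are hypothetical; only the numeric values and component names come from the description above.

	#define COMP_POSITION   1   /* "position" component   */
	#define COMP_DATA       2   /* "data" component       */
	#define COMP_CONNECTION 4   /* "connection" component */
	#define COMP_COLOR      8   /* "color" component      */

	int flags = 15;                       /* 1+2+4+8: compress all four */
	if (flags & COMP_POSITION)   { /* compress the "positions" array   */ }
	if (flags & COMP_DATA)       { /* compress the "data" array        */ }
	if (flags & COMP_CONNECTION) { /* compress the "connections" array */ }
	if (flags & COMP_COLOR)      { /* compress the "colors" array      */ }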

In the UNIX file system, LZW compression is typically applied to a single file or to all the files in a directory. In the Data Explorer object structure, even though it has a similar hierarchical structure, the granularity of compression is different: within the same object, some components may be compressed and others not. Two integers are therefore placed at the head of each compressed component to distinguish it from non-compressed data.
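
The exact layout of this two-integer header is not reproduced here; purely as a hypothetical illustration of the idea, it might carry a marker value and the original size:

	/* Hypothetical illustration only; the actual header values used by
	   the module are not specified in this report. */
	struct compressed_header {
	    int marker;          /* distinguishes compressed from plain components */
	    int original_bytes;  /* uncompressed size, needed when restoring       */
	};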


LZW COMPRESSION MODULE DESIGN

The "LZWCompression" module written for DX 2.0 is based on the modified version of the LZW algorithm as described in IEEE Computer, June 1984. This section describes the feature how I implement the LZW algorithm and some detail about the concerns of DX's data structure and the relevant function in its library.

LZW Algorithm

The LZW algorithm is organized around a translation table that maps strings of input characters into fixed-length codes. The LZW string table has a prefix property: for every string in the table, its prefix string is also in the table. That is, if the string wK, composed of some string w and some single character K, is in the table, then w is also in the table. K is called the extension character on the prefix string w. The LZW string table contains strings that have been encountered previously in the message being compressed. It consists of a running sample of strings in the message, so the available strings reflect the statistics of the message.

Here is the LZW algorithm as given in [1].

Compression Module

	Initialize table to contain single-character strings
	Read first input character -> prefix string w
	Step:	Read next input character K
		if no such K (input exhausted):
			code(w) -> output; EXIT
		if wK exists in string table:
			wK -> w; repeat Step.
		else (wK not in string table):
			code(w) -> output;
			wK -> string table;
			K -> w; repeat Step.
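
Below is a minimal, self-contained C sketch of the compression loop above. It is illustrative only: it emits fixed 16-bit codes instead of the variable-length codes used by the real module, and it uses a simple linear string table rather than a hash table.

	#include <stdio.h>

	#define TABLE_SIZE 4096          /* 12-bit code space                  */

	static int prefix[TABLE_SIZE];   /* code of prefix string w            */
	static int extend[TABLE_SIZE];   /* extension character K              */
	static int next_code;            /* next free code in the string table */

	/* Look up the string wK in the table; return its code or -1. */
	static int find(int w, int K)
	{
	    int i;
	    for (i = 256; i < next_code; i++)
	        if (prefix[i] == w && extend[i] == K)
	            return i;
	    return -1;
	}

	void lzw_compress(FILE *in, FILE *out)
	{
	    int w, K, code;

	    next_code = 256;             /* codes 0..255 are single characters */
	    w = getc(in);                /* first input character -> w         */
	    if (w == EOF)
	        return;

	    while ((K = getc(in)) != EOF) {
	        code = find(w, K);
	        if (code >= 0) {         /* wK is in the table: wK -> w        */
	            w = code;
	        } else {                 /* wK not in the table                */
	            putc(w >> 8, out);   /* code(w) -> output (16 bits here)   */
	            putc(w & 0xff, out);
	            if (next_code < TABLE_SIZE) {   /* wK -> string table      */
	                prefix[next_code] = w;
	                extend[next_code] = K;
	                next_code++;
	            }
	            w = K;               /* K -> w                             */
	        }
	    }
	    putc(w >> 8, out);           /* code(w) -> output; EXIT            */
	    putc(w & 0xff, out);
	}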
			
Decompression Module

	Decompression:	First input code -> CODE -> OLDcode;
			with	CODE = code(K),
				K -> output;
				K -> FINchar;
	Next Code:	Next input code -> CODE -> INcode;
			if no new code: EXIT
			if CODE not defined (special case):
				FINchar -> output;
				OLDcode -> CODE;
				code(OLDcode, FINchar) -> INcode;
	Next Symbol:	if CODE = code(wK):
				K -> stack;
				code(w) -> CODE;
				Go to Next Symbol;
			if CODE = code(K):
				K -> output;
				K -> FINchar;
			Do while stack not empty:
				stack top -> output;
				POP stack;
			OLDcode, K -> string table;
			INcode -> OLDcode;
			Go to Next Code;
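
A matching C sketch of the decompression loop is shown below. It uses the same simplifications as the compression sketch (fixed 16-bit codes, linear string table) and is illustrative only.

	#include <stdio.h>

	#define TABLE_SIZE 4096

	static int prefix[TABLE_SIZE];
	static int extend[TABLE_SIZE];
	static int next_code;

	/* Emit the string for 'code' (stack the extension characters, then pop
	   them in order); return its first character (FINchar). */
	static int output_string(int code, FILE *out)
	{
	    int stack[TABLE_SIZE], sp = 0, K;

	    while (code >= 256) {        /* CODE = code(wK): K -> stack, code(w) -> CODE */
	        stack[sp++] = extend[code];
	        code = prefix[code];
	    }
	    K = code;                    /* CODE = code(K): K -> output, K -> FINchar    */
	    putc(K, out);
	    while (sp > 0)               /* empty the stack                              */
	        putc(stack[--sp], out);
	    return K;
	}

	static int read_code(FILE *in)   /* read one fixed 16-bit code */
	{
	    int hi = getc(in), lo;
	    if (hi == EOF) return EOF;
	    lo = getc(in);
	    return (hi << 8) | lo;
	}

	void lzw_decompress(FILE *in, FILE *out)
	{
	    int oldcode, incode, code, finchar;

	    next_code = 256;
	    oldcode = code = read_code(in);   /* first input code -> CODE -> OLDcode */
	    if (code == EOF)
	        return;
	    finchar = code;
	    putc(finchar, out);               /* with CODE = code(K): K -> output    */

	    while ((incode = code = read_code(in)) != EOF) {
	        if (code >= next_code) {      /* CODE not defined (special case)     */
	            finchar = output_string(oldcode, out);
	            putc(finchar, out);       /* FINchar -> output                   */
	        } else {
	            finchar = output_string(code, out);
	        }
	        if (next_code < TABLE_SIZE) { /* OLDcode, K -> string table          */
	            prefix[next_code] = oldcode;
	            extend[next_code] = finchar;
	            next_code++;
	        }
	        oldcode = incode;             /* INcode -> OLDcode                   */
	    }
	}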
			
 	        		
           Figure 1. Shared Components among Different Fields

DX Data Model

As mentioned before, the DX data model differs from the file system but has a similar hierarchy. In addition to DX's special data structures, we must use the library supplied by Data Explorer when creating our own modules. The Data Explorer library is object oriented; that is, Data Explorer objects are data structures in global memory that are passed by reference and whose contents are private to the implementation. Furthermore, different objects can share common components (Figure 1).

Some terminology of DX's data model must be introduced before describing the implementation in detail.
Object: a data structure stored in memory that contains an indication of the object's type, along with additional type-dependent information.
Field: represents a mapping from some domain to some data space. The information in a field is represented by some number of named components.
Array: array objects hold the actual data, positions, connections, and so on.
Attribute: stores the relations among different objects.
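
As a sketch of how a module can visit the named components of a Field, the loop below assumes the library calls DXGetEnumeratedComponentValue and DXGetObjectClass behave as documented in the DX programmer's reference; "field" is an illustrative variable name.

	int    i;
	char  *name;
	Object comp;

	for (i = 0; (comp = DXGetEnumeratedComponentValue(field, i, &name)); i++) {
	    if (DXGetObjectClass(comp) != CLASS_ARRAY)
	        continue;                 /* only Array components hold bulk data */
	    /* check whether "name" ("positions", "colors", ...) was selected
	       for compression by the input integer described earlier            */
	}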


Regular vs Irregular Array

Irregular array: the most general way to specify the contents of an array; every item is stored explicitly.
Regular array: a set of n-dimensional points lying on a line in n-space with a constant n-dimensional delta between them; it can be represented compactly (by an origin and a delta) instead of storing every item.
		
		Figure 2. Irregular vs Regular Array


As described above, a regular array does not actually occupy memory according to its declaration. Therefore, before applying the compression procedure, we must check the array class of every member of the input object and skip regular arrays. In addition to the array class, we must also obtain other information about the array, such as its data type, number of items, and rank. One important difference between an array here and a file in the file system is that there is no EOF marker; hence we must determine the array's size before compressing it.
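
A sketch of this check is given below, assuming the DX library calls DXGetArrayClass, DXGetArrayInfo, and DXGetItemSize with their documented behavior.

	/* Sketch: skip compact (regular) arrays and compute the byte size of
	   an irregular array before compressing it. */
	static int array_byte_size(Array a)
	{
	    int      items, rank;
	    Type     type;
	    Category category;

	    if (DXGetArrayClass(a) != CLASS_ARRAY)   /* regular, product, mesh, ... */
	        return 0;                            /* leave compact arrays alone  */

	    if (!DXGetArrayInfo(a, &items, &type, &category, &rank, NULL))
	        return 0;

	    /* unlike a file, an array has no EOF marker: size = items * item size */
	    return items * DXGetItemSize(a);
	}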

Furthermore, after compressing an array, we must store the description of the original array, because the array must be restored when the decompression module executes. In addition to the array size, the attributes among the objects must remain unchanged; if one of these attributes is lost, Data Explorer cannot find the references and relations between two objects, which causes a fatal error.

Constraints of DX library

In DX's data model, deleting a component involves two important considerations. First, the deletion may simply decrement a reference count. Second, other components may depend on the deleted object. This reference and dependency information can only be accessed through the functions supplied by the DX library. Consequently, restrictions in the following function prevent the compression module from using the low-level data access that could give better performance.

DXAddArrayData: the input data can be replaced with the compressed data by this function; all attributes associated with the prior value are copied to the new value and supersede any attributes already attached to it. Usually, DXEndField must be called after applying this function, because it checks that the number of elements in a component declared to depend on a related component actually matches the number of positions in the field. In this project, however, no component can syntactically depend on any other component, because the size of the data is meaningless after compression. Therefore, DXEndField cannot be called as usual.
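
A rough sketch of this replacement step follows, assuming the library calls DXNewArray, DXAddArrayData, and DXSetComponentValue behave as documented; "field", "name", "buffer", and "nbytes" are illustrative names, not the module's actual code.

	/* Sketch: attach the compressed bytes as the new component value. */
	Array compressed = DXNewArray(TYPE_UBYTE, CATEGORY_REAL, 0);
	if (!compressed || !DXAddArrayData(compressed, 0, nbytes, (Pointer)buffer))
	    return ERROR;
	if (!DXSetComponentValue(field, name, (Object)compressed))
	    return ERROR;
	/* DXEndField is deliberately NOT called here: the compressed size no
	   longer satisfies the "dep" consistency check it would perform.     */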

Performance Testing Modules

There are several routines that can be used to measure the performance of the system. Calls to DXMarkTime() are made at key points in the system. Time marks are batched until a call to DXPrintTimes(). The printing of timing messages can be enabled by calling DXTraceTime().
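
For instance, the compression call can be bracketed as follows; this is simply a usage sketch of the three routines named above.

	DXTraceTime(1);                     /* enable printing of timing messages */
	DXMarkTime("before compression");   /* record a time mark                 */
	/* ... body of the LZWCompression module ... */
	DXMarkTime("after compression");
	DXPrintTimes();                     /* print the batched time marks       */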

In order to obtain large data arrays for performance testing, the Irregular module is created. This module forces the input object to expand its components from regular to irregular arrays.


RESULTS

Two examples are tested in this project:

EXAMPLE 1: The main objects use "Texture Mapping" and "Parametric Equation" to
generate two identical 3-D earths, and these two objects are distributed to two
different HP workstations, "grieg" and "wagner". A simple "translation"
operation is applied on the remote workstations. After these computations are
done, the two objects are received and collected into a compound object on the
master (local) machine. In this case, only the color component is compressed,
because it is the only one with an irregular array.

EXAMPLE 2: The same flow and operations as EXAMPLE 1, but using the "Quadric
Surface" module to generate the main objects. All four components are
compressed after applying the "Irregular" module to expand the regular
components.
			Table 1	

The static information is shown in Table 1. The LZW algorithm achieves a very
good compression ratio for the image data in EXAMPLE 1. On the other hand, the
compression ratio on DX's ordinary data structures is not as good as expected.
			Table 2

The network performance is illustrated in Table 2. The data is collected from
five executions of each example. The "NON" columns show the comparison case
that did not apply the LZW compression module but used the same flows and
structures. "MIN" and "MAX" are the minimum and maximum times over the five
executions. As expected, EXAMPLE 1 performs well in each category. On the other
hand, EXAMPLE 2 does not save time, because the time saved on the network does
not cover the time spent compressing the data.

GETTING THE MODULES

C source code

Examples


USING THE MODULES & EXAMPLES

Creating Execution Groups

By default, all tools in a visual program are in a single unnamed execution group, which is executed on the master workstation. New execution groups are created, and existing execution groups are modified and deleted, using the Execution Group dialog box.

After invoking the dialog box, assign a group name and click the "Add To" button to add the currently selected tools to an existing execution group. You can also use the "Show" button to select and display all the tools on the canvas that are members of the selected group.

In the example used for testing, I created four execution groups: two groups apply the compression module, and the other two are comparison groups. The two compression groups simply perform the "scale" computation.

Assigning Execution Groups to Workstations
Once you have decomposed your visual program into execution groups, you can assign these groups to workstations (or hosts). If you do not specify a host for a particular execution group, the group will be executed on the master. Execution groups are assigned to a host using the "Execution Group Assignment" dialog box.

Using the LZWCompression Module

           Figure 3. Using LZW Modules
As shown in Figure 3, the compression and decompression modules must be used as a pair. In general, the compression module is added to the diagram before the objects are distributed to different workstations, and the decompression module is executed after the remote workstation receives the objects.

CONCLUSIONS & FUTURE WORK

In general, the LZW algorithm is a fast compression method that achieves a good compression ratio on different kinds of data, and this project obtained the expected result on that point. However, we did not obtain a satisfactory improvement in network performance. According to my study, several reasons explain these results.

First, the DX library provides no low-level routines to access and manage its data structures, while compression techniques generally need system-level routines to manage the data. Data Explorer, as mentioned before, has an object-oriented data model with information hiding. Besides, it has its own data control units that determine what portions of the data model can be manipulated.

Second, when compressing for network transfer, we need to know the exact number of packets sent or received, both for performance testing and for applying the compression algorithm itself. Data Explorer does not provide any routines for network programming.

Finally, it is very difficult to get precise performance data in this project because of DX's data flow model. In such a model, we cannot know exactly the dependencies and references among objects; that is, we cannot get real-time information about the total size of the current objects.

Generally speaking, there may be a way to solve the above dilemma. At the network level, another process could intercept and compress the packets before DX sends them to the remote workstations; on the receiver side, it would intercept the incoming data and decompress it before passing it to DX's execution module. Such a scheme would be embedded in Data Explorer, so the user could not specify the compression module interactively.

As suggested by my advisor, the LZW compression module can be applied to Data Explorer in many other ways. One is to compress the data generated by the "Export" module. Future work could interactively interpret DX's data format and decompress the data from the stored file. This could also save a tremendous amount of file space when many users share Data Explorer on the same file system, since "file system full" problems are common in the CS 418 course.

References

[1] Terry A. Welch, "A Technique for High-Performance Data Compression," IEEE Computer, June 1984, pp. 8-19.

[2] IBM Data Explorer 2.0.