Author's notes, December 2, 2001

This article is in the original form that I delivered to Visual Developer magazine, and does not reflect any editing changes that the magazine may have made.

The code for this article is available for download on my Tools and Utilities page (cabsrc.zip, 27 K bytes).

There is also some undocumented and mostly untested cab compression code (cabcomp.zip, 29 K bytes) that I had originally intended to include in a follow-up article.  The compressor code doesn't have the nice Delphi component interface that the decompressor code has, but it does appear to work.  If you build a component around it, please consider posting it, and let me know so that I can link to it (jim@mischel.com).

Grab a CAB:  CAB Compression

(This article originally appeared in Visual Developer Magazine, Sep/Oct 1999)

When you're writing Internet applications, bandwidth is the limiting factor.  If you want your application to be widely accepted, you have to write it to the lowest common denominator--usually a 28.8 KBPS modem.  Of course, even 28.8 KBPS is a pipe dream for many of us.  When I connect to my local ISP, I most often see a reported connection speed of 24,000 BPS.  Even that speed is the connection's maximum capability, which is very rarely the connection's actual throughput.

What all this means for your Internet applications is that, in order to perform acceptably, they must transmit their information in as few bits as is reasonably possible.  Hence the point of this article: compression.

Web browsers and browser plug-ins already employ several special-purpose compression methods.  For example, Web pages use JPEG format to compress images and you can download sound (entire album tracks, in fact) in compressed MP3 (MPEG Audio Layer 3) format.  If you have the proper plug-ins, these and other special-purpose compression formats are very useful.

If your application's data is in one of these common file formats, then you can probably use one of the special-purpose compression methods to lower your application's bandwidth requirements.  If your application's data is not as easily classified, then you have a choice: transmit the data uncompressed, write your own special-purpose compressor, or use a general-purpose compression library.

Of the three options, using a general-purpose compression library gives the most bang for the buck.  There are several such libraries available free or at a low cost.  Many, perhaps most, of these libraries use the popular ZIP compression format, and give reasonably good compression ratios on arbitrary data.

What many programmers don't realize, though, is that any computer running Internet Explorer 3.0 or later, Windows 98, or an upgraded Windows 95 already has a general-purpose compression and decompression library installed.  CABINET.DLL, which is located in the System directory, contains all of the functions required to create and read Microsoft Cabinet files--a general-purpose compression format that Microsoft uses to distribute software.  Internet Explorer can extract and install ActiveX controls from CAB files that it downloads as instructed by HTML OBJECT tags.  The CAB file format is the closest thing we have to a universal compression format.

It's there, it works, and it's free.  Let's use it.

Getting started with CAB files

To get started with CAB files, you need to obtain the Microsoft Cabinet SDK.  You can find the CAB SDK installation program (cab-sdk.exe) in the \bin directory of the Internet Client SDK, or you can download it from Microsoft's web site at http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dncabsdk/html/cabdl.asp (this link was valid as of December 3, 2001).  If you search for the phrase "CAB SDK" on Microsoft's Web site, you'll get links to several useful articles.

The installed CAB SDK contains a complete and well-documented compression library.  It includes two programs that create CAB files (with good documentation), and a CAB file extraction program called EXTRACT.EXE, which is a 32-bit version of the EXTRACT.EXE program that is probably already in your Windows directory.  You should replace the 16-bit EXTRACT.EXE that's in your Windows directory with the version that you'll find in the CABSDK\BIN directory.

For developers who are interested in adding CAB file creation and extraction to their own programs, the CAB SDK includes C header files, C-compatible libraries, and CABINET.DLL (which is probably already in your System directory).  The DOC subdirectory contains very good documentation on all of the API functions and on the LZX and MSZIP file formats with which the CAB API functions work.  Finally, the SAMPLES directory contains some C samples that show how to use the SDK functions.

Of course, the entire CAB SDK is C-centric, so I had to convert the FDI.H header file into a Delphi unit before I could create the Delphi CAB extraction component.  I'm not going to go into the details of that conversion here, nor will I print the code for the extraction component.  Both of these files are available in the listing archive for this issue of the magazine.

If you use the standard API to decompress a CAB file, you have to create file and memory access functions, and callback functions that respond to decompression events.  You then create a decompression object and call the FDICopy function to extract the files.  FDICopy calls your callback functions and makes use of the file and memory access functions that you've defined.  What with all of the callback functions, case statements, and C-centric memory and file manipulation, using the CAB API directly is work.

The TCabDecompressor component wraps all of the SDK calls and the standard file and memory functions and provides an event-oriented interface to CAB decompression.  The result is a component that you can easily and quickly include in your programs by overriding a few methods or creating a couple of event handlers.

Reading CABs from a Delphi program

In order to build and test the examples in the rest of this article, you'll need to download the listings archive and install the CAB component (CABPKG.BPL) in your copy of Delphi.  I wrote and tested the component using Delphi 4, but it should work without change in versions 2 and 3, if you recompile it.

The simplest CAB file access program just lists the names of the files within the cabinet.  The first example, CABLIST.DPR, illustrates this by filling a listbox with the file names as shown in Figure 1.  The program is painfully simple to create.  Just drop a CabDecompressor component onto the form along with the visual components, add an OnCopyFile event handler for the decompressor, and an OnClick handler for the button.  Listing 1 shows the event handler code.

Listing 1 - Event handlers for the simple CAB list program

function TCabForm.CabDecompressor1CopyFile(AFilename: PChar;
  Size: Integer; date, time, attribs, iFolder: Word;
  pUserData: Pointer): Integer;
begin
  ListBox1.Items.Add (AFilename);
  Result := 0;
end;

procedure TCabForm.btnUpdateClick(Sender: TObject);
begin
  ListBox1.Clear;
  CabDecompressor1.Extract (edFilename.Text, nil);
end;

Figure 1 - The CAB list form

If you'd rather have a command line version of the program, that's simple too.  The CABLST program shown in Listing 2 illustrates how to access the CAB decompressor component from a console application.

Listing 2 - A console application to list the contents of a CAB file

{ CABLST.DPR - List the names and sizes of all files in a CAB }
program cablst;

{$APPTYPE CONSOLE}  // console program: WriteLn needs a console to write to

uses windows, cabd;

{$R *.RES}

type
  TCabLister = class (TCabDecompressor)
  public
    function DoCopyFile (AFilename:PChar; Size:Longint;
      date, time, attribs, iFolder:WORD; pUserData: Pointer) : integer; override;
  end;

function TCabLister.DoCopyFile (AFilename:PChar; Size:Longint;
  date, time, attribs, iFolder:WORD;
  pUserData: Pointer) : integer;
begin
  WriteLn ('File: ', AFilename, '  --  Size = ', Size);
  Result := 0;
end;

var
  CabFileName : String;
  decomp : TCabLister;

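begin
  // Get the cabinet file name from the command line.
  if ParamCount < 1 then
  begin
    WriteLn ('Usage: cablst <cabinet file>');
    Halt (1);
  end;
  CabFileName := ParamStr(1);

  decomp := TCabLister.Create (nil);
  try
    if (decomp.IsCabFile (CabFileName)) then
    begin
      if (not decomp.Extract (CabFileName, nil)) then
        WriteLn ('List failed.');
    end
    else
      WriteLn ('The file "', CabFileName, '" isn''t a CAB file.');
  finally
    // Release the decompressor even if an exception is raised.
    decomp.Free;
  end;
end.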

You'll notice that the command line version uses a slightly different method to handle the CopyFile notification from the decompressor component.  Whereas the form creates an OnCopyFile event handler, the command line version overrides the component's DoCopyFile method.  What's the difference?  Nothing much, really, other than simplicity.  It's easier for a GUI application to respond to an event and it's easier for a command line program to override a virtual method.  The implementation of the decompressor component makes both very easy to accomplish.

Under the Hood

If you want to do anything more involved than list or extract all of the files in a cabinet, then you have to understand a little of how CAB files are constructed.  You also have to know which decompression events will occur when you call Extract, and in which order they will occur.

A cabinet file contains one or more folders.  A folder contains one or more files, or pieces of files.  A folder can span cabinet boundaries.  The MAKECAB program that comes with the CAB SDK can create multi-cabinet distributions that split a single large file into multiple, smaller, pieces.
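The folder and file counts described above are recorded in the first bytes of every cabinet file.  As a rough sketch, assuming the fixed 36-byte header layout (CFHEADER) given in the CAB SDK's file format documentation, the following console program reads that header directly, without calling CABINET.DLL or the component.  The program name, the record name, and the field names are illustrative and are not taken from the SDK headers or from CABD.PAS.

```pascal
program CabHeaderPeek;

{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

type
  // Fixed part of the cabinet header (36 bytes on disk).
  // Field names are illustrative; the layout is assumed from the
  // CFHEADER description in the CAB SDK documentation.
  TCabHeader = packed record
    Signature    : array[0..3] of AnsiChar; // 'MSCF'
    Reserved1    : LongWord;
    CabinetSize  : LongWord;  // size of this cabinet file in bytes
    Reserved2    : LongWord;
    FilesOffset  : LongWord;  // offset of the first file entry
    Reserved3    : LongWord;
    VersionMinor : Byte;
    VersionMajor : Byte;
    FolderCount  : Word;      // folders in this cabinet
    FileCount    : Word;      // files in this cabinet
    Flags        : Word;
    SetID        : Word;      // shared by every cabinet in a set
    CabinetIndex : Word;      // position of this cabinet in its set
  end;

const
  // Header flag bits: the cabinet is one piece of a multi-cabinet set.
  cfhdrPREV_CABINET = $0001;
  cfhdrNEXT_CABINET = $0002;

// Reads the header and returns True only if the file could be opened,
// is long enough, and starts with the 'MSCF' signature.
function ReadCabHeader(const FileName: string; var Hdr: TCabHeader): Boolean;
var
  Stream: TFileStream;
begin
  Result := False;
  try
    Stream := TFileStream.Create(FileName, fmOpenRead or fmShareDenyWrite);
  except
    on EStreamError do
      Exit;  // file missing or not readable
  end;
  try
    Result := (Stream.Read(Hdr, SizeOf(Hdr)) = SizeOf(Hdr)) and
              (Hdr.Signature[0] = 'M') and (Hdr.Signature[1] = 'S') and
              (Hdr.Signature[2] = 'C') and (Hdr.Signature[3] = 'F');
  finally
    Stream.Free;
  end;
end;

var
  Hdr: TCabHeader;
begin
  if ParamCount < 1 then
  begin
    WriteLn('Usage: CabHeaderPeek <cabinet file>');
    Halt(1);
  end;
  if ReadCabHeader(ParamStr(1), Hdr) then
  begin
    WriteLn('Folders:       ', Hdr.FolderCount);
    WriteLn('Files:         ', Hdr.FileCount);
    WriteLn('Set ID:        ', Hdr.SetID);
    WriteLn('Cabinet index: ', Hdr.CabinetIndex);
    WriteLn('Has previous:  ', (Hdr.Flags and cfhdrPREV_CABINET) <> 0);
    WriteLn('Has next:      ', (Hdr.Flags and cfhdrNEXT_CABINET) <> 0);
  end
  else
    WriteLn('Not a cabinet file (or the file could not be read).');
end.
```

A cabinet whose "has next" flag is set is one piece of a set, so a folder in it may continue into the following cabinet.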

A folder is a decompression unit, which means that in order to decompress a file, the decompression functions must read all of the data from the start of the folder up to and including the desired file.  This presents a performance problem if you have many files in a single folder and are going to be decompressing only a few of them.  The MAKECAB documentation contains a good discussion of how to organize your folders for the best performance in a particular situation.

When a program calls the CabDecompressor component's Extract method, the component in turn calls the CAB SDK's FDICopy function, passing it the addresses of the Notification and Decryption callback functions that are defined in the component's FCallbacks record.  FDICopy calls these functions throughout the decompression process.  FDICopy also calls the memory management and file access functions that are defined in the FCallbacks record.  The FCallbacks record is initialized by the decompression component's GetCallbacks method.

When FDICopy is called to list all of the files in a cabinet, the events arrive in this order:

·  OnCabinetInfo: gets information about the cabinet
·  OnEnumerate: starts the enumeration
·  OnCopyFile: called once for each file in the cabinet
·  OnEnumerate: ends the enumeration

In this case, your OnCopyFile event handler would simply output the file name passed in the AFilename parameter and return 0, which tells FDICopy to skip the file rather than extract it.
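The OnCopyFile handler also receives date and time parameters, which the listings in this article do not interpret.  Assuming these Words use the packed MS-DOS date and time format, as the CAB SDK documentation describes for FDI notifications, the sketch below decodes them into a TDateTime that a listing program could display.  CabStampToDateTime is a hypothetical helper name; it is not part of CABD.PAS.

```pascal
program DosStamp;

{$APPTYPE CONSOLE}

uses
  SysUtils;

// Hypothetical helper: converts packed MS-DOS date and time Words
// into a TDateTime.
//   Date bits: yyyyyyym mmmddddd  (year is an offset from 1980)
//   Time bits: hhhhhmmm mmmsssss  (seconds are stored divided by 2)
// EncodeDate raises EConvertError if the fields do not form a valid date.
function CabStampToDateTime(DosDate, DosTime: Word): TDateTime;
var
  Year, Month, Day, Hour, Minute, Second: Word;
begin
  Year   := (DosDate shr 9) + 1980;
  Month  := (DosDate shr 5) and $0F;
  Day    := DosDate and $1F;
  Hour   := DosTime shr 11;
  Minute := (DosTime shr 5) and $3F;
  Second := (DosTime and $1F) * 2;
  Result := EncodeDate(Year, Month, Day) +
            EncodeTime(Hour, Minute, Second, 0);
end;

begin
  // $272F and $73C5 encode 15 September 1999, 14:30:10.
  WriteLn(FormatDateTime('yyyy-mm-dd hh:nn:ss',
    CabStampToDateTime($272F, $73C5)));
end.
```

Inside a handler, the call would look like CabStampToDateTime (date, time), using the parameter names from the listings.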

If you want to extract one or more files from the cabinet, then when the OnCopyFile event handler is called for the desired file, you must open a file for output and return the file handle as the function result.  FDICopy then calls the file write callback function multiple times to write to the file, and when the file is finished your OnCloseFile event handler is called.  After your close file processing is finished, FDICopy regains control and calls OnCopyFile for the next file in the cabinet.

The OnNextCabinet and OnPartialFile handlers are called when decompressing a cabinet set that spans multiple cabinet files.  The decryption event handlers are called when decompressing a cabinet that was created with encryption.  For more information about these events, consult the CAB SDK documentation.

Extracting a File

As a simple example of extracting a file from a cabinet, I've slightly modified the interactive CAB list program.  I added an Extract button that, when pressed, extracts the file currently selected in the listbox from the cabinet file and writes it to disk.  The event handlers for this program are shown in Listing 3.

Listing 3 - Event handlers for the CAB extractor application

procedure TForm1.btnUpdateClick(Sender: TObject);
begin
  lbFiles.Clear;
  CabDecompressor1.Extract (edFilename.Text, nil);
end;

function TForm1.CabDecompressor1CopyFile(AFilename: PChar;
  Size: Integer; date, time, attribs, iFolder: Word;
  pUserData: Pointer): Integer;
var
  TempHandle : Integer;
  ExtractFilename : PString absolute pUserData;
begin
  // If extracting a file, then compare this filename with the
  // file that we want to extract.  If they match, then open
  // the file and return the handle.  Otherwise just return 0.
  if FExtracting then
  begin
    if CompareText (AFilename, ExtractFilename^) = 0 then
    begin
      // open a file for writing and return the handle.
      TempHandle := CabDecompressor1.Callbacks.OpenFunc
        (AFilename,
         (O_BINARY or O_CREAT or O_TRUNC or O_WRONLY or
          O_SEQUENTIAL),
         (S_IREAD or S_IWRITE));
      Result := TempHandle;
    end
    else
      Result := 0;
  end
  else begin
    lbFiles.Items.Add (AFilename);
    Result := 0;
  end;
end;

function TForm1.CabDecompressor1CloseFile(AFilename: PChar;
  hf: Integer; date, time, attribs, iFolder: Word;
  RunFile: Integer; pUserData: Pointer): Integer;
begin
  // Done writing file.
  // Close it and set the date and attributes.
  cabd.CloseAndDateStampFile
    (AFilename, hf, date, time, attribs);
  Result := 1;
end;

procedure TForm1.btnExtractClick(Sender: TObject);
var
  Filename : String;
begin
  if lbFiles.ItemIndex <> -1 then
  begin
    // Get filename and set extracting flag.
    Filename := lbFiles.Items[lbFiles.ItemIndex];
    FExtracting := true;
    try
      // Do the extract.
      CabDecompressor1.Extract (edFilename.Text, @Filename);
    finally
      // Clear the flag even if Extract raises an exception.
      FExtracting := false;
    end;
  end;
end;

The second parameter passed to the decompressor's Extract function is a pointer to user data.  The decompressor doesn't actually do anything with this pointer or the data to which it points.  Instead, it passes the pointer to all of the notification callback functions.  Your application can use this pointer to pass context-specific data to callback functions.  In this example, the program passes a pointer to the string that contains the name of the file you want to extract.
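The same pattern works for context that is richer than a single string.  The following stand-alone sketch does not use the component at all: EnumerateNames is a hypothetical stand-in for Extract, and TExtractContext and CountMatches are illustrative names.  It only shows how an untyped pointer carries a record through to a callback that casts it back to the type it expects.

```pascal
program UserDataDemo;

{$APPTYPE CONSOLE}

uses
  SysUtils;

type
  // Hypothetical context record; not part of the component.
  PExtractContext = ^TExtractContext;
  TExtractContext = record
    WantedName : string;   // name of the file to look for
    Matches    : Integer;  // number of times that name was seen
  end;

  // Stand-in for a notification callback that receives user data.
  TFileCallback = function (AFilename: PChar; pUserData: Pointer): Integer;

// Stand-in for Extract: it never examines pUserData, it only
// hands the pointer to the callback for every name.
procedure EnumerateNames(const Names: array of string;
  Callback: TFileCallback; pUserData: Pointer);
var
  i: Integer;
begin
  for i := Low(Names) to High(Names) do
    Callback(PChar(Names[i]), pUserData);
end;

// The callback casts the untyped pointer back to the record type
// and updates the caller's context.
function CountMatches(AFilename: PChar; pUserData: Pointer): Integer;
var
  Ctx: PExtractContext;
begin
  Ctx := PExtractContext(pUserData);
  if CompareText(AFilename, Ctx^.WantedName) = 0 then
    Inc(Ctx^.Matches);
  Result := 0;
end;

var
  Ctx: TExtractContext;
begin
  Ctx.WantedName := 'readme.txt';
  Ctx.Matches := 0;
  EnumerateNames(['SETUP.EXE', 'README.TXT', 'data.bin'],
    CountMatches, @Ctx);
  WriteLn('Matches for ', Ctx.WantedName, ': ', Ctx.Matches);
end.
```

Because the pointer is untyped, the compiler cannot check the cast.  The code that calls Extract and the handlers that read the pointer have to agree on what it points to.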

Overriding Default Behavior

The standard event interface (or virtual function overrides if you're writing a console mode application) should be sufficient for most applications that need to extract files from a cabinet.  However, if your application needs to control the cabinet extraction process more closely, you can override the callback functions that the FDICopy function calls.  This gives you very close control without forcing you to write directly to the CAB API, although if you really want to write to the API, you can call the functions defined in FDI.PAS.

To override the callbacks, your application must create an OnGetCallbacks event handler.  The component calls this handler once, the first time your application calls Extract.  The aCallbacks record passed to OnGetCallbacks contains the addresses of the component's default callback functions.  Your handler should replace the callback functions in this record with your custom callback functions.  You don't have to replace all of the callbacks--just the ones that you want to override.  Remember that all of the callback functions are called by a DLL function that was written in C, so they must use the cdecl calling convention.  See the definitions of the standard callbacks (FDIAllocMem, FDIFreeMem, etc.) in CABD.PAS for examples of proper callback function declarations.
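As a minimal sketch of the idea, the stand-alone program below replaces one cdecl callback in a record and leaves the other alone.  TDemoCallbacks, its field names, and GetCallbacksHandler are hypothetical stand-ins; the component's real callback record and the real handler signature are defined in CABD.PAS and are not reproduced here.  The allocator shape (a byte count in, a pointer out) is an assumption based on the memory allocation callback that the CAB SDK describes.

```pascal
program CdeclCallbacks;

{$APPTYPE CONSOLE}

type
  // Hypothetical callback types.  The cdecl directive is part of the
  // type, so only cdecl functions can be assigned to these fields.
  TAllocFunc = function (cb: Cardinal): Pointer; cdecl;
  TFreeFunc  = procedure (pv: Pointer); cdecl;

  // Hypothetical stand-in for the component's callback record.
  TDemoCallbacks = record
    AllocFunc : TAllocFunc;
    FreeFunc  : TFreeFunc;
  end;

var
  AllocCount : Integer;  // number of allocations the override has seen

function DefaultAlloc(cb: Cardinal): Pointer; cdecl;
begin
  GetMem(Result, cb);
end;

procedure DefaultFree(pv: Pointer); cdecl;
begin
  FreeMem(pv);
end;

// Custom allocator: counts the call, then defers to the default.
function CountingAlloc(cb: Cardinal): Pointer; cdecl;
begin
  Inc(AllocCount);
  Result := DefaultAlloc(cb);
end;

// Plays the role of an OnGetCallbacks handler: it replaces only the
// callback it cares about and leaves the rest of the record untouched.
procedure GetCallbacksHandler(var aCallbacks: TDemoCallbacks);
begin
  aCallbacks.AllocFunc := CountingAlloc;
end;

var
  Callbacks : TDemoCallbacks;
  p : Pointer;
begin
  AllocCount := 0;

  // Start with the defaults, as the component would.
  Callbacks.AllocFunc := DefaultAlloc;
  Callbacks.FreeFunc := DefaultFree;

  // Let the handler override what it wants.
  GetCallbacksHandler(Callbacks);

  // Simulate the library allocating and freeing a buffer.
  p := Callbacks.AllocFunc(128);
  Callbacks.FreeFunc(p);

  WriteLn('Allocations seen: ', AllocCount);
end.
```

Leaving cdecl off one of the functions makes the assignment fail to compile, which is the useful outcome: a calling-convention mismatch that reached the DLL would corrupt the stack at run time.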

If you override the default notification or decryption callbacks, be sure that you understand how to handle the user data pointer, and that you propagate it properly.  See the implementation of FDINotify and FDIDecrypt in CABD.PAS for examples.  If you override these callbacks, you've taken full control of the cabinet extraction process, so you shouldn't have to delve into the API in FDI.PAS unless you're doing something strange.

Summary

Although this article shows only simple examples, you should be able to use TCabDecompressor to decompress any cabinet file.  The simple event interface makes small decompression tasks very easy to implement, and because the component exposes all of the callback functions, managing even complex decompression tasks should be straightforward.