CompressStream issue with TAR format archives

Feb 28, 2013 at 7:17 PM
Not sure there is any help for this, but I think I know why TAR archives don't seem to extract properly for some people, and why the error message "There are data after the end of archive" crops up when extraction is attempted.

I built a very simple WinForms app that creates three input memory streams and compresses them using CompressStream (create for the first, append for the other two). I give each a distinct name, and all of the data is written to my output stream just fine. Then, to check that the archive is valid, I wrote the output stream to a file stream so it was all on disk. Using 7-Zip itself, I then attempt to extract the three files from the archive. I end up with the first file on disk (which is perfect) and the error message "There are data after the end of archive". So here's where it gets interesting...

In tracing the SevenZipSharp code I found that, although each of my input streams was under 50 bytes, the resulting output stream is over 4 blocks long (~2048 bytes). So even though the data itself fits in less than one 512-byte block, the output is over 2000 bytes of mostly empty space.
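One way to see where that empty space sits is to map the output block by block. The following is only a rough sketch in Python rather than C#; `block_map` and the sample buffer are hypothetical helpers made up for illustration, not part of SevenZipSharp.

```python
BLOCK = 512  # TAR block size in bytes

def block_map(data: bytes) -> str:
    """Return one character per 512-byte block: '.' if all zeros, '#' otherwise."""
    zero = bytes(BLOCK)
    chars = []
    for pos in range(0, len(data), BLOCK):
        block = data[pos:pos + BLOCK]
        # A short final block is compared against an equally short run of zeros.
        chars.append("." if block == zero[:len(block)] else "#")
    return "".join(chars)

# Hypothetical sample: a full block, a block holding 50 bytes, then two empty blocks.
sample = b"h" * BLOCK + b"d" * 50 + bytes(BLOCK - 50) + bytes(2 * BLOCK)
print(block_map(sample))  # prints ##..
```

Running the same function over the bytes actually produced by CompressStream would show which blocks hold content and which are empty.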

After reading up a bit on the TAR specification, I learned that it calls for two 512-byte blocks of zeros (1024 bytes) to be written at the end of the file (stream) as an end-of-archive marker. Anybody else see the problem yet...
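For reference, the block arithmetic can be sketched as follows. This assumes the standard ustar layout, in which every entry is also preceded by its own 512-byte header block; the function names are made up for illustration. Under that assumption a complete archive holding one 50-byte entry comes to exactly 2048 bytes, which would be consistent with the size observed above, though that is an inference and not something confirmed from the SevenZipSharp code.

```python
BLOCK = 512            # TAR block size in bytes
END_MARKER_BLOCKS = 2  # end-of-archive marker: two zero-filled blocks

def entry_size(payload_len: int) -> int:
    """Bytes used by one entry: a header block plus the data rounded up to whole blocks."""
    data_blocks = -(-payload_len // BLOCK)  # ceiling division
    return (1 + data_blocks) * BLOCK

def archive_size(payload_lens) -> int:
    """Minimum size of an archive holding the given payloads and a single end marker."""
    return sum(entry_size(n) for n in payload_lens) + END_MARKER_BLOCKS * BLOCK

print(archive_size([50]))          # 2048
print(archive_size([50, 50, 50]))  # 4096
```

These are minimum sizes; some writers pad the archive further, up to a larger record size.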

BECAUSE CompressStream is adding THREE blocks of unneeded data for my 50-byte stream, what looks like an end-of-archive marker is being put in place prematurely by the call to CompressStream. So when 7-Zip starts extracting, it handles the first entry fine but runs into 1024 empty bytes before it ever gets to my second entry. Hence the error "There are data after the end of archive": 7-Zip thinks the end of the archive is near the top of my archive, not at the bottom where it should be.
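The same symptom can be reproduced outside SevenZipSharp. The sketch below uses Python's standard `tarfile` module and assumes that the output stream ends up as three complete archives laid end to end, each with its own end marker; that assumption about CompressStream's behaviour is mine and is not confirmed from its source. The helper names are hypothetical.

```python
import io
import tarfile

def single_entry_archive(name: str, payload: bytes) -> bytes:
    """Build a complete TAR archive (including its end marker) holding one entry."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tf:
        info = tarfile.TarInfo(name)
        info.size = len(payload)
        tf.addfile(info, io.BytesIO(payload))
    return buf.getvalue()

def list_names(data: bytes, ignore_zeros: bool = False):
    """List entry names, optionally skipping zero blocks instead of stopping at them."""
    with tarfile.open(fileobj=io.BytesIO(data), mode="r:", ignore_zeros=ignore_zeros) as tf:
        return tf.getnames()

# Three complete archives back to back, standing in for create + append + append.
combined = b"".join(single_entry_archive(f"file{i}.txt", b"x" * 50) for i in (1, 2, 3))

print(list_names(combined))                     # ['file1.txt']
print(list_names(combined, ignore_zeros=True))  # ['file1.txt', 'file2.txt', 'file3.txt']
```

A reader that honours the first end marker sees only the first entry; the other two are still present in the data, just past the point where reading stops.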

So, as far as I can tell, CompressStream is causing the problem. The TAR spec says that everything in a TAR archive must sit within 512-byte blocks. OK, fine. CompressStream should create exactly as many 512-byte blocks as are needed to contain the data, with NO EXTRA blocks added. In my case I am getting THREE superfluous blocks for each tiny (50-byte) stream I pass in.

We need to either fix the existing logic in CompressStream or add an overload that creates only as many blocks as are really needed.
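As a rough idea of what such a fix might do, here is a minimal sketch, again in Python and not the SevenZipSharp API: keep only the header and data blocks of each piece, then write a single end marker at the very end. All function names are hypothetical, and the header walk assumes plain ustar headers with an octal size field at bytes 124 to 135 (it does not handle extended size encodings).

```python
import io
import tarfile

BLOCK = 512  # TAR block size in bytes

def single_entry_archive(name: str, payload: bytes) -> bytes:
    """Build a complete TAR archive (including its end marker) holding one entry."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tf:
        info = tarfile.TarInfo(name)
        info.size = len(payload)
        tf.addfile(info, io.BytesIO(payload))
    return buf.getvalue()

def entries_end(data: bytes) -> int:
    """Offset just past the last entry, found by walking the entry headers."""
    zero = bytes(BLOCK)
    pos = 0
    while pos + BLOCK <= len(data):
        header = data[pos:pos + BLOCK]
        if header == zero:  # first zero-filled header block: no more entries
            break
        size = int(header[124:136].rstrip(b"\0 ") or b"0", 8)  # octal size field
        pos += BLOCK + -(-size // BLOCK) * BLOCK  # header plus padded data
    return pos

def merge_archives(parts) -> bytes:
    """Join the entries of several archives and append one end-of-archive marker."""
    body = b"".join(p[:entries_end(p)] for p in parts)
    return body + bytes(2 * BLOCK)

parts = [single_entry_archive(f"file{i}.txt", b"x" * 50) for i in (1, 2, 3)]
merged = merge_archives(parts)
with tarfile.open(fileobj=io.BytesIO(merged), mode="r:") as tf:
    print(tf.getnames(), len(merged))  # ['file1.txt', 'file2.txt', 'file3.txt'] 4096
```

Walking the headers, instead of simply trimming trailing zero blocks, avoids cutting off an entry whose own data happens to end in zeros.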

I'm not sure of the process from here. How do things like this get fixed...or is it on me to fix it?

Thanks for listening!