A way to improve extraction speed?

Jul 25, 2010 at 6:56 PM

To start out, I have to say that you've done a great job on this project.  It has saved me a lot of time, and I thank you, Markhor, for your hard work.

I've hit a bit of a performance wall with extraction speed, particularly with the zip and rar file formats.  As a test, I extract a zip or rar file straight into memory; the archive contains about 750 JPEG images and totals about 150 MB.

The extraction maxes out only one of my four CPU cores (Intel Q9550), while hard drive activity is minimal.  VS2010's performance profiler identifies the extraction call (ExtractFile()) as the bottleneck.

In all, the extraction process takes about 5000-5400 ms, which is still fast but not quite what I want to achieve for in-memory extraction.  As a comparison, WinRAR extracts these archives to disk in about 7-8 seconds.

Here is a sample of my code:



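            foreach (ArchiveFileInfo afi in compressedFileInfo)
            {
                // Directories have no data to extract, so skip them.
                if (!afi.IsDirectory)
                {
                    stream = new MemoryStream();
                    compressedFile.ExtractFile(afi.FileName, stream);
                    // Rewind so consumers read from the start of the extracted data.
                    stream.Seek(0, SeekOrigin.Begin);

                    StreamList.Add(new FileEntry(afi.FileName, stream));
                }

                // Advance the progress bar on the UI thread once per archive entry.
                Application.Current.Dispatcher.Invoke(DispatcherPriority.Background, new Action(() => progressbar.Value++));
            }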

Is there any way to improve the performance of this extraction?  Initially I tried creating multiple SevenZipExtractors and extracting from the same file in parallel using the new Parallel.ForEach in .NET 4.

This turned out to be slower, and I'm guessing it has to do with I/O locks on accessing a single file, plus the overhead of managing multiple threads that keep getting blocked.

Do you see anything in my code that might be causing this, or do you have any suggestions to improve speed?

Thanks again for making this awesome library.


Aug 22, 2010 at 8:46 AM

You are using the library incorrectly. ExtractFile is suitable for single files only. Use ExtractFiles or ExtractArchive instead.

Nov 30, 2010 at 8:10 PM

As far as I can tell, the ExtractFiles and ExtractArchive methods extract to a directory.  In my case, I need all my extractions to stay in memory as streams.
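
For illustration only, here is a minimal sketch of how a batch call might still keep everything in memory. It assumes the SevenZipSharp build in use exposes an ExtractFiles overload that takes an ExtractFileCallback, with ExtractFileCallbackArgs offering Reason, ArchiveFileInfo and ExtractToStream; that overload may not exist in every release, so check it against your version. The InMemoryExtraction class and ExtractAll method are hypothetical names, not part of the library. Nothing in this thread establishes that this is faster than the per-file loop above, and the callback overload may not work with solid archives.

```csharp
using System.Collections.Generic;
using System.IO;
using SevenZip; // SevenZipSharp

static class InMemoryExtraction
{
    // Hypothetical helper (not from the thread): extracts every non-directory
    // entry of an archive into its own MemoryStream, keyed by entry name.
    public static Dictionary<string, MemoryStream> ExtractAll(string archivePath)
    {
        var streams = new Dictionary<string, MemoryStream>();
        MemoryStream current = null; // stream for the entry currently being extracted

        using (var extractor = new SevenZipExtractor(archivePath))
        {
            // Assumption: ExtractFiles(ExtractFileCallback) is available in this version.
            extractor.ExtractFiles(args =>
            {
                if (args.ArchiveFileInfo.IsDirectory)
                    return; // no target stream is assigned for directories

                switch (args.Reason)
                {
                    case ExtractFileCallbackReason.Start:
                        // Ask the extractor to write this entry into memory.
                        current = new MemoryStream();
                        args.ExtractToStream = current;
                        break;

                    case ExtractFileCallbackReason.Done:
                        // Rewind so readers start at the beginning of the data.
                        current.Seek(0, SeekOrigin.Begin);
                        streams[args.ArchiveFileInfo.FileName] = current;
                        break;
                }
            });
        }

        return streams;
    }
}
```

Calling InMemoryExtraction.ExtractAll(path) would return one rewound stream per file. Entries whose extraction fails are simply left out of the dictionary in this sketch, and the caller is responsible for disposing the streams.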