Gradual Performance Degradation for .Check() in the Extractor

Sep 26, 2012 at 12:23 AM

I'm writing an application that iterates over a batch of zips and runs a dictionary of passwords against each one in an attempt to detect the password.  I'm using the .Check() function provided by the Extractor.  However, I've noticed very serious performance degradation for each additional zip I try.  My code is pretty simple, and everything gets properly disposed.  Here's a sample of my usage:

foreach (String password in passwords)
{
    using (SZDO.SevenZipExtractor store = String.IsNullOrEmpty(password)
        ? new SZDO.SevenZipExtractor(filePath)
        : new SZDO.SevenZipExtractor(filePath, password))
    {
        if (store.Check())
        {
            return password;
        }
    }
}
Anybody have any suggestions?  I'd prefer not to have to run this in a separate app domain...

 

Oct 9, 2013 at 4:30 AM
I can confirm your observation, but I don't find it related to .Check().

I too am seeing a significant fall-off in performance over time.

I have a large number of SQLite database files to process. Inside each file there is a table containing a field compressed with zlib.

The first file, of about 50,000 records, takes about 30 seconds to extract these data fields using code similar to yours. Each file after that, of about the same size, takes just over 10 percent longer to extract. The process rather quickly becomes too slow to be useful in the context of thousands of files, as the processing time doubles approximately every 8 files.

The compressed field in question is an XML string made up of many floating-point values with metadata attached, and as such is close to noise and perhaps not the best candidate for compression. The passwords you are checking, while perhaps not as random as my data, might well be more random than typical text.

I am not at all familiar with compression methods, but I know a dictionary of some sort is involved. Perhaps the extractor is maintaining a dictionary internally that, when presented with noisy data, grows in size, lengthening the extraction time? I don't know. I do see physical memory usage gradually increasing as I process data, and like you I have been careful to dispose.

I looked at setting up secondary app domains as you mentioned, and I think it would work, but I found I had little taste for adding that level of complexity to my simple, occasional-use parsing engine. Instead I wrote a command-line app which my program calls once per file; see the sketch below. This worked out well and my rate of extraction is now constant across the files. If I am close to the mark on the dictionary issue, this might also be why extraction fails if placed inside a .NET Parallel.For loop, as the extractor assembly is somehow holding a common memory area open for sequential (but not parallel) use.
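
For what it's worth, here is a minimal sketch of that per-file-process approach. The tool path and method name are made up for illustration; the point is only that each file gets a fresh process, so whatever state the extractor assembly accumulates is thrown away when the process exits:

using System.Collections.Generic;
using System.Diagnostics;

static void ExtractAll(IEnumerable<string> files)
{
    foreach (string file in files)
    {
        // One fresh process per file: any memory the extractor holds
        // internally is released when the child process exits.
        var info = new ProcessStartInfo(@"C:\Tools\ExtractTool.exe", "\"" + file + "\"")
        {
            UseShellExecute = false
        };
        using (var proc = Process.Start(info))
        {
            proc.WaitForExit();
        }
    }
}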

I do like this library. I had considerable difficulty finding anything that would expand the compression format I am presented with, and it was the only thing that worked out of the box in a 64-bit environment.
Oct 14, 2013 at 7:37 PM
I had a similar issue. This is the result of a combination of bad design and a bug in SevenZipBase.

In SevenZipBase the author generates a unique ID that is later used to compare two instances of SevenZipBase. This is not very smart: since the IDs are unique, the result is exactly the same as an ordinary reference comparison.

Here is the bad code. Every generated ID is added to a static list!
    private static int GetUniqueID()
    {
        int id;
        var rnd = new Random(DateTime.Now.Millisecond);
        do
        {
            id = rnd.Next(Int32.MaxValue);
        }
        while (Identificators.Contains(id));
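        // Contains above is a linear scan of the static Identificators list,
        // and the new ID is then added to it, so the list and the scan keep growing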
        Identificators.Add(id);
        return id;
    }
Now, the author had the idea of removing the ID from the list once the instance gets disposed, and he did:
    /// <summary>
    /// Removes the UniqueID from the list.
    /// </summary>
    ~SevenZipBase()
    {
        Identificators.Remove(_uniqueID);
    }
Unfortunately, this finalizer will never be called because of the SuppressFinalize call in Dispose:
    /// <summary>
    /// Releases the unmanaged resources used by SevenZipExtractor.
    /// </summary>
    public void Dispose()
    {
        if (!_disposed)
        {                
            CommonDispose();
        }
        _disposed = true;            
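        // Suppressing finalization means ~SevenZipBase() above never runs,
        // so _uniqueID is never removed from the static list.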
        GC.SuppressFinalize(this);
    }
So naturally you not only have a memory leak, but because it's a list, the search takes longer and longer as it grows.
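
For what it's worth, here is a minimal sketch of one way the ID scheme could be reworked (this is not the library's actual code, just an illustration): replace the random ID plus static list with an atomic counter, so there is nothing to search and nothing to remove on Dispose:

using System.Threading;

public abstract class SevenZipBase
{
    // Shared counter; Interlocked.Increment makes it safe if instances
    // are created from multiple threads.
    private static int _lastId;

    // Each instance simply takes the next counter value as its ID.
    private readonly int _uniqueID = Interlocked.Increment(ref _lastId);

    // Instances can still be compared by ID, but no static list is kept,
    // so nothing grows and nothing has to be cleaned up when disposing.
    public override bool Equals(object obj)
    {
        var other = obj as SevenZipBase;
        return other != null && other._uniqueID == _uniqueID;
    }

    public override int GetHashCode()
    {
        return _uniqueID;
    }
}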