How to stream data dumps

Apr 3, 2011 at 10:08 AM

I am trying to stream Wikipedia's 7z data dumps using C#. I want to stream the compressed files without completely decompressing them, because the data dumps themselves are massive. I noticed this library has a stream option, but either it does not work correctly or I am using it wrong. Here is a snippet of my code:

Stream stream;
LzmaDecodeStream wikiStream;
XmlTextReader xmlReader;

stream = File.OpenRead(lines[x]);
wikiStream = new LzmaDecodeStream(stream);
xmlReader = new XmlTextReader(wikiStream);

I get the following error:

An unhandled exception of type 'System.ArgumentOutOfRangeException' occurred in mscorlib.dll

Additional information: MemoryStream length must be non-negative and less than 2^31 - 1 - origin.

May 7, 2011 at 2:42 PM
Edited May 11, 2011 at 2:13 PM

Same problem. On-the-fly decompression is not working :(

Sample 1:

using (FileStream fs = new FileStream(filePath, FileMode.Open))
{
    using (LzmaDecodeStream szs = new LzmaDecodeStream(fs))
    {
        using (XmlTextReader reader = new XmlTextReader(szs)) // <--- exception here
        {
            // [...]
        }
    }
}


Sample 2:

using (FileStream fs = new FileStream(filePath, FileMode.Open))
{
    using (LzmaDecodeStream szs = new LzmaDecodeStream(fs))
    {
        int test = szs.Read(new byte[4096], 0, 100); // <--- exception here
        // [...]
    }
}



StackTrace:
 at  System.IO.MemoryStream.SetLength(Int64 value)


It seems to occur in the "ReadChunk" method of the "LzmaDecodeStream" class:
1. long size;   <---- initialized with 0
2. properties = SevenZipExtractor.GetLzmaProperties(_input, out size);   <---- seems to return a wrong size
3. _buffer.SetLength(size); <---- fails as a result

A 15 KB *.7z file whose content is a 500 KB XML file produces a size of 8975296948835909660 in the GetLzmaProperties method.
Something goes wrong there.
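A likely explanation, assuming GetLzmaProperties parses its input as a raw .lzma header (5 property bytes followed by an 8-byte little-endian uncompressed size): a .7z archive starts with the 6-byte signature 37 7A BC AF 27 1C and two version bytes, so bytes 5 to 12 are the last signature byte, the version and the start of the archive's own header. Read little-endian, the low three bytes of 8975296948835909660 are 1C 00 03, which matches the signature's last byte followed by version 0.3. The helper below is hypothetical (not part of SevenZipSharp) and only reproduces that misreading:

```csharp
using System;

public static class LzmaHeaderProbe
{
    // Hypothetical helper: interprets the first 13 bytes the way a raw
    // .lzma ("LZMA alone") header is laid out:
    //   bytes 0..4  = properties (lc/lp/pb byte + 4-byte dictionary size)
    //   bytes 5..12 = uncompressed size, little-endian Int64
    public static long ReadDeclaredSize(byte[] header)
    {
        if (header == null || header.Length < 13)
            throw new ArgumentException("Need at least 13 header bytes.", nameof(header));

        long size = 0;
        for (int i = 0; i < 8; i++)
        {
            // Assemble the 8 size bytes, least-significant byte first.
            size |= (long)header[5 + i] << (8 * i);
        }
        return size;
    }
}
```

Fed the start of a .7z file, the "size" is built from archive metadata rather than a real length, and any value above 2^31 - 1 makes MemoryStream.SetLength throw the ArgumentOutOfRangeException reported above.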

May 20, 2011 at 3:46 PM

It seems there has been no activity on this library from the developer since Aug. 2010... :(

Coordinator
Jul 11, 2011 at 11:27 AM

No, no, guys, you've got it wrong.

LzmaDecodeStream can only be used with data previously compressed with LzmaEncodeStream or something similar. It was never intended to work the way you describe.
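Given that, a caller could check for the 7z container signature before handing a stream to LzmaDecodeStream. A minimal sketch, relying only on the 6-byte signature 37 7A BC AF 27 1C that every .7z archive begins with; the class and method names are hypothetical and not part of SevenZipSharp:

```csharp
using System;
using System.IO;

public static class ArchiveSniffer
{
    // The 6-byte signature every .7z archive begins with.
    private static readonly byte[] SevenZipSignature = { 0x37, 0x7A, 0xBC, 0xAF, 0x27, 0x1C };

    // Hypothetical helper: returns true if the stream starts with the 7z signature.
    // The stream position is restored afterwards, so the stream must be seekable.
    public static bool IsSevenZipArchive(Stream stream)
    {
        if (!stream.CanSeek)
            throw new ArgumentException("Stream must be seekable.", nameof(stream));

        long start = stream.Position;
        try
        {
            var header = new byte[SevenZipSignature.Length];
            int read = 0;
            // Read may return fewer bytes than requested, so loop until full or EOF.
            while (read < header.Length)
            {
                int n = stream.Read(header, read, header.Length - read);
                if (n == 0) return false; // too short to be a 7z archive
                read += n;
            }
            for (int i = 0; i < header.Length; i++)
                if (header[i] != SevenZipSignature[i]) return false;
            return true;
        }
        finally
        {
            stream.Position = start;
        }
    }
}
```

A true result means the file is a 7z container rather than a bare LZMA stream, so it belongs on the SevenZipExtractor route rather than LzmaDecodeStream.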


As for this case... dig into the SevenZipExtractor methods. It seems to me that it is not possible with the existing code, though.

Jul 11, 2011 at 11:44 AM

Hi markhor,

thanks for your answer.

If I understand you correctly, it is not possible to read 7z archives on the fly,
so I have to decompress the whole file first, right?

In my case it is for reading big compressed XML files. It would be nice not to have to decompress the whole file first.

With GZip I do it this way:

using (FileStream fs = new FileStream(fileWatchThread.FilePath, FileMode.Open))
{
    using (GZipStream zipStream = new GZipStream(fs, CompressionMode.Decompress))
    {
        using (XmlTextReader reader = new XmlTextReader(zipStream))
        {
            int i = 0;

            // Pull one node at a time; the decompressed XML is never held in memory as a whole.
            while (reader.Read())
            {
                // Handle <row> nodes, skipping their closing tags.
                if (reader.Name == "row" && reader.NodeType != XmlNodeType.EndElement)
                {
                    // [...]
                }
            }
        }
    }
}

Jul 11, 2011 at 11:48 AM

The whole file is compressed XML that exceeds 7 terabytes in my case. I do not want to decompress that.
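One workaround that avoids both SevenZipSharp and a temporary file is to let the 7-Zip command-line tool do the extraction and read its standard output as a stream, the same way the GZipStream loop above reads its input. A sketch, assuming a 7-Zip executable (called "7z" here) is installed and that its -so switch writes the extracted data to stdout; the class and method names are hypothetical, and archive paths containing double quotes are not handled:

```csharp
using System;
using System.Diagnostics;
using System.IO;
using System.Xml;

public static class SevenZipPipe
{
    // Hypothetical helper: runs "7z e -so <archive>", i.e. extracts to
    // standard output instead of to disk.
    public static ProcessStartInfo BuildStartInfo(string sevenZipExe, string archivePath)
    {
        return new ProcessStartInfo
        {
            FileName = sevenZipExe,
            Arguments = "e -so \"" + archivePath + "\"",
            UseShellExecute = false,       // required for output redirection
            RedirectStandardOutput = true,
            CreateNoWindow = true
        };
    }

    // Counts <row> start elements while reading the XML forward-only,
    // so only the current node is held in memory.
    public static int CountRows(Stream xml)
    {
        int count = 0;
        using (XmlReader reader = XmlReader.Create(xml))
        {
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == "row")
                    count++;
            }
        }
        return count;
    }

    // Feeds 7-Zip's stdout straight into the XML reader; nothing is written to disk.
    public static int CountRowsInArchive(string sevenZipExe, string archivePath)
    {
        using (Process p = Process.Start(BuildStartInfo(sevenZipExe, archivePath)))
        {
            int rows = CountRows(p.StandardOutput.BaseStream);
            p.WaitForExit();
            return rows;
        }
    }
}
```

Usage would be something like SevenZipPipe.CountRowsInArchive("7z", @"C:\dumps\pages.xml.7z"), with the row counting replaced by real processing. The trade-off is an external process dependency in exchange for true streaming.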

Nov 17, 2014 at 4:17 AM
https://gist.github.com/nbevans/db695ad5e1ad02126b2e

This is a rough cut of how I fixed this. Basically, whoever wrote the original LzmaDecodeStream was a silly sod. I have verified that this works. The suggestion that LZMA isn't capable of streaming decodes is ludicrous and completely false.
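As a hedged illustration of the general idea (not the gist's actual code): a decoder that pushes its output into a stream, such as Decoder.Code(inStream, outStream, inSize, outSize, progress) in the LZMA SDK for C#, can be bridged to a pull-style consumer like XmlReader by running it on a background task that writes into a pipe. This sketch shows only that bridge; the decodeInto delegate is a placeholder for the real decoder call, and parsing the 7z container itself is out of scope:

```csharp
using System;
using System.IO;
using System.IO.Pipes;
using System.Threading.Tasks;

public static class PushToPullBridge
{
    // Runs a push-style decoder on a background task and returns a readable
    // stream yielding whatever the decoder writes. 'decodeInto' is a placeholder
    // for a real decoder that writes decompressed bytes into the given stream.
    // Exceptions thrown by the decoder surface when 'producer' is awaited or waited on.
    public static Stream Open(Action<Stream> decodeInto, out Task producer)
    {
        var writeEnd = new AnonymousPipeServerStream(PipeDirection.Out);
        var readEnd = new AnonymousPipeClientStream(PipeDirection.In, writeEnd.ClientSafePipeHandle);

        producer = Task.Run(() =>
        {
            // Disposing the write end signals end-of-stream to the reader.
            using (writeEnd)
            {
                decodeInto(writeEnd);
            }
        });

        return readEnd;
    }
}
```

The returned stream can be passed to XmlReader.Create just like the GZipStream earlier in the thread; the pipe's buffer makes the decoder wait whenever the reader falls behind, so memory use stays bounded.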