Dump a pickle file to a readable text file

I got a data file from a client recently in “pickle” format. I happen to know that pickle is a binary format for serializing Python objects, but trying to open a pickle file could be a puzzle if you didn’t know this.

Be careful

There are a couple problems with using pickle files for data transfer. First of all, it’s a security risk because an attacker could create a malformed pickle file that would cause your system to run arbitrary code. In the Python Cookbook, the authors David Beazley and Brian K. Jones warn

It’s essential that pickle only be used internally with interpreters that have some ability to authenticate one another.

The second problem is that the format could change. Again quoting the Cookbook,

Because of its Python-specific nature and attachment to source code, you probably shouldn’t use pickle as a format for long-term storage. For example, if the source code changes, all of your stored data might break and become unreadable.

Suppose someone gives you a pickle file and you’re willing to take your chances and open it. It’s from a trusted source, and it was created recently enough that the format probably hasn’t changed. How do you open it?

Unpickling

The following code will open the file data.pickle and read it into an object obj.

    import pickle
    obj = pickle.load(open("data.pickle", "rb"))

If the object in the pickle file is very¬† small, you could simply print obj. But if the object is at all large, you probably want to save it to a file rather than dumping it at the command line, and you also want to “pretty” print it than simply printing it.

Pretty printing

The following code will dump a nicely-formatted version of our pickled object to a text file out.txt.

    import pickle
    import pprint

    obj = pickle.load(open("sample_data.pickle", "rb"))

    with open("out.txt", "a") as f:
         pprint.pprint(obj, stream=f)

In my case, the client’s file contained a dictionary of lists of dictionaries. It printed as one incomprehensible line, but it pretty printed as 40,000 readable lines.

Prettier printing

Simon Brunning left a comment suggesting that the json module output is even easier to read.

    import json

    with open("out.txt", "a") as f:
        json.dump(obj, f, indent=2)

And he’s right, at least in my case. The indentation json.dump produces is more what I’d expect, more like what you’d see if you were writing the structure in well-formatted source code.

Related posts

2 thoughts on “Dump a pickle file to a readable text file

Leave a Reply

Your email address will not be published.