I got a data file from a client recently in “pickle” format. I happen to know that pickle is a binary format for serializing Python objects, but trying to open a pickle file could be a puzzle if you didn’t know this.
Be careful
There are a couple of problems with using pickle files for data transfer. First, it's a security risk: an attacker could craft a malicious pickle file that makes your system run arbitrary code when you load it. In the Python Cookbook, authors David Beazley and Brian K. Jones warn:
It’s essential that pickle only be used internally with interpreters that have some ability to authenticate one another.
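To see why, here is a minimal, deliberately harmless sketch. A class can define `__reduce__` to tell pickle which callable to invoke when the object is rebuilt. The class name `Payload` is made up for illustration, and a real attack would call something far less benign than `eval("6 * 7")`.

```python
import pickle

class Payload:
    # __reduce__ tells pickle how to rebuild the object:
    # "call eval with the argument '6 * 7'".
    def __reduce__(self):
        return (eval, ("6 * 7",))

blob = pickle.dumps(Payload())  # the bytes name builtins.eval, not Payload
obj = pickle.loads(blob)        # eval runs here, during loading
print(obj)  # 42
```

Nothing in the loaded file has to come from your own code. Whoever writes the file chooses the callable and its arguments, which is why loading an untrusted pickle amounts to running untrusted code.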
The second problem is that the format could change. Again quoting the Cookbook,
Because of its Python-specific nature and attachment to source code, you probably shouldn’t use pickle as a format for long-term storage. For example, if the source code changes, all of your stored data might break and become unreadable.
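The dependence on source code is easy to demonstrate. Pickle stores an instance of a class by recording the class's module and name, not its code, so that class must still exist under the same name when the file is loaded. In this sketch the module `shapes` and the class `Point` are hypothetical, and deleting the attribute stands in for a later refactor that renames or removes the class.

```python
import pickle
import sys
import types

# Stand-in for a module in your code base ("shapes" is hypothetical).
shapes = types.ModuleType("shapes")
sys.modules["shapes"] = shapes

class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

# Make Point look as if it were defined in shapes.py.
Point.__module__ = "shapes"
shapes.Point = Point

blob = pickle.dumps(Point(1, 2))  # records "shapes" and "Point" by name

# Simulate the source code changing: the class is no longer there.
del shapes.Point

try:
    pickle.loads(blob)
    error = None
except AttributeError as exc:
    error = exc
print(type(error).__name__, error)
```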
Suppose someone gives you a pickle file and you’re willing to take your chances and open it. It’s from a trusted source, and it was created recently enough that the format probably hasn’t changed. How do you open it?
Unpickling
The following code will open the file data.pickle and read it into an object obj.
```python
import pickle
with open("data.pickle", "rb") as f:  # "rb": pickle files are binary
    obj = pickle.load(f)
```
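If you want to try this without a client's file, a quick round trip works: write a small object with `pickle.dump`, then read it back with `pickle.load`. The sample data below is made up, and the temporary directory is only there so the sketch doesn't leave files behind.

```python
import os
import pickle
import tempfile

# Made-up sample data: a dictionary of lists of dictionaries.
sample = {
    "orders": [{"id": 1, "item": "widget"}, {"id": 2, "item": "gadget"}],
    "returns": [{"id": 7, "item": "widget"}],
}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.pickle")
    with open(path, "wb") as f:  # "wb": write bytes
        pickle.dump(sample, f)
    with open(path, "rb") as f:  # "rb": read bytes
        obj = pickle.load(f)

print(obj == sample)  # True
```

The loaded object is a new copy that compares equal to the original, not the same object in memory.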
If the object in the pickle file is very small, you could simply print obj. But if the object is at all large, you probably want to save it to a file rather than dump it at the command line, and you also want to “pretty” print it rather than simply print it.
Pretty printing
The following code will dump a nicely formatted version of our pickled object to a text file, out.txt.
```python
import pickle
import pprint

with open("data.pickle", "rb") as f:
    obj = pickle.load(f)

# "w" replaces any earlier out.txt instead of appending to it
with open("out.txt", "w") as f:
    pprint.pprint(obj, stream=f)
```
In my case, the client’s file contained a dictionary of lists of dictionaries. It printed as one incomprehensible line, but it pretty printed as 40,000 readable lines.
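The difference shows up even on a small nested object. In this sketch the data is made up. `pprint.pformat` returns the same text `pprint.pprint` would print, and its `width` argument (default 80) is lowered here so a short example still gets wrapped.

```python
import pprint

# Made-up nested data: a dictionary of lists of dictionaries.
data = {
    "orders": [
        {"id": 1, "item": "widget", "qty": 3},
        {"id": 2, "item": "gadget", "qty": 5},
    ]
}

plain = str(data)                        # what print() shows: one long line
pretty = pprint.pformat(data, width=40)  # wrapped across several lines

print(plain)
print(pretty)
```

Note that pprint sorts dictionary keys by default; since Python 3.8 you can pass `sort_dicts=False` to keep the original insertion order.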
Prettier printing
Simon Brunning left a comment suggesting that the json module's output is even easier to read.
```python
import json

# indent=2: one key or list item per line, two spaces per nesting level
with open("out.txt", "w") as f:
    json.dump(obj, f, indent=2)
```
And he's right, at least in my case. The indentation json.dump produces is closer to what I'd expect, more like what you'd see if you were writing the structure as well-formatted source code.
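One caveat, shown here as a sketch with made-up data: the json module only knows how to encode dicts, lists, tuples, strings, numbers, booleans, and `None`, while a pickle can hold any Python object. Passing `default=str` tells the encoder to fall back to `str()` for anything else. That is usually fine for a human-readable dump, though it loses type information. `json.dump` accepts the same `indent` and `default` arguments as the `json.dumps` call used here.

```python
import datetime
import json

# Made-up data containing types JSON has no encoding for.
obj = {"created": datetime.date(2024, 1, 31), "tags": {"urgent"}}

try:
    json.dumps(obj, indent=2)
    failed = False
except TypeError as exc:
    failed = True
    print("json.dumps failed:", exc)

# default=str is called for each object json can't encode by itself.
text = json.dumps(obj, indent=2, default=str)
print(text)
```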