Serialization and Persistence of Python Objects - Part 1

Python has some built-in modules that let us serialize and persist Python objects to a file, and later de-serialize them from that file and use them again. These modules are handy when writing Python scripts: their interfaces are small and take little effort to learn. In this article we will explore them and see how to use them.

This is a two-part series. In the first part (this article) we discuss Python’s pickle and shelve modules; in the second part we discuss the marshal and dbm modules.

Pickle module

Python’s pickle module implements binary protocols for serializing and de-serializing a Python object structure. In the pickling process, a Python object hierarchy is converted into a byte stream; in the unpickling process, the object hierarchy is reconstructed from a byte stream, usually read from a binary file or a bytes-like object.

The data format used by the pickle module is specific to Python, so programs written in other languages generally cannot reconstruct the objects. The pickletools module can be used to analyze the data stream generated by pickle. There are 6 protocols (numbered 0 to 5) that can be used for pickling. The higher the protocol, the more recent the version of Python needed to read the pickle produced.
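
As a rough illustration of the protocols and of pickletools, the sketch below pickles the same small dictionary with every available protocol and then disassembles one of the streams. It uses pickle.dumps, which is introduced just below, and the sample dictionary is made up for the example.

```python
import io
import pickle
import pickletools

data = {"a": [1, 2.0], "b": "text"}  # sample value, chosen only for this sketch

# Protocols are numbered from 0 up to pickle.HIGHEST_PROTOCOL. Streams written
# with protocol 2 or later start with a PROTO opcode (byte 0x80) followed by
# the protocol number, which is how a reader knows what it is dealing with.
for protocol in range(pickle.HIGHEST_PROTOCOL + 1):
    payload = pickle.dumps(data, protocol=protocol)
    print(protocol, len(payload), payload[:2])

# pickletools.dis lists the opcodes in a stream without rebuilding any objects.
listing = io.StringIO()
pickletools.dis(pickle.dumps(data, protocol=2), out=listing)
print(listing.getvalue())
```

Because pickletools.dis only reads opcodes and never constructs the objects, it is a reasonable way to look inside a pickle before deciding whether to load it.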

To serialize an object hierarchy we call the dumps function, which returns the byte stream. Similarly, to de-serialize the byte stream we call the loads function. The functions dump and load do the same job, but write the data stream to a file and read it back from a file.
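
Before moving on to files, here is a minimal in-memory round trip with dumps and loads. The dictionary is an arbitrary sample built from built-in types only.

```python
import pickle

original = {"numbers": [1, 2.0, 3 + 4j], "flags": {True, None}}

blob = pickle.dumps(original)   # serialize to a bytes object
restored = pickle.loads(blob)   # rebuild an equal, but separate, object

print(type(blob))
print(restored == original)
print(restored is original)
```

The restored object compares equal to the original but is a new object, so changing one does not affect the other.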

Here is an example that writes to a file:

import pickle

class Employee:
    def __init__(self, name, department):
        self.name = name
        self.department = department
    
    def get_department(self):
        return f"Department is {self.department}"
    
    def __str__(self):
        return f"Name: {self.name}, Department: {self.department}"

data = {
    'a': [1, 2.0, 3+4j],
    'b': ("string value", b"byte string"),
    'c': {True, False, None},
    'emp1': Employee("Kawsar Ahmed", "Engineering")
}

with open('data.pickle', 'wb') as file:
    pickle.dump(data, file, pickle.HIGHEST_PROTOCOL)

To read from the file:

import pickle

class Employee:
    def __init__(self, name, department):
        self.name = name
        self.department = department
    
    def get_department(self):
        return f"Department is {self.department}"
    
    def __str__(self):
        return f"Name: {self.name}, Department: {self.department}"

with open('data.pickle', 'rb') as f:
    data = pickle.load(f)

print(data["a"])
print(data["b"])
print(data["c"])

emp1 = data["emp1"]
print(emp1)
print(emp1.get_department())

# [1, 2.0, (3+4j)]
# ('string value', b'byte string')
# {False, True, None}
# Name: Kawsar Ahmed, Department: Engineering
# Department is Engineering

According to the Python documentation, the following types can be pickled:

  • built-in constants (None, True, False, Ellipsis, and NotImplemented);
  • integers, floating-point numbers, complex numbers;
  • strings, bytes, bytearrays;
  • tuples, lists, sets, and dictionaries containing only picklable objects;
  • functions (built-in and user-defined) accessible from the top level of a module (using def, not lambda);
  • classes accessible from the top level of a module;
  • instances of such classes for which the result of calling __getstate__() is picklable.

Attempting to pickle an unpicklable object raises the PicklingError exception.
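
The rules above can be checked with a small helper. This is only a sketch: try_pickle is a hypothetical name, and it also catches TypeError and AttributeError because, depending on the object and the Python version, some unpicklable objects raise one of those instead of PicklingError.

```python
import pickle

def try_pickle(obj):
    """Return True if obj can be pickled, False otherwise (hypothetical helper)."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, TypeError, AttributeError):
        return False

print(try_pickle([1, "two", (3.0, None)]))   # containers of picklable objects
print(try_pickle(len))                       # built-in function
print(try_pickle(lambda x: x))               # lambdas are not picklable
print(try_pickle(n for n in range(3)))       # neither are generators
```

The first two calls succeed and the last two fail, which matches the list: functions must be reachable by name from the top level of a module, and a lambda has no such name.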

A word of caution: do not unpickle data from the internet or from any other source you do not trust. There is no way to know what pickled data will do until it is loaded, and loading it can execute arbitrary code. If the pickled data runs a shell command, for example, it can gain access to your computer.
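
To make the risk concrete, the sketch below builds a harmless pickle whose loading calls eval. The class name Payload is invented for the example, and the expression is a plain multiplication, but it could just as well be any other callable. The second half follows the "restricting globals" approach from the Python documentation, with the hypothetical names NoGlobalsUnpickler and restricted_loads; it refuses every global lookup, so it only accepts plain built-in data.

```python
import io
import pickle

class Payload:
    # __reduce__ tells pickle how to rebuild the object: call eval("6 * 7").
    # A hostile pickle could name any importable callable here instead.
    def __reduce__(self):
        return (eval, ("6 * 7",))

blob = pickle.dumps(Payload())
result = pickle.loads(blob)   # eval runs during loading
print(result)                 # 42

class NoGlobalsUnpickler(pickle.Unpickler):
    """Unpickler that refuses to look up any function or class."""
    def find_class(self, module, name):
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")

def restricted_loads(data):
    """Load a pickle that may contain only plain built-in data."""
    return NoGlobalsUnpickler(io.BytesIO(data)).load()

print(restricted_loads(pickle.dumps([1, 2, 3])))
try:
    restricted_loads(blob)
except pickle.UnpicklingError as exc:
    print("rejected:", exc)
```

A restricted unpickler narrows what a pickle can do, but it is not a substitute for only loading data from sources you trust.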


Shelve module

The shelve module provides persistent data storage backed by a file. A shelf is a dictionary-like object, which means we can do with a shelf most of what we normally do with a dictionary. The only exceptions are copying, constructors and the merge operators (| and |=); other than that, all the dictionary methods and operations work with a shelf object. A shelf is very similar to a dbm database; the difference is that the values in a shelf can be anything the pickle module can handle. The keys are ordinary strings.
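
The dictionary-like behaviour can be sketched as follows. The file name inventory and the stored values are made up, and the shelf lives in a temporary directory so that the example leaves nothing behind.

```python
import os
import shelve
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "inventory")   # hypothetical file name

    with shelve.open(path) as db:
        db["apples"] = 3                        # item assignment
        db["prices"] = {"apples": 0.5}          # any picklable value

        has_apples = "apples" in db             # membership test
        keys = sorted(db.keys())                # keys are strings
        missing = db.get("pears", 0)            # get() with a default

        del db["apples"]                        # item deletion
        remaining = len(db)

print(has_apples, keys, missing, remaining)
```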

The function shelve.open(filename) returns a shelf that can be used as a context manager to open a file and store data as key-value pairs. We can also call open as a plain function, but in that case we have to call the close method ourselves to close the shelf. Under the hood, shelve uses the pickle module to serialize and de-serialize the objects. Let’s see an example where we store and read a string, an integer and a class instance using shelve.

Writing to the database file:

import shelve

class Human:
    def __init__(self, name, age):
        self.name = name
        self.age = age
    
    def get_name(self):
        return f"The name is {self.name}"
    
    def __str__(self):
        return f"Name: {self.name}, Age: {self.age}"

with shelve.open("database") as db:
    db["a"] = "A string"
    db["b"] = 1234
    db["naim"] = Human("Naim", 10)

Reading from the database file:

import shelve

class Human:
    def __init__(self, name, age):
        self.name = name
        self.age = age
    
    def get_name(self):
        return f"The name is {self.name}"
    
    def __str__(self):
        return f"Name: {self.name}, Age: {self.age}"

with shelve.open("database") as db:
    print(db["a"])
    print(db["b"])

    naim = db["naim"]
    print(naim)
    print(naim.get_name())

# A string
# 1234
# Name: Naim, Age: 10
# The name is Naim
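
For comparison, here is a sketch of the same kind of access using open as a plain function call, where closing the shelf is our responsibility. A try/finally block makes sure close runs even if an error occurs; the temporary directory is only there to keep the example self-contained.

```python
import os
import shelve
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "database")

    # Write: open, use, and always close.
    db = shelve.open(path)
    try:
        db["b"] = 1234
    finally:
        db.close()

    # Read the value back in a second session.
    db = shelve.open(path)
    try:
        value = db["b"]
    finally:
        db.close()

    # A closed shelf can no longer be used.
    try:
        db["b"]
        closed_error = False
    except ValueError:
        closed_error = True

print(value, closed_error)
```

The with form shown earlier does the same thing with less code, which is why it is usually the better choice.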

One thing to note: when loading a class instance that was persisted with either the shelve module or pickle, the program must have access to the class definition of the object (this is why both reading examples above define the class again); otherwise loading will raise an error.
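
The sketch below shows why: a pickle stores the name of the class and its module, not the class code. To stay self-contained it builds a throwaway module in memory, standing in for a separate file; the module name hypothetical_models is invented for the example.

```python
import pickle
import sys
import types

# Create a module object that holds the class, as if it lived in its own file.
module = types.ModuleType("hypothetical_models")
exec(
    "class Human:\n"
    "    def __init__(self, name):\n"
    "        self.name = name\n",
    module.__dict__,
)
sys.modules["hypothetical_models"] = module

blob = pickle.dumps(module.Human("Naim"))

# While the module can be imported, loading works.
restored = pickle.loads(blob)
print(restored.name)

# Once the definition is no longer available, the same bytes cannot be loaded.
del sys.modules["hypothetical_models"]
try:
    pickle.loads(blob)
    load_error = None
except (ImportError, AttributeError) as exc:
    load_error = exc

print(type(load_error).__name__)
```

In an ordinary script the situation is the same: the reading program has to define or import the class under the same module and name that were in effect when the object was written.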

Click here for the second part of the article series.
