Shallow vs Deep Copy

There are two types of copy operations in Python: shallow copy and deep copy. Generally speaking,

  • Shallow copy creates a new object but fills it with references to the same elements as the original. Rebinding an element of the copy does not affect the original, but in-place changes to a shared mutable element (such as a nested list) show up in both. It is faster.
  • Deep copy recursively copies the object and everything it contains, so the copy shares no mutable elements with the original. Changes made to the copy, including nested ones, are not reflected in the original. It is comparatively slower.
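A minimal sketch of the difference, using the standard copy module and the `is` operator to check object identity (the variable names are illustrative only): both copies are new outer lists, but only the deep copy has its own inner list.

```python
import copy

original = [1, [2, 3]]
shallow = copy.copy(original)
deep = copy.deepcopy(original)

# Both copies are new outer list objects.
print(shallow is original, deep is original)   # False False
# The shallow copy shares the inner list with the original...
print(shallow[1] is original[1])               # True
# ...while the deep copy has its own copy of it.
print(deep[1] is original[1])                  # False
```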

Shallow copy and deep copy are provided by the copy module. For example,

import copy
x = [1, [2, 3]]     # any object can be copied; a nested list is used here
copy.copy(x)        # return a shallow copy of x
copy.deepcopy(x)    # return a deep copy of x
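The copy module functions are not limited to lists. As a sketch, here they are applied to an instance of a user-defined class; `Settings` is a hypothetical class made up for this example. The shallow copy is a new instance whose attributes refer to the same objects, so appending to its list also changes the original.

```python
import copy

class Settings:  # hypothetical example class
    def __init__(self, name, tags):
        self.name = name
        self.tags = tags

s = Settings("base", ["fast"])
s_shallow = copy.copy(s)      # new instance, attributes shared with s
s_deep = copy.deepcopy(s)     # new instance, attributes copied recursively

s_shallow.tags.append("debug")  # mutates the list shared with s
print(s.tags)       # ['fast', 'debug']
print(s_deep.tags)  # ['fast']
```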

Built-In Copy and Assignment

Unless a deep copy is necessary, the most commonly used copy operation is the shallow copy. However, importing and calling the copy module every time is a bit cumbersome. Built-in types such as list and dict provide a copy() method that performs a shallow copy. Before using it, we need to know what mutable and immutable objects are.

  • Mutable objects are those that can be changed after they are created, e.g., lists, dictionaries, and instances of most user-defined classes.
  • Immutable objects are those that cannot be changed after they are created, e.g., strings, integers, floats, and tuples.
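A small sketch of the distinction: a list can be changed in place, so every name referring to it sees the change, while "changing" a string actually builds a new string, and a tuple refuses item assignment altogether.

```python
nums = [1, 2, 3]
alias = nums
nums.append(4)                 # lists are mutable: changed in place
print(alias is nums, alias)    # True [1, 2, 3, 4]

text = "abc"
before = text
text += "d"                    # strings are immutable: += produces a new string
print(before is text, before, text)   # False abc abcd

point = (1, 2)
try:
    point[0] = 5               # tuples do not support item assignment
except TypeError as err:
    print("tuples are immutable:", err)
```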

To better understand copy operations, consider the following example. Dictionaries behave the same way.

import copy
a = [1, [2,3], 'a']
b = a.copy()
c = copy.deepcopy(a)
b[0] = 0
b[1][0] = 0
print(a)    # a = [1, [0,3], 'a']
print(b)    # b = [0, [0,3], 'a']
print(c)    # c = [1, [2,3], 'a']

The built-in copy() method makes a shallow copy of the list a: a new list whose elements are references to the same objects as in the original list. Note that the elements of a themselves were not copied. The assignment b[0] = 0 replaces the first element of b with a different object, so a[0] is unaffected; and since integers are immutable, the shared integer 1 could not have been changed in place anyway. However, b[1] is a list, which is mutable, and b[1][0] = 0 modifies that shared list in place. Therefore, the change is reflected in a as well. The deep copy c does not have this issue, because copy.deepcopy also copies the nested list, so c[1] and a[1] are different objects. An excellent explanation can be found in this discussion.
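Besides list.copy(), the expressions list(a) and a[:] also produce shallow copies of a list, and dict.copy() (or dict(d)) does the same for dictionaries. A short sketch with illustrative names, showing that the dictionary case follows the same rules as the list example:

```python
a = [1, [2, 3], 'a']
# Each expression builds a new outer list whose elements are the same objects as in a.
for cp in (a.copy(), list(a), a[:]):
    assert cp is not a and cp[1] is a[1]

d = {'x': 1, 'y': [2, 3]}
d2 = d.copy()      # shallow copy, equivalent to dict(d)
d2['x'] = 0        # rebinding a key in d2 leaves d alone
d2['y'][0] = 0     # mutating the shared inner list shows up in d
print(d)    # {'x': 1, 'y': [0, 3]}
print(d2)   # {'x': 0, 'y': [0, 3]}
```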

The assignment operator = also seems to perform a “copy”. In fact, it only binds another name to the same object, which is not a copy. For example,

a = 1
b = a
b = 2
print(a, b)     # a = 1, b = 2
a = [1, 2, 3]
b = a
b[0] = 0
print(a, b)     # a = [0,2,3], b = [0, 2, 3]

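In the first example, b = 2 simply rebinds the name b to a different integer object, so a still refers to 1; since integers are immutable, there is no way to change the shared value 1 in place. In the second example, a is a mutable list, and b[0] = 0 modifies the very list that both a and b refer to, so the change is visible through a.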
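A sketch contrasting assignment with a real (shallow) copy: after b = a, both names are the same object, so mutation through either is visible through the other, whereas rebinding b only changes what b points to.

```python
a = [1, 2, 3]
b = a                    # assignment: b is another name for the same list
c = a.copy()             # shallow copy: a new list object
print(b is a, c is a)    # True False

b[0] = 0                 # mutation through b is visible through a, not c
print(a, c)              # [0, 2, 3] [1, 2, 3]

b = [9]                  # rebinding b just points b at a new list
print(a, b)              # [0, 2, 3] [9]
```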

Refer to the post Python: Assignment vs Shallow Copy vs Deep Copy for more detailed information.

Numpy Copy

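For NumPy arrays, np.copy() (like the array's own copy() method) performs a shallow copy. For numeric arrays, the rule is similar to copying a list of integers: the array data are copied, so changing the copy does not affect the original.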

import numpy as np
a = np.array([1, 2, 3])
b = np.copy(a)
b[0] = 0
print(a, b)     # a = [1,2,3], b = [0,2,3]
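To check whether two arrays share data, np.shares_memory can be used. The sketch below compares assignment, np.copy, and slicing; note that, unlike a list, slicing a NumPy array with a[:] returns a view onto the same data rather than a copy.

```python
import numpy as np

a = np.array([1, 2, 3])
alias = a            # assignment: the same array object
copied = np.copy(a)  # a new array with its own data buffer
view = a[:]          # slicing returns a view, not a copy

print(np.shares_memory(a, alias))   # True
print(np.shares_memory(a, copied))  # False
print(np.shares_memory(a, view))    # True

view[0] = 0          # writing through the view changes a
print(a)             # [0 2 3]
```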

Arrays with dtype=object behave like nested lists: a copy holds references to the same Python objects, so any in-place modification of a nested mutable object is reflected in the original array.

import numpy as np
a = np.array([1, [2,3], 'a'], dtype=object)
b = a.copy()
b[0] = 0
b[1][0] = 0
print(a, b)     # a = [1, [0,3], 'a'], b = [0, [0,3], 'a']
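If the nested objects must be independent, copy.deepcopy can be applied to the array itself. A sketch: for object arrays, NumPy's deep copy also deep-copies each element, so the nested list is no longer shared.

```python
import copy
import numpy as np

a = np.array([1, [2, 3], 'a'], dtype=object)
b = copy.deepcopy(a)   # copies the array and each Python object it holds
b[1][0] = 0
print(a[1], b[1])      # [2, 3] [0, 3]
```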