Question

What is the internal implementation of `copy.deepcopy()` in Python and how to override `__deepcopy__()` correctly?

When reading Antony Hatchkins' answer to "How to override the copy/deepcopy operations for a Python object?", I was confused about why his implementation of __deepcopy__() does not first check memo to see whether the current object has already been copied. Antonín Hoskovec points this out in a comment. Jonathan H's comment also addresses the issue and mentions that copy.deepcopy() appears to skip the call to __deepcopy__() if an object has already been copied. However, he does not say where in the code of the copy module this happens.

To illustrate the issue with not checking memo, suppose object a references b and c, and both b and c reference object d. During a deepcopy of a, object d should be copied only once, during the copy of b or c, whichever comes first.

Essentially, I am asking why Antony Hatchkins' answer does not do the following:

from copy import deepcopy

class A:
    def __deepcopy__(self, memo):
        # Why not add the following two lines?
        if id(self) in memo:
            return memo[id(self)]

        cls = self.__class__
        result = cls.__new__(cls)
        memo[id(self)] = result
        for k, v in self.__dict__.items():
            setattr(result, k, deepcopy(v, memo))
        return result

Therefore, it would be great if someone can explain the internal implementation of deepcopy() in the copy module both to demonstrate the best practice for overriding __deepcopy__ and also just to let me know what is happening under the hood.

I took a brief look at the source code for copy.deepcopy() but was confused by things like copier, reductor, and _reconstruct(). I read answers like deepcopy override clarification and In Python, how can I call copy.deepcopy in my implementation of deepcopy()? but none of them gave a comprehensive answer and rationale.

1 Jan 1970

Solution


The reference (CPython) implementation of copy.deepcopy is in Lib/copy.py in the CPython source tree.

As you can see, the first thing that function does is check for the instance in the memo, so there is no need to check in your own implementation.
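A small experiment can make this visible. The sketch below uses a hypothetical Node class (not from the original answer) whose __deepcopy__ has no memo check and counts how often it runs on the a/b/c/d diamond from the question:

```python
from copy import deepcopy

class Node:
    """Hypothetical class used only for this illustration."""
    calls = 0  # class-level counter: how many times __deepcopy__ has run

    def __init__(self, name, *children):
        self.name = name
        self.children = list(children)

    def __deepcopy__(self, memo):
        # Deliberately no `if id(self) in memo` check here.
        Node.calls += 1
        cls = self.__class__
        result = cls.__new__(cls)
        memo[id(self)] = result
        for k, v in self.__dict__.items():
            setattr(result, k, deepcopy(v, memo))
        return result

# Diamond: a -> b -> d and a -> c -> d
d = Node("d")
b = Node("b", d)
c = Node("c", d)
a = Node("a", b, c)

a2 = deepcopy(a)
d_via_b = a2.children[0].children[0]
d_via_c = a2.children[1].children[0]

assert d_via_b is d_via_c   # the copy of d is shared, as in the original
assert d_via_b is not d     # and it really is a copy
assert Node.calls == 4      # one call each for a, b, c, d
```

The second time d is reached (through c), deepcopy finds id(d) in memo and returns the stored copy without calling d.__deepcopy__ again.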


Here is a breakdown of how that function works:

deepcopy(x, memo=None)

  1. checks whether x is already in the memo (keyed by id(x)). If it is, returns the copy stored there.

  2. tries to work out the copying method, by, in that order

    1. looking for it in the _deepcopy_dispatch dictionary
    2. checking if x has a __deepcopy__ method, and using that
    3. checking if it can be reduced (see here), i.e. if it can be pickled via __reduce_ex__() or __reduce__(). If so, it calls that method, deep-copies the pieces of the reduced value, and rebuilds a new object from them (this is what _reconstruct() does).
  3. runs the found method to create a copy

  4. registers that copy in the memo.

(I am glossing over some details; read the code if you are interested in them.)
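The steps above can be paraphrased in code. This is only a rough sketch of the control flow, not the real source: simplified_deepcopy, _ATOMIC and _MISSING are names made up for this illustration, only a few built-in types are handled by hand, and the reduce-based fallback is delegated to the real copy.deepcopy instead of being reimplemented.

```python
import copy

_ATOMIC = (type(None), bool, int, float, str, bytes)  # illustrative subset
_MISSING = object()

def simplified_deepcopy(x, memo=None):
    """Rough paraphrase of deepcopy's control flow (an illustration only)."""
    if memo is None:
        memo = {}

    # Step 1: if x was already copied, return that copy.
    y = memo.get(id(x), _MISSING)
    if y is not _MISSING:
        return y

    cls = type(x)
    # Steps 2 and 3: choose a copier and run it.
    if cls in _ATOMIC:
        y = x                        # immutable basic types are returned as-is
    elif cls is list:
        y = []
        memo[id(x)] = y              # containers register early, before recursing
        for item in x:
            y.append(simplified_deepcopy(item, memo))
    elif cls is dict:
        y = {}
        memo[id(x)] = y
        for key, value in x.items():
            y[simplified_deepcopy(key, memo)] = simplified_deepcopy(value, memo)
    elif hasattr(x, "__deepcopy__"):
        y = x.__deepcopy__(memo)     # user-defined hook
    else:
        y = copy.deepcopy(x, memo)   # stand-in for the __reduce_ex__ path

    # Step 4: register the copy and keep x alive so its id() cannot be reused.
    if y is not x:
        memo[id(x)] = y
        memo.setdefault(id(memo), []).append(x)
    return y

shared = [1, 2]
data = {"a": shared, "b": shared}
out = simplified_deepcopy(data)
assert out == data
assert out["a"] is out["b"]          # sharing is preserved through the memo
assert out["a"] is not shared
```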

So to answer your questions (and others you may have):

    • Q: What happens when you override __deepcopy__?
    • A: It is called at step 3, instead of the default (unless there is an entry for the type in the _deepcopy_dispatch dictionary, but that dictionary should only contain methods for basic types).
    • Q: When does the recursion happen?
    • A: It happens when your __deepcopy__ method is called. That method should recursively call deepcopy with the same memo dictionary.
    • Q: Why does Antony Hatchkins' implementation register the instance in memo if the deepcopy function also does it (step 4)?
    • A: Because deepcopy registers the object in memo only at the very end, after __deepcopy__ has returned, whereas to avoid infinite recursion on cyclic references you need to register it before making the recursive calls.
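To see why the early registration matters, here is a minimal sketch with two hypothetical classes (Early and Late are names invented for this example) whose instances reference themselves. The only difference is whether memo is filled in before the recursive deepcopy calls.

```python
from copy import deepcopy

class Early:
    """Registers its copy in memo BEFORE recursing (the recommended pattern)."""
    def __init__(self):
        self.me = self  # self-reference, i.e. a cycle

    def __deepcopy__(self, memo):
        cls = self.__class__
        result = cls.__new__(cls)
        memo[id(self)] = result
        for k, v in self.__dict__.items():
            setattr(result, k, deepcopy(v, memo))
        return result

class Late:
    """Same, but relies on deepcopy's own registration, which comes too late."""
    def __init__(self):
        self.me = self

    def __deepcopy__(self, memo):
        cls = self.__class__
        result = cls.__new__(cls)
        for k, v in self.__dict__.items():
            # self is not in memo yet, so this re-enters __deepcopy__ forever
            setattr(result, k, deepcopy(v, memo))
        return result

e2 = deepcopy(Early())
assert e2.me is e2  # the cycle is reproduced in the copy

try:
    deepcopy(Late())
except RecursionError:
    late_failed = True
else:
    late_failed = False
assert late_failed
```

For objects without cycles the Late variant would still work, since deepcopy registers the result once __deepcopy__ returns; the early registration is what makes cyclic graphs safe.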

Note: For a simpler way to allow your custom classes to be copied, you can also implement the __getstate__ and __setstate__ methods, and rely on the fact that deepcopy falls back on the pickling methods.
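A minimal sketch of that approach, assuming a hypothetical Connection class with a _cache attribute that should not be carried over to copies:

```python
from copy import deepcopy

class Connection:
    """Hypothetical class: copyable settings plus a cache we do not want copied."""
    def __init__(self, settings):
        self.settings = settings
        self._cache = {}

    def __getstate__(self):
        # Return the state to copy (or pickle), leaving the cache out.
        state = self.__dict__.copy()
        del state["_cache"]
        return state

    def __setstate__(self, state):
        # Receives the already deep-copied state; rebuild the cache empty.
        self.__dict__.update(state)
        self._cache = {}

orig = Connection({"retries": [1, 2, 3]})
orig._cache["key"] = "expensive value"

dup = deepcopy(orig)
assert dup.settings == orig.settings
assert dup.settings["retries"] is not orig.settings["retries"]  # deep, not shallow
assert dup._cache == {}
```

No memo handling is needed here: deepcopy deep-copies the state returned by __getstate__ with its own memo before passing it to __setstate__. The same two methods are also used by pickle.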

2024-07-07
tbrugere