Question
What is the internal implementation of `copy.deepcopy()` in Python and how to override `__deepcopy__()` correctly?
When reading Antony Hatchkins' answer to "How to override the copy/deepcopy operations for a Python object?", I am confused about why his implementation of `__deepcopy__()` does not first check `memo` to see whether the current object has already been copied before copying it. This is also pointed out in the comment by Antonín Hoskovec. Jonathan H's comment also addressed this issue and mentioned that `copy.deepcopy()` appears to abort the call to `__deepcopy__()` if an object has already been copied. However, he does not point out clearly where this is done in the code of the `copy` module.
To illustrate the issue with not checking `memo`, suppose object `a` references `b` and `c`, and both `b` and `c` reference object `d`. During a deepcopy of `a`, object `d` should be copied only once, during the copy of `b` or `c`, whichever comes first.
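The diamond-shaped graph described above can be reproduced in a few lines. This is a minimal sketch: the `Node` class and its attribute names are hypothetical and exist only for illustration, and it relies on the default `copy.deepcopy()` behaviour with no custom `__deepcopy__()`.

```python
from copy import deepcopy


class Node:
    """Hypothetical container used only for this illustration."""

    def __init__(self, name, **refs):
        self.name = name
        self.__dict__.update(refs)


# Build the diamond: a -> b -> d and a -> c -> d
d = Node("d")
b = Node("b", child=d)
c = Node("c", child=d)
a = Node("a", left=b, right=c)

a2 = deepcopy(a)

# The copy of d is a new object, but both copied parents share it.
shared_preserved = a2.left.child is a2.right.child
is_new_object = a2.left.child is not d
print(shared_preserved, is_new_object)
```

If `d` were copied twice, `a2.left.child` and `a2.right.child` would be two distinct objects and the structure of the original graph would be lost.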
Essentially, I am asking why Antony Hatchkins' answer does not do the following:
from copy import deepcopy

class A:
    def __deepcopy__(self, memo):
        # Why not add the following two lines?
        if id(self) in memo:
            return memo[id(self)]
        cls = self.__class__
        result = cls.__new__(cls)
        memo[id(self)] = result
        for k, v in self.__dict__.items():
            setattr(result, k, deepcopy(v, memo))
        return result
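One way to test Jonathan H's observation empirically is to count how often `__deepcopy__()` runs when the explicit `memo` check is left out. The sketch below uses a hypothetical `Counted` class with a class-level `calls` counter, neither of which comes from the linked answer. It assumes that `copy.deepcopy()` itself consults `memo` before dispatching to `__deepcopy__()`.

```python
from copy import deepcopy


class Counted:
    calls = 0  # hypothetical counter: number of __deepcopy__ invocations

    def __init__(self, **attrs):
        self.__dict__.update(attrs)

    def __deepcopy__(self, memo):
        # Deliberately no "if id(self) in memo" check here.
        Counted.calls += 1
        cls = self.__class__
        result = cls.__new__(cls)
        memo[id(self)] = result  # register before recursing into attributes
        for k, v in self.__dict__.items():
            setattr(result, k, deepcopy(v, memo))
        return result


# Same diamond as in the description: a -> b -> d and a -> c -> d
d = Counted()
b = Counted(child=d)
c = Counted(child=d)
a = Counted(left=b, right=c)

a2 = deepcopy(a)

print(Counted.calls)                      # 4: once each for a, b, c, d
print(a2.left.child is a2.right.child)    # True: d was copied only once
```

Four calls for four objects suggests that the second `deepcopy(d, memo)` returned the memoized copy without ever reaching `d.__deepcopy__()`, so the two extra lines would be redundant when `__deepcopy__()` is only invoked through `copy.deepcopy()`.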
Therefore, it would be great if someone could explain the internal implementation of `deepcopy()` in the `copy` module, both to demonstrate the best practice for overriding `__deepcopy__` and to let me know what is happening under the hood.
I took a brief look at the source code for `copy.deepcopy()` but was confused by things like `copier`, `reductor`, and `_reconstruct()`. I read answers like "deepcopy override clarification" and "In Python, how can I call copy.deepcopy in my implementation of deepcopy()?", but none of them gave a comprehensive answer and rationale.
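For orientation, here is a heavily simplified paraphrase of the control flow as I understand it. It is not the actual CPython source: `sketch_deepcopy` and `_ATOMIC` are hypothetical names, and tuples, sets, `copyreg` dispatch, `__slots__` state, and the keep-alive bookkeeping of the real module are all omitted. It only shows where the memo lookup, the `copier` (a `__deepcopy__` method), and the `reductor` (`__reduce_ex__`) fit relative to each other.

```python
# Types treated as immutable here; the real module covers more cases.
_ATOMIC = {type(None), bool, int, float, complex, str, bytes}


def sketch_deepcopy(x, memo=None):
    """Simplified illustration of deepcopy's dispatch order (an assumption-laden sketch)."""
    if memo is None:
        memo = {}

    # Step 1: the memo lookup happens before any copier is chosen.
    if id(x) in memo:
        return memo[id(x)]

    cls = type(x)
    if cls in _ATOMIC:
        return x  # immutable values are returned as-is

    # Step 2: built-in containers get dedicated copiers.
    if cls is list:
        y = []
        memo[id(x)] = y  # register before recursing so cycles terminate
        y.extend(sketch_deepcopy(item, memo) for item in x)
        return y
    if cls is dict:
        y = {}
        memo[id(x)] = y
        for k, v in x.items():
            y[sketch_deepcopy(k, memo)] = sketch_deepcopy(v, memo)
        return y

    # Step 3: a user-defined __deepcopy__ acts as the copier.
    copier = getattr(x, "__deepcopy__", None)
    if copier is not None:
        y = copier(memo)
    else:
        # Step 4: fall back to the pickle protocol (the "reductor"),
        # then rebuild the object from the reduced value.
        func, args, *rest = x.__reduce_ex__(4)
        y = func(*args)
        memo[id(x)] = y
        state = rest[0] if rest else None
        if isinstance(state, dict):  # only plain __dict__ state is handled
            for k, v in state.items():
                setattr(y, k, sketch_deepcopy(v, memo))

    memo[id(x)] = y
    return y


class Plain:
    """Hypothetical class without a custom __deepcopy__."""

    def __init__(self, **attrs):
        self.__dict__.update(attrs)


d = Plain(value=[1, 2])
a = Plain(left=Plain(child=d), right=Plain(child=d))
a2 = sketch_deepcopy(a)
print(a2.left.child is a2.right.child)  # True
```

In this sketch the check in step 1 is what makes an extra `memo` test inside `__deepcopy__()` unnecessary, provided the method is only reached through the top-level function.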