Friday, June 6, 2008

Avoiding another shared reference trap in Python

Define Foo as follows.

>>> class Foo:
...     def __init__(self, stuff=[]):
...         self.stuff = stuff
...

Then create two instances of Foo and append a number to the stuff list of each one.

>>> f1 = Foo()
>>> f2 = Foo()
>>> f1.stuff.append(17)
>>> f2.stuff.append(18)

So, what do

>>> f1.stuff

and

>>> f2.stuff

return?

Before we get to the answer, stop and consider that f1 and f2 are totally separate instances of Foo. Attributes that we assign to f1 and f2 are, usually, completely independent.

>>> f1.flavor = 'cherry'
>>> f2.flavor = 'lime'
>>> f1.flavor
'cherry'
>>> f2.flavor
'lime'

And so we would hope that f1 will maintain one list of stuff while f2 will maintain another.

But, perhaps surprisingly, f1 and f2 share a single list of stuff.

>>> f1.stuff
[17, 18]
>>> f2.stuff
[17, 18]
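Equal contents alone don't prove that the list is shared; two separate lists could also both hold [17, 18]. A minimal Python 3 sketch of the same session settles it with an identity check, since `is` compares objects rather than values.

```python
class Foo:
    def __init__(self, stuff=[]):  # the default list is created once, at definition time
        self.stuff = stuff

f1 = Foo()
f2 = Foo()
f1.stuff.append(17)
f2.stuff.append(18)

# Both attributes name one list object, not two equal lists.
print(f1.stuff is f2.stuff)  # True
print(f1.stuff)              # [17, 18]
```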

What's going on here?

The answer has to do with when Python evaluates default arguments, and with the difference between class definition time and instantiation time.

Immediately following our definition of Foo ...

>>> class Foo:
...     def __init__(self, stuff=[]):
...         self.stuff = stuff
...

... the Python interpreter executes the class body and defines Foo. It is at precisely this moment of class definition, as each def statement runs, that Python evaluates and then stores the default argument values for every method in the class.

This means that Python evaluates [ ] exactly once, creating a single empty list, and stores a reference to that list as the default value of the stuff parameter of __init__.

Later, after class definition, we create first one and then another instance of Foo. At each of these instantiations, Python calls Foo's initializer but does not reevaluate the default arguments of __init__. Every call that omits stuff receives the one list that was stored at definition time.
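The timing is easy to observe directly. In this Python 3 sketch, `make_default` is a hypothetical helper used only as the default expression; it records each evaluation in the `calls` list, which shows that the default runs once while the class body executes and never again when instances are created.

```python
calls = []

def make_default():
    """Hypothetical helper: records each time Python evaluates the default."""
    calls.append("evaluated")
    return []

class Foo:
    # make_default() runs right here, while the class body executes.
    def __init__(self, stuff=make_default()):
        self.stuff = stuff

print(len(calls))  # 1, before any instance exists

f1 = Foo()
f2 = Foo()
print(len(calls))  # still 1: instantiation reuses the stored default
```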

What this means is that Python evaluates default arguments once and only once each time a def statement executes, which for a method in a class body means once, at class definition time. The consequence here is that f1 and f2 wind up, frustratingly, sharing a reference to the same list of stuff.
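Nothing about this is specific to __init__ or to classes. The same rule applies to any function with a mutable default, as this small Python 3 sketch shows; `add_item` is a hypothetical function written for illustration.

```python
def add_item(item, items=[]):
    # items defaults to the single list created when this def ran.
    items.append(item)
    return items

first = add_item(1)
second = add_item(2)
print(first, second)    # [1, 2] [1, 2]
print(first is second)  # True
```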

One good solution replaces stuff = [ ] with stuff = None and builds a fresh list inside __init__ whenever no list is passed in.

>>> class Foo:
...     def __init__(self, stuff=None):
...         if stuff is None:
...             stuff = []
...         self.stuff = stuff
...
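Rerunning the original experiment against the None version is a quick sanity check. In this Python 3 sketch each instance now gets its own list, because `[]` is evaluated inside the function body on every call that omits stuff.

```python
class Foo:
    def __init__(self, stuff=None):
        # None is a sentinel meaning "no list was passed"; build a fresh one per call.
        if stuff is None:
            stuff = []
        self.stuff = stuff

f1 = Foo()
f2 = Foo()
f1.stuff.append(17)
f2.stuff.append(18)
print(f1.stuff)  # [17]
print(f2.stuff)  # [18]

mine = [1, 2]
f3 = Foo(mine)
print(f3.stuff is mine)  # True: an explicitly passed list is still shared with the caller
```

Note the last line: this fix only changes what happens when stuff is omitted. A list that the caller passes in is stored as-is, so the caller and the instance still share it.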

And this highlights a principle that may not officially be a Python best practice but ought to be: set function defaults to immutable values, never to mutable ones. Lists, of course, are mutable, which explains why stuff = [ ] gets us into trouble. None is immutable and works great.
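None is not the only immutable choice. As an alternative to the None sentinel (a variant, not the approach used above), a Python 3 sketch can default to an empty tuple, which is safe to share because it can never change, and copy whatever it receives into a new list.

```python
class Foo:
    # A tuple is immutable, so sharing one default tuple across calls is harmless.
    def __init__(self, stuff=()):
        # list() always builds a new list, copying whatever iterable was passed in.
        self.stuff = list(stuff)

f1 = Foo()
f2 = Foo()
f1.stuff.append(17)
f2.stuff.append(18)
print(f1.stuff, f2.stuff)  # [17] [18]

mine = [1, 2]
f3 = Foo(mine)
print(f3.stuff is mine)  # False: the instance owns a copy
```

The trade-off is that every instance copies its input, which costs a little time but means no caller can mutate an instance's list from outside.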

For a follow-up bit of exotica, stop and ask yourself where, exactly, our first, stuff = [ ] definition of Foo stores its reference to its single list of stuff.

It turns out the answer is here ...

>>> Foo.__init__.im_func.func_defaults

... buried down a couple of layers, but still open for inspection.
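The im_func and func_defaults spellings come from Python 2. In Python 3 there are no unbound methods, so Foo.__init__ is a plain function and its defaults sit one layer up, in `__defaults__`. A minimal Python 3 sketch using the original stuff = [ ] definition:

```python
class Foo:
    def __init__(self, stuff=[]):
        self.stuff = stuff

f1 = Foo()
f2 = Foo()
f1.stuff.append(17)
f2.stuff.append(18)

# Foo.__init__ is an ordinary function in Python 3; its defaults live in a tuple.
print(Foo.__init__.__defaults__)                 # ([17, 18],)
print(Foo.__init__.__defaults__[0] is f1.stuff)  # True
```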
