Python is an open source language, and you may view its source code on Github.
Here, we will walk through some C source code to discover how Python implements the append() method for Lists.
Its source code is in listobject.c. Its fairly short, so let’s take a look, keeping in mind that the numbers inside parenthesis on the left edge are just place markers and do not appear in the actual C code:
int
PyList_Append(PyObject *op, PyObject *newitem)
{
if (PyList_Check(op) && (newitem != NULL))
(1) return app1((PyListObject *)op, newitem);
PyErr_BadInternalCall();
return -1;
}
Internally, it’s called PyList_Append and it returns an integer, which is used as a success vs. failure flag. But for now, let’s skip going down that rabbit hole, and instead focus on how it adds an element to a List.
We can see that the real work seems to be done by app1() (1) which takes two arguments:
- a pointer to the List object,
(PyListObject *)op - the item which will be appended,
newitem
So, let’s examine app1() further:
static int
app1(PyListObject *self, PyObject *v)
{
Py_ssize_t n = PyList_GET_SIZE(self);
assert (v != NULL);
(1) if (n == PY_SSIZE_T_MAX) {
PyErr_SetString(PyExc_OverflowError,
"cannot add more objects to list");
return -1;
}
(2) if (list_resize(self, n+1) < 0)
return -1;
Py_INCREF(v);
(3) PyList_SET_ITEM(self, n, v);
return 0;
}
First (1), it checks that the list’s size is not larger than what the platform can handle, for example a 64bit computer vs. a 32bit.
Then (2) it calls list_resize(), which is defined in the same file, listobject.c. This function does several things at the same time:
- It checks to see if a resize is needed
- If it can’t resize, then
app1()returns a failure flag - If it did resize or even didn’t need to, it lets the rest of
app1()happily execute by adding the new element (3).
But, what exactly is it resizing? Let’s examine the C source code for list_resize() to find out:
/* Ensure ob_item has room for at least newsize elements, and set
* ob_size to newsize. If newsize > ob_size on entry, the content
* of the new slots at exit is undefined heap trash; it's the caller's
* responsibility to overwrite them with sane values.
* The number of allocated elements may grow, shrink, or stay the same.
* Failure is impossible if newsize <= self.allocated on entry, although
* that partly relies on an assumption that the system realloc() never
* fails when passed a number of bytes <= the number of bytes last
* allocated (the C standard doesn't guarantee this, but it's hard to
* imagine a realloc implementation where it wouldn't be true).
* Note that self->ob_item may change, and even if newsize is less
* than ob_size on entry.
*/
static int
list_resize(PyListObject *self, Py_ssize_t newsize)
{
PyObject **items;
size_t new_allocated, num_allocated_bytes;
Py_ssize_t allocated = self->allocated;
/* Bypass realloc() when a previous overallocation is large enough
to accommodate the newsize. If the newsize falls lower than half
the allocated size, then proceed with the realloc() to shrink the list.
*/
(1) if (allocated >= newsize && newsize >= (allocated >> 1)) {
assert(self->ob_item != NULL || newsize == 0);
Py_SET_SIZE(self, newsize);
return 0;
}
/* This over-allocates proportional to the list size, making room
* for additional growth. The over-allocation is mild, but is
* enough to give linear-time amortized behavior over a long
* sequence of appends() in the presence of a poorly-performing
* system realloc().
* Add padding to make the allocated size multiple of 4.
* The growth pattern is: 0, 4, 8, 16, 24, 32, 40, 52, 64, 76, ...
* Note: new_allocated won't overflow because the largest possible value
* is PY_SSIZE_T_MAX * (9 / 8) + 6 which always fits in a size_t.
*/
new_allocated = ((size_t)newsize + (newsize >> 3) + 6) & ~(size_t)3;
/* Do not overallocate if the new size is closer to overalocated size
* than to the old size.
*/
if (newsize - Py_SIZE(self) > (Py_ssize_t)(new_allocated - newsize))
new_allocated = ((size_t)newsize + 3) & ~(size_t)3;
if (newsize == 0)
new_allocated = 0;
num_allocated_bytes = new_allocated * sizeof(PyObject *);
items = (PyObject **)PyMem_Realloc(self->ob_item, num_allocated_bytes);
if (items == NULL) {
PyErr_NoMemory();
(2) return -1;
}
self->ob_item = items;
Py_SET_SIZE(self, newsize);
self->allocated = new_allocated;
(2) return 0;
}
The very polite writer(s) of this code have commented very nicely about what list_resize() is doing and why.
I’ll let the writers speak for themselves, but will also note a few things:
(1) allocated >>; 1: In C, bit shifting is sometimes used as a shortcut to either multiply or divide by 2.
CS271 may shed more light on that.
(2) Returning -1 often flags an error, while returning zero (0) means it succeeded!