At the intersection of Islamicate linguistics and computation

Exploring Python’s Implementation of List.append() in C

Python is an open source language, and you may view its source code on Github.

Here, we will walk through some C source code to discover how Python implements the append() method for Lists. Its source code is in listobject.c. Its fairly short, so let’s take a look, keeping in mind that the numbers inside parenthesis on the left edge are just place markers and do not appear in the actual C code:

int
PyList_Append(PyObject *op, PyObject *newitem)
{
    if (PyList_Check(op) && (newitem != NULL))
(1)     return app1((PyListObject *)op, newitem);
    PyErr_BadInternalCall();
    return -1;
}

Internally, it’s called PyList_Append and it returns an integer, which is used as a success vs. failure flag. But for now, let’s skip going down that rabbit hole, and instead focus on how it adds an element to a List.

We can see that the real work seems to be done by app1() (1) which takes two arguments:

  1. a pointer to the List object, (PyListObject *)op
  2. the item which will be appended, newitem

So, let’s examine app1() further:

static int
app1(PyListObject *self, PyObject *v)
{
    Py_ssize_t n = PyList_GET_SIZE(self);

    assert (v != NULL);
(1) if (n == PY_SSIZE_T_MAX) {
        PyErr_SetString(PyExc_OverflowError,
            "cannot add more objects to list");
        return -1;
    }

(2) if (list_resize(self, n+1) < 0)
        return -1;

    Py_INCREF(v);
(3) PyList_SET_ITEM(self, n, v);
    return 0;
}

First (1), it checks that the list’s size is not larger than what the platform can handle, for example a 64bit computer vs. a 32bit. Then (2) it calls list_resize(), which is defined in the same file, listobject.c. This function does several things at the same time:

  • It checks to see if a resize is needed
  • If it can’t resize, then app1() returns a failure flag
  • If it did resize or even didn’t need to, it lets the rest of app1() happily execute by adding the new element (3).

But, what exactly is it resizing? Let’s examine the C source code for list_resize() to find out:

/* Ensure ob_item has room for at least newsize elements, and set
 * ob_size to newsize.  If newsize > ob_size on entry, the content
 * of the new slots at exit is undefined heap trash; it's the caller's
 * responsibility to overwrite them with sane values.
 * The number of allocated elements may grow, shrink, or stay the same.
 * Failure is impossible if newsize <= self.allocated on entry, although
 * that partly relies on an assumption that the system realloc() never
 * fails when passed a number of bytes <= the number of bytes last
 * allocated (the C standard doesn't guarantee this, but it's hard to
 * imagine a realloc implementation where it wouldn't be true).
 * Note that self->ob_item may change, and even if newsize is less
 * than ob_size on entry.
 */
static int
list_resize(PyListObject *self, Py_ssize_t newsize)
{
    PyObject **items;
    size_t new_allocated, num_allocated_bytes;
    Py_ssize_t allocated = self->allocated;

    /* Bypass realloc() when a previous overallocation is large enough
       to accommodate the newsize.  If the newsize falls lower than half
       the allocated size, then proceed with the realloc() to shrink the list.
    */
(1) if (allocated >= newsize && newsize >= (allocated >> 1)) {
        assert(self->ob_item != NULL || newsize == 0);
        Py_SET_SIZE(self, newsize);
        return 0;
    }

    /* This over-allocates proportional to the list size, making room
     * for additional growth.  The over-allocation is mild, but is
     * enough to give linear-time amortized behavior over a long
     * sequence of appends() in the presence of a poorly-performing
     * system realloc().
     * Add padding to make the allocated size multiple of 4.
     * The growth pattern is:  0, 4, 8, 16, 24, 32, 40, 52, 64, 76, ...
     * Note: new_allocated won't overflow because the largest possible value
     *       is PY_SSIZE_T_MAX * (9 / 8) + 6 which always fits in a size_t.
     */
    new_allocated = ((size_t)newsize + (newsize >> 3) + 6) & ~(size_t)3;
    /* Do not overallocate if the new size is closer to overalocated size
     * than to the old size.
     */
    if (newsize - Py_SIZE(self) > (Py_ssize_t)(new_allocated - newsize))
        new_allocated = ((size_t)newsize + 3) & ~(size_t)3;

    if (newsize == 0)
        new_allocated = 0;
    num_allocated_bytes = new_allocated * sizeof(PyObject *);
    items = (PyObject **)PyMem_Realloc(self->ob_item, num_allocated_bytes);
    if (items == NULL) {
        PyErr_NoMemory();
(2)     return -1;
    }
    self->ob_item = items;
    Py_SET_SIZE(self, newsize);
    self->allocated = new_allocated;
(2) return 0;
}

The very polite writer(s) of this code have commented very nicely about what list_resize() is doing and why. I’ll let the writers speak for themselves, but will also note a few things:

(1) allocated &gt;&gt;; 1: In C, bit shifting is sometimes used as a shortcut to either multiply or divide by 2. CS271 may shed more light on that.

(2) Returning -1 often flags an error, while returning zero (0) means it succeeded!