Python Source Code Analysis: Where does `self` go when monkey-patching a class member function to `eval()`?

When escaping a Python sandbox to get a shell, one trick is to hijack a member function of a class by replacing it with eval(). At first glance, everything seems fine. On closer inspection, though, something is odd: when a member function is called, isn’t self passed implicitly as the first argument, so that a == x becomes roughly A.__eq__(a, x)? Then why isn’t self passed to eval() as its first argument, and why is there no error?

To really understand what happens, let’s look at the Python 3.7.8 source code. Setting up an environment for debugging the Python source is similar to setting one up for the PHP source, so articles on PHP source debugging environments are a useful reference.

Pro tip: The “Doc” folder in the Python source tree contains the official documentation (.rst files). If you don’t know what a function in the source code does, search for it in the Doc folder. In CLion, Ctrl+Shift+F performs a full-text search over the entire project, including the source code of library functions.

Example

Let’s take a look at the following code. We know that the definition of eval() is eval(expression[, globals[, locals]]).

If we put eval() in __eq__ and self were passed as usual, executing a == "bb" would call eval(a, "bb"): the expression would be self, and globals would be "bb". In that case there would certainly be an error and execution could not continue:

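class A():
    pass

if __name__ == "__main__":
    a = A()
    a.__class__.__eq__ = eval        # patch the class: == looks __eq__ up on type(a)
    print(a)
    print(eval)
    bb = "eval received only 'bb'"   # defined so that evaluating the name bb succeeds
    print(a == "bb")                 # runs eval("bb") and prints the value of bb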

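In reality, however, no argument error occurs: eval receives only "bb" and evaluates it as an expression (so the result is whatever the name bb refers to, or an ordinary NameError if bb is undefined). Why does self never reach eval?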
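To see the contrast concretely, here is a small sketch (assuming CPython 3) that first makes the call the naive reading predicts, eval(a, "bb"), and then performs the real monkey-patched comparison. The exact TypeError message differs between Python versions, so the sketch only records whether one was raised; the expression "6 * 7" is an arbitrary stand-in for "bb" so that the result does not depend on any variable.

```python
class A:
    pass


a = A()

# The call that a naive reading of "self is always passed" predicts:
# a (not a string) as the expression and "bb" (not a dict) as globals.
try:
    eval(a, "bb")
    naive_call_failed = False
except TypeError:
    naive_call_failed = True

# The actual monkey patch: store the builtin eval under __eq__ on the class.
A.__eq__ = eval

# == hands eval only the right-hand operand, which is evaluated as code.
patched_result = a == "6 * 7"

print(naive_call_failed)  # True
print(patched_result)     # 42
```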

Analysis

0x01 builtin_eval

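In CPython, eval() is a built-in function (its type is builtin_function_or_method). Its entry point is builtin_eval, an Argument Clinic wrapper that unpacks one to three positional arguments and forwards them to builtin_eval_impl.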

static PyObject *
builtin_eval(PyObject *module, PyObject *const *args, Py_ssize_t nargs)
{
    PyObject *return_value = NULL;
    PyObject *source;
    PyObject *globals = Py_None;
    PyObject *locals = Py_None;

    if (!_PyArg_UnpackStack(args, nargs, "eval",
        1, 3,
        &source, &globals, &locals)) {
        goto exit;
    }
    return_value = builtin_eval_impl(module, source, globals, locals);

exit:
    return return_value;
}
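The 1, 3 passed to _PyArg_UnpackStack are the minimum and maximum number of positional arguments, and globals and locals start out as Py_None when they are omitted. The sketch below checks that behaviour from Python; the helper rejects is a hypothetical name used only for illustration.

```python
def rejects(*args):
    """Hypothetical helper: True if eval() raises TypeError for these arguments."""
    try:
        eval(*args)
    except TypeError:
        return True
    return False


one_arg = eval("1 + 1")                         # globals/locals omitted (Py_None)
two_args = eval("x * 2", {"x": 21})             # explicit globals dict
three_args = eval("x + y", {"x": 1}, {"y": 2})  # globals and locals

print(one_arg, two_args, three_args)  # 2 42 3
print(rejects())                      # True: fewer than 1 argument
print(rejects("1", {}, {}, {}))       # True: more than 3 arguments
```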

So, let’s set a breakpoint on this function and see the call stack.

The call stack shows that evaluating a == 'bb' goes through do_richcompare, with op = 2 (Py_EQ), which indicates that == is being performed.

In Python’s design, every type has many “slots”: C function pointers that implement special methods and can be overridden from Python. For example, tp_str backs __str__, and tp_richcompare backs __eq__ and the other rich comparisons.

https://docs.python.org/3.8/c-api/typeobj.html?highlight=slots

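Assigning a.__class__.__eq__ = eval stores eval in A’s class dictionary under __eq__. Because A is a Python-defined (heap) type, type.__setattr__ also points A’s tp_richcompare slot at slot_tp_richcompare, and that is how a == "bb" ends up in slot_tp_richcompare.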

Without the patch, A inherits object’s comparison slot, and == follows the normal comparison path (for two unrelated objects, this ends in an identity check that returns False).
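Because the slot lives on the type, where the patch is applied matters. The sketch below (assuming CPython 3, where special methods used by operators are looked up on the type rather than the instance) compares three cases: no patch, a patch on the instance, and a patch on the class.

```python
class A:
    pass


a = A()

# 1. No patch: object's default comparison returns NotImplemented for
#    different objects, so == falls back to identity and yields False.
default_result = a == "6 * 7"

# 2. Patch on the instance: stored in a.__dict__, but == looks up __eq__
#    on type(a), so the instance attribute is ignored.
a.__eq__ = eval
instance_patch_result = a == "6 * 7"

# 3. Patch on the class: type.__setattr__ updates A's comparison slot,
#    so == now goes through slot_tp_richcompare and calls eval.
A.__eq__ = eval
class_patch_result = a == "6 * 7"

print(default_result, instance_patch_result, class_patch_result)  # False False 42
```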

/* Index = op code: Py_LT 0, Py_LE 1, Py_EQ 2, Py_NE 3, Py_GT 4, Py_GE 5 */
static _Py_Identifier name_op[] = {
    {0, "__lt__", 0},
    {0, "__le__", 0},
    {0, "__eq__", 0},
    {0, "__ne__", 0},
    {0, "__gt__", 0},
    {0, "__ge__", 0}
};

static PyObject *
slot_tp_richcompare(PyObject *self, PyObject *other, int op)
{
    int unbound;
    PyObject *func, *res;

    /* op == Py_EQ (2) selects "__eq__"; after the patch, func is eval */
    func = lookup_maybe_method(self, &name_op[op], &unbound);
    if (func == NULL) {
        PyErr_Clear();
        Py_RETURN_NOTIMPLEMENTED;
    }

    PyObject *args[1] = {other};   /* only the right-hand operand, e.g. "bb" */
    res = call_unbound(unbound, func, self, args, 1);
    Py_DECREF(func);
    return res;
}
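Since name_op[op] only picks the dunder name, every rich comparison goes through the same path, not just ==. As a sketch (assuming CPython 3), patching all six names with eval shows that each comparison operator ends up calling eval with the right-hand operand alone; the list NAME_OP is a hypothetical mirror of name_op[], in the same order.

```python
import operator


class A:
    pass


# Same order as name_op[] above: index == op code
NAME_OP = ["__lt__", "__le__", "__eq__", "__ne__", "__gt__", "__ge__"]
OPERATORS = [operator.lt, operator.le, operator.eq,
             operator.ne, operator.gt, operator.ge]

for name in NAME_OP:
    setattr(A, name, eval)

a = A()

# Each operator evaluates its right-hand operand, here the string "i * 10".
results = {name: op(a, f"{i} * 10")
           for i, (name, op) in enumerate(zip(NAME_OP, OPERATORS))}

print(results)
# {'__lt__': 0, '__le__': 10, '__eq__': 20, '__ne__': 30, '__gt__': 40, '__ge__': 50}
```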

lookup_maybe_method looks up __eq__ on type(self) and finds eval, which call_unbound then calls.

But notice that self is still passed into call_unbound. So, where is self being discarded?

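self is dropped inside call_unbound: lookup_maybe_method has set unbound to 0, so the else branch calls func with args alone and never uses self.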

static PyObject*
call_unbound(int unbound, PyObject *func, PyObject *self,
             PyObject **args, Py_ssize_t nargs)
{
    if (unbound) {
        /* plain Python function: prepend self, i.e. func(self, *args) */
        return _PyObject_FastCall_Prepend(func, self, args, nargs);
    }
    else {
        /* unbound == 0 (the eval case): func(*args), self is not used */
        return _PyObject_FastCall(func, args, nargs);
    }
}
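The two branches can be modelled in a few lines of Python. This call_unbound is a hypothetical pure-Python stand-in for the C function, written only to make the difference between the branches explicit; it is not part of any Python API.

```python
class A:
    pass


def call_unbound(unbound, func, self, args):
    """Hypothetical Python model of CPython 3.7's call_unbound()."""
    if unbound:
        # Plain Python function: act like a bound method, prepend self.
        return func(self, *args)
    # Anything else (e.g. the builtin eval): self is not part of the call.
    return func(*args)


def show(x, y):
    return (x, y)


a = A()
print(call_unbound(0, eval, a, ("6 * 7",)))  # 42: eval("6 * 7"), self dropped
print(call_unbound(1, show, a, ("bb",)))     # (a, 'bb'): self prepended
```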

Now that we know where self is being discarded, let’s dig deeper and find out how unbound=0 is set. Let’s continue reading:

0x02 unbound

Following the call further, _PyObject_FastCall() forwards to _PyObject_FastCallDict(), which calls _PyCFunction_FastCallDict() because eval is a C function (a PyCFunction). From there, execution enters builtin_eval() with nargs = 1 and the single argument "bb".

PyObject *
_PyObject_FastCallDict(PyObject *callable, PyObject *const *args, Py_ssize_t nargs,
                       PyObject *kwargs)
{
    /* _PyObject_FastCallDict() must not be called with an exception set,
       because it can clear it (directly or indirectly) and so the
       caller loses its exception */
    assert(!PyErr_Occurred());

    assert(callable != NULL);
    assert(nargs >= 0);
    assert(nargs == 0 || args != NULL);
    assert(kwargs == NULL || PyDict_Check(kwargs));

    if (PyFunction_Check(callable)) {
        return _PyFunction_FastCallDict(callable, args, nargs, kwargs);
    }
    else if (PyCFunction_Check(callable)) {
        /* eval is a PyCFunction, so it takes this branch */
        return _PyCFunction_FastCallDict(callable, args, nargs, kwargs);
    }
    else {
        /* ... slow path: build a temporary argument tuple and call
           Py_TYPE(callable)->tp_call (omitted from this excerpt) ... */
    }
}
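The dispatch at the top of _PyObject_FastCallDict depends only on the exact type of the callable. A rough Python analogue, assuming that types.FunctionType and types.BuiltinFunctionType correspond to PyFunction_Type and PyCFunction_Type, looks like this; fast_call_branch is a hypothetical function that only names the branch that would be taken.

```python
import functools
import types


def fast_call_branch(callable_obj):
    """Hypothetical: name the branch 3.7's _PyObject_FastCallDict would take."""
    # In 3.7, PyFunction_Check / PyCFunction_Check compare the exact type.
    if type(callable_obj) is types.FunctionType:
        return "_PyFunction_FastCallDict"
    if type(callable_obj) is types.BuiltinFunctionType:
        return "_PyCFunction_FastCallDict"
    return "tp_call slow path"


def hello(aa, bb):
    print(aa, bb)


print(type(eval).__name__)                             # builtin_function_or_method
print(fast_call_branch(eval))                          # _PyCFunction_FastCallDict
print(fast_call_branch(hello))                         # _PyFunction_FastCallDict
print(fast_call_branch(functools.partial(eval, "1")))  # tp_call slow path
```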

So, how does unbound=0 come about? Let’s see what lookup_maybe_method does.

static PyObject *
lookup_maybe_method(PyObject *self, _Py_Identifier *attrid, int *unbound)
{
    /* Look up attrid ("__eq__" here) on type(self) and its MRO, not on
       the instance; after the patch, res is eval */
    PyObject *res = _PyType_LookupId(Py_TYPE(self), attrid);
    if (res == NULL) {
        return NULL;
    }

    if (PyFunction_Check(res)) {
        /* Avoid temporary PyMethodObject */
        *unbound = 1;
        Py_INCREF(res);
    }
    else {
        *unbound = 0;   /* eval takes this branch */
        /* tp_descr_get is the C slot behind __get__. It is read from the
           type of res (builtin_function_or_method for eval), not from A */
        descrgetfunc f = Py_TYPE(res)->tp_descr_get;
        if (f == NULL) {
            /* not a descriptor: return res itself as a new reference */
            Py_INCREF(res);
        }
        else {
            /* descriptor: let __get__ bind res to self */
            res = f(res, self, (PyObject *)(Py_TYPE(self)));
        }
    }
    return res;
}
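The same logic can be restated in Python. The function below is a hypothetical model, not CPython code: _PyType_LookupId is approximated by searching the __dict__ of each class in type(obj).__mro__, and tp_descr_get by looking up __get__ on the type of the result.

```python
import types


def lookup_maybe_method(obj, name):
    """Hypothetical Python model of CPython 3.7's lookup_maybe_method().

    Returns (res, unbound), or (None, False) if the name is not found.
    """
    # _PyType_LookupId: search the type's MRO, never the instance.
    for klass in type(obj).__mro__:
        if name in vars(klass):
            res = vars(klass)[name]
            break
    else:
        return None, False

    if type(res) is types.FunctionType:
        # PyFunction_Check succeeded: return the raw function, unbound = 1.
        return res, True

    # unbound = 0: consult __get__ on the *type of res* (tp_descr_get).
    get = getattr(type(res), "__get__", None)
    if get is None:
        return res, False  # not a descriptor: return res unchanged
    return get(res, obj, type(obj)), False


class A:
    pass


a = A()
A.__eq__ = eval
print(lookup_maybe_method(a, "__eq__"))  # (<built-in function eval>, False)
```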

Macro definitions related to PyFunction_Check:

#define PyFunction_Check(op) (Py_TYPE(op) == &PyFunction_Type)
#define Py_TYPE(ob)             (((PyObject*)(ob))->ob_type)

&PyFunction_Type is simply the address of PyFunction_Type, a single global PyTypeObject struct (not an array) that describes ordinary Python functions:

PyTypeObject PyFunction_Type = {
    PyVarObject_HEAD_INIT(&PyType_Type, 0)
    "function",
    sizeof(PyFunctionObject),
    0,
    (destructor)func_dealloc,                   /* tp_dealloc */
    0,                                          /* tp_print */
    0,                                          /* tp_getattr */
    0,                                          /* tp_setattr */
    0,                                          /* tp_reserved */
    (reprfunc)func_repr,                        /* tp_repr */
    0,                                          /* tp_as_number */
    0,                                          /* tp_as_sequence */
    0,                                          /* tp_as_mapping */
    0,                                          /* tp_hash */
    function_call,                              /* tp_call */
    0,                                          /* tp_str */
    0,                                          /* tp_getattro */
    0,                                          /* tp_setattro */
    0,                                          /* tp_as_buffer */
    Py_TPFLAGS_DEFAULT | Py_TPFLAGS_HAVE_GC,    /* tp_flags */
    func_new__doc__,                            /* tp_doc */
    (traverseproc)func_traverse,                /* tp_traverse */
    0,                                          /* tp_clear */
    0,                                          /* tp_richcompare */
    offsetof(PyFunctionObject, func_weakreflist), /* tp_weaklistoffset */
    0,                                          /* tp_iter */
    0,                                          /* tp_iternext */
    0,                                          /* tp_methods */
    func_memberlist,                            /* tp_members */
    func_getsetlist,                            /* tp_getset */
    0,                                          /* tp_base */
    0,                                          /* tp_dict */
    func_descr_get,                             /* tp_descr_get */
    0,                                          /* tp_descr_set */
    offsetof(PyFunctionObject, func_dict),      /* tp_dictoffset */
    0,                                          /* tp_init */
    0,                                          /* tp_alloc */
    func_new,                                   /* tp_new */
};

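PyVarObject_HEAD_INIT(&PyType_Type, 0) initializes the object header shared by all type objects (the ob_type of a type object is type itself). The next field, "function", is tp_name, the name Python reports for this type.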

So PyFunction_Check returns 1 only when ob_type is exactly &PyFunction_Type, that is, when the object is a plain Python function whose type name is "function". The ob_type of eval is builtin_function_or_method (PyCFunction_Type), so the check returns 0.

A simple test confirms this. In the following example, hello is a plain Python function (its ob_type is function), so lookup_maybe_method sets unbound to 1 and call_unbound prepends self:

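def hello(aa, bb):
    print(aa, bb)            # aa receives self, bb receives "bb"
a.__class__.__eq__ = hello   # a = A() from the first example
a == "bb"                    # prints <__main__.A object at 0x...> bb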

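Back in the eval case, PyFunction_Check fails, so lookup_maybe_method sets unbound = 0 and reads tp_descr_get from the type of eval, builtin_function_or_method, not from class A. Built-in functions are not descriptors (that type defines no __get__), so f is NULL. lookup_maybe_method therefore returns eval itself with unbound = 0, and call_unbound calls eval("bb") without self.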
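Two consequences follow, shown in the sketch below (assuming CPython 3; echo is a hypothetical helper). First, builtin_function_or_method has no __get__, while ordinary functions do. Second, what decides whether self is passed is the type of the object stored on the class: a staticmethod goes through the unbound = 0 branch and its __get__ hands back the wrapped function, which is called without self, while wrapping eval in a lambda turns it back into a plain function, so self is prepended and eval fails.

```python
import types

# tp_descr_get exists for plain functions but not for builtins.
print(hasattr(types.FunctionType, "__get__"))         # True
print(hasattr(types.BuiltinFunctionType, "__get__"))  # False


class A:
    pass


def echo(x):
    """Hypothetical helper: return the single argument it receives."""
    return x


a = A()

# staticmethod: not a plain function (unbound = 0); its __get__ returns echo,
# which is then called with the right-hand operand only.
A.__eq__ = staticmethod(echo)
static_result = a == "bb"
print(static_result)  # bb

# A lambda is a plain function (unbound = 1), so self is prepended and
# the lambda ends up calling eval(a, "6 * 7"), which raises TypeError.
A.__eq__ = lambda *args: eval(*args)
try:
    a == "6 * 7"
    wrapped_failed = False
except TypeError:
    wrapped_failed = True
print(wrapped_failed)  # True
```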

Conclusion

In studying web security, many language tricks might seem ordinary. However, it’s crucial to understand their underlying principles. Gaining insight into the rationale behind these tricks is often more rewarding than merely memorizing them.

When reading the source code, it’s often helpful to refer to the official documentation. This assists in understanding the design philosophy, giving an overview of the architecture, and facilitates subsequent analysis.