cpython round(): the internals
Inspiration
A few days back, I came across a similar version of this tweet (can’t find that particular one now) which mentions that python’s round() function rounds to nearest even integer in case of mid-point integers(0.5).
The following examples explain the above tweet:
>>> round(2.5)
2
>>> round(1.5)
2
I thought of going into the internals of this in cpython and see how its implemented. Lets go through it together —
A bit of a personal story — A dream of mine has always been to work on core C/C++ but haven’t gotten a chance at work and the only real personal project I ever made was a Text Editor in C which was long time back. I always start projects in C but things remain un-finished.
So I thought of starting simple this time and just reading the idiomatic C language to get better at it — reading code if not writing it.
Where to begin ?
So, first things first, where to begin in the cpython code base ?
The python round
function is an inbuilt method for float
primitive datatype.
Taking a step back and mapping this to the cpython code, the primitive data types of python are implemented as PyObject
structures in C language.
The float data type is *declared* in C header file floatobject.c.h
at https://github.com/python/cpython/blob/main/Objects/clinic/floatobject.c.h
And the corresponding *definition* lies in https://github.com/python/cpython/blob/main/Objects/floatobject.c
Declarations
Not going to the details of float
but the specific function we are looking for is declared as follows in the same file at lines: https://github.com/python/cpython/blob/88dc84bcf9fef32afa9af0ab41fa467c9733483f/Objects/clinic/floatobject.c.h#L79-L111
#define FLOAT___ROUND___METHODDEF \
{"__round__", _PyCFunction_CAST(float___round__), METH_FASTCALL, float___round____doc__},
static PyObject *
float___round___impl(PyObject *self, PyObject *o_ndigits);
static PyObject *
float___round__(PyObject *self, PyObject *const *args, Py_ssize_t nargs)
{
PyObject *return_value = NULL;
PyObject *o_ndigits = Py_None;
if (!_PyArg_CheckPositional("__round__", nargs, 0, 1)) {
goto exit;
}
if (nargs < 1) {
goto skip_optional;
}
o_ndigits = args[0];
skip_optional:
return_value = float___round___impl(self, o_ndigits);
exit:
return return_value;
}
Lets dive deep into this code line by line:
#define FLOAT___ROUND___METHODDEF \
{"__round__", _PyCFunction_CAST(float___round__), METH_FASTCALL, float___round____doc__},This method just mapps the magic method __round__ of float datatype to float__round___ method defined in the next lines.
So, without looking into other details of float
datatype, it becomes fair to assume (yeah, I am not an expert) that round()
method is implemented as magic method of float
datatype internally implemented as a PyObject.
Actually, we can invoke round(4.5)
method on floats also by calling float.__round__(4.5)
and its equivalent.
static PyObject *
float___round___impl(PyObject *self, PyObject *o_ndigits);
This is just a declration of main implementation of round()
method which I will cover in *Definition* section.
static PyObject *
float___round__(PyObject *self, PyObject *const *args, Py_ssize_t nargs)
{
PyObject *return_value = NULL;
PyObject *o_ndigits = Py_None;
if (!_PyArg_CheckPositional("__round__", nargs, 0, 1)) {
goto exit;
}
if (nargs < 1) {
goto skip_optional;
}
o_ndigits = args[0];
skip_optional:
return_value = float___round___impl(self, o_ndigits);
exit:
return return_value;
}
This code is just for validation checks. So python round()
accepts an optional second argument which takes in the number of digits to round off to. The first argument is the floating number itself and second one if not provided is taken as 0.
The PyArg_CheckPositional
just checks that the number of arguments to function __round__
should be in minimum 0 and maximum 1. This method is in turn defined in another file.
The number of digits is used to initialise internal PyObject
, which if not present renains as Py_None
i.e. internal implemenation of None
.
And here as we see, the main implementation function float__round__impl
is called which is implemened in .c file.
Definition
Lets dive deep into the core implemetation in floatobject.c —
static PyObject *
float___round___impl(PyObject *self, PyObject *o_ndigits)
/*[clinic end generated code: output=374c36aaa0f13980 input=fc0fe25924fbc9ed]*/
{
double x, rounded;
Py_ssize_t ndigits;
x = PyFloat_AsDouble(self);
if (o_ndigits == Py_None) {
/* single-argument round or with None ndigits:
* round to nearest integer */
rounded = round(x);
if (fabs(x-rounded) == 0.5)
/* halfway case: round to even */
rounded = 2.0*round(x/2.0);
return PyLong_FromDouble(rounded);
}
/* interpret second argument as a Py_ssize_t; clips on overflow */
ndigits = PyNumber_AsSsize_t(o_ndigits, NULL);
if (ndigits == -1 && PyErr_Occurred())
return NULL;
/* nans and infinities round to themselves */
if (!isfinite(x))
return PyFloat_FromDouble(x);
/* Deal with extreme values for ndigits. For ndigits > NDIGITS_MAX, x
always rounds to itself. For ndigits < NDIGITS_MIN, x always
rounds to +-0.0. Here 0.30103 is an upper bound for log10(2). */
#define NDIGITS_MAX ((int)((DBL_MANT_DIG-DBL_MIN_EXP) * 0.30103))
#define NDIGITS_MIN (-(int)((DBL_MAX_EXP + 1) * 0.30103))
if (ndigits > NDIGITS_MAX)
/* return x */
return PyFloat_FromDouble(x);
else if (ndigits < NDIGITS_MIN)
/* return 0.0, but with sign of x */
return PyFloat_FromDouble(0.0*x);
else
/* finite x, and ndigits is not unreasonably large */
return double_round(x, (int)ndigits);
#undef NDIGITS_MAX
#undef NDIGITS_MIN
}
One thing to note before we dive into the code would be that anything we receive from the userland python code into the C code is not mapped to primitive datatypes in C. For instance, the simple python number received from user as number of digits to round off to is mapped to Py_Number
internally and similarly the input float is mapped to PyFloat
internally.
x = PyFloat_AsDouble(self);
This function begings by converting the floating number received from userland python code as a PyObject
to double
datatype in C.
From here, 2 cases emerge —with and without the num_digits to round off.
if (o_ndigits == Py_None) {
/* single-argument round or with None ndigits:
* round to nearest integer */
rounded = round(x);
if (fabs(x-rounded) == 0.5)
/* halfway case: round to even */
rounded = 2.0*round(x/2.0);
return PyLong_FromDouble(rounded);
}
When number of digits is not supplied ( remember from the Declaration section above that o_digits
was passed as Py_None
in this case), we just use the round()
function from C language’s <math.h>
library.
Now comes the catch !! We want to round to nearest EVEN and C’ round function which always rounds forward → round(2.5) = 3
!
This is not done in every case but only when we are at mid-point i.e. the floating number contains 0.5 at the end (we will look at complex cases later).
if (fabs(x-rounded) == 0.5)
/* halfway case: round to even */
rounded = 2.0*round(x/2.0);
Lets do some maths for 2 numbers to understand why this works
- round(1.5 / 2) = round(0.75) = 1
- round(2.5 / 2) = round(1.25) = 1
What we observe —
For even numbers and 0.5 decimals, rounding after dividing by 2, moves rounded number backward because of 0.25
For odd numbers and 0.5 decimals, rounding after dividing by 2, moves rounded number forward because of 0.75 in the decimal place.
This rounded number when re-multiplied by 2 gives the proper even-rounded number ( we already moved forward/backward as required based on the odd/even number before division by 2 ). So final, maths becomes:
- round(1.5 / 2) = round(0.75) = 1*2 = 2
- round(2.5 / 2) = round(1.25) = 1*2 = 2
This is pretty neat. One small detail here is also about the choice of 2.0
to divide x
and not just 2
. Had it been just 2, the C Language would have converted the final result to integer while we want the floating point number and hence that empty decimal 0
in 2.0
becomes crucial.
—
Then at the end, this C language’s double
variable is converted back to python object — PyLong
to return back to userland using PyLong_FromDouble()
method !
We seem good here ! Achieved the feat !!
But lets not stop here and move on to complete the python round()
function’s implemenation. It gets more interesting on how the num_digits
case is implemented:
/* interpret second argument as a Py_ssize_t; clips on overflow */
ndigits = PyNumber_AsSsize_t(o_ndigits, NULL);
if (ndigits == -1 && PyErr_Occurred())
return NULL;
/* nans and infinities round to themselves */
if (!isfinite(x))
return PyFloat_FromDouble(x);
Next up, just like floating number, the num_digits
variable is also converted to size_t
datatype from PyNumber and error checks are done.
Thereafter, the case of “infinite” as input number is handled to return the same infinite number back.
This is an interesting case which is only handled in case num_rounding_digits
is provided, else the implementation throws errors as it goes into the previous if
block we handled for the case without any rounding digits.
Its basically a way of saying that infinity is such a large number that rounding to 1 digit will still keep it infinite ( back to basics math got reminded to me as I was thinking through this ) and without specifying any rounding digits means we want an integer value out of infinity which is a valid erroneous case !
>>> round(float('inf'))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
OverflowError: cannot convert float infinity to integer
>>> round(float('inf'),1)
inf
Moving on…. some extreme values of num_rounding_digits
are handled and its interesting (and I am not sure if I got this right , please comment if not)—
/* Deal with extreme values for ndigits. For ndigits > NDIGITS_MAX, x
always rounds to itself. For ndigits < NDIGITS_MIN, x always
rounds to +-0.0. Here 0.30103 is an upper bound for log10(2). */
#define NDIGITS_MAX ((int)((DBL_MANT_DIG - DBL_MIN_EXP) * 0.30103))
#define NDIGITS_MIN (-(int)((DBL_MAX_EXP + 1) * 0.30103))
if (ndigits > NDIGITS_MAX)
/* return x */
return PyFloat_FromDouble(x);
else if (ndigits < NDIGITS_MIN)
/* return 0.0, but with sign of x */
return PyFloat_FromDouble(0.0*x);
else
/* finite x, and ndigits is not unreasonably large */
return double_round(x, (int)ndigits);
Not going into finer details but float point numbers are internally represented using mantissa(significant digits)
and exponent (2’s exponents)
.
NDIGITS_MAX
— represents the maximum number of significant digits that a postive double-precision floating-point number can hold.
Calculation (as per my understanding):
DBL_MANT_DIG - DBL_MIN_EXP
: This calculates the maximum number of digits in the mantissa(precision), taking into account the smallest possible exponent. ( Although I didn’t fully understand the subtraction part)* 0.30103
:log10(2) —
This multiplication approximately converts the number of digits in base 2 representation of a number to the number of digits in decimal representation (our inputnum_digits
is decimal).- This basically translates to
log10(2^(DBL_MANT_DIG - DBL_MIN_EXP))
Rationale: If we try to round a number to more digits than it can represent, the result will be the original number itself, as there’s no additional precision to be gained.
Hence, the if blocks exist to skip the rounding off if the num_rounding_digits
is greater than the max_digits
which the number can represent and we can just return the same number
Similarly handling the negative value of num_rounding_digits
(negatives to be covered later in detail) — if the value is less than (negative value) the number of significant digits(mantissa) in the number represented by the negated value of maximum exponent i.e.DBL_MAX_EXP
, rounding will make the number to 0, hence it is directly returned.
0 is returned with the same sign as original number — yes python has sign for 0 i.e. +0.0 and -0.0 — from a Google Search — Python allows for a sign with zero because in certain low-level computing systems, including how floating-point numbers are represented, there can be a distinction between a positive zero and a negative zero!.
—
For all the cases within the limits and bounds, we are just left to explore double_round
function now :
Poking into the code, there are actually 2 variations of this based on a MACRO
defined at compile time. One version uses string<->double
conversions and one which doesn’t and plays solely on numbers.For the purpose of this story and to prevent this from ever-extending, I am diving deep into the number based implementation here —
This is very interesting and prevents overflows beautifully. Lets dive deep !
static PyObject *
double_round(double x, int ndigits) {
double pow1, pow2, y, z;
if (ndigits >= 0) {
if (ndigits > 22) {
/* pow1 and pow2 are each safe from overflow, but
pow1*pow2 ~= pow(10.0, ndigits) might overflow */
pow1 = pow(10.0, (double)(ndigits-22));
pow2 = 1e22;
}
else {
pow1 = pow(10.0, (double)ndigits);
pow2 = 1.0;
}
y = (x*pow1)*pow2;
/* if y overflows, then rounded value is exactly x */
if (!isfinite(y))
return PyFloat_FromDouble(x);
}
else {
pow1 = pow(10.0, (double)-ndigits);
pow2 = 1.0; /* unused; silences a gcc compiler warning */
y = x / pow1;
}
z = round(y);
if (fabs(y-z) == 0.5)
/* halfway between two integers; use round-half-even */
z = 2.0*round(y/2.0);
if (ndigits >= 0)
z = (z / pow2) / pow1;
else
z *= pow1;
/* if computation resulted in overflow, raise OverflowError */
if (!isfinite(z)) {
PyErr_SetString(PyExc_OverflowError,
"overflow occurred during round");
return NULL;
}
return PyFloat_FromDouble(z);
}
Negative and Positive num_rounding_digits
Lets first get some clarity on how the num_rounding_digits
work.
As per the PyDocs, round()
function rounds a Python float to the closest multiple of 10**ndigits.
To explain in calculative terms:
num_rounding_digits > 0
, we shift the decimal bynum_rounding_digits
places to the right of current position to compute round(and then shift back to original).num_rounding_digits < 0
, we shift the decimal bynum_rounding_digits
places to the left of current position to compute round(and then shift back to original).
Lets take some examples:
# round is computed on 134.51 => 135 => shift_back => 1.35
>>> round(1.3451,2)
1.35
# round is computed on 13.451 => 13 => shift_back => 1.3
>>> round(1.3451,1)
1.3
# round is computed on 1.3451 => 1 => shift_back => 1.0
>>> round(1.3451,0)
1.0
# round is computed on 10.13451 => 10 => shift_back => 100
>>> round(101.3451,-1)
100.0
# round is computed on 12.213451 => 12.0 => shift_back => 1200
>>> round(1221.3451,-2)
1200.0
# round is computed on 122.13451 => 122.0 => shift_back => 1220
>>> round(1221.3451,-1)
1220.0
So, we basically multiply the original number by 10**num_rounding_digits
and then compute round and then perform reverse operation i.e. divide by the multiplication factor to get back in the scale.
Now the code becomes easy to understand:
if (ndigits > 22) {
/* pow1 and pow2 are each safe from overflow, but
pow1*pow2 ~= pow(10.0, ndigits) might overflow */
pow1 = pow(10.0, (double)(ndigits-22));
pow2 = 1e22;
}
else {
pow1 = pow(10.0, (double)ndigits);
pow2 = 1.0;
}
y = (x*pow1)*pow2;
/* if y overflows, then rounded value is exactly x */
if (!isfinite(y))
return PyFloat_FromDouble(x);
}
For the first part, if ndigits >=0
, a small detail here is that there are 2 parts based on value of ndigits
i.e. num_rounding_digits
:
Values greater 10²² can overflow in most cases ( depends on the size of data types used for storage by the underying architecture — for instance, a 64 bit signed long can store upto 10¹⁸ (nearly 2⁶³). Hence without risking the overflow by power computation of large numbers, two different power variables are considered.
Later these are still multiplied to keep things sane and which can overflow ( as mentioned in the comment ) i.e. the initial split of exponent i.e. num_rounding_digits > 22
was mainly to prevent overflow at the time of power computation. The overflow in later multiplication is gracefully handled by the if
statement with the condition to return the original float back if the decimal shift to the right ( multiplication with 10s powers) overflows ( just like we do for infinites — very large numbers).
Then rest of the maths is quite simple and goes as per explanation above —
- → multiply with 10s powers to shift decimal
- → round using
<math.h>
round - → bring back to scale by dividing by original shifting constant
else {
pow1 = pow(10.0, (double)-ndigits);
pow2 = 1.0; /* unused; silences a gcc compiler warning */
y = x / pow1;
}
A small detail about negative values of num_rounding_digits
is that we divide instead of multiplying (we know computers are bad with floating point arithmetic and hence dealing with only positive integer exponents is a sane decision anytime)
Continuing further….
if (fabs(y-z) == 0.5)
/* halfway between two integers; use round-half-even */
z = 2.0*round(y/2.0);
As for this part goes, this is what this story started from (that tweet), this case actually tests if we were at the mid-point of the integers and if so, we should round it to the EVEN side.
We have already seen the logic of this with example in previous section where num_rounding_digits
was not specified and the intution behind this is also shared there. Please check out !
So. moving on…
/* if computation resulted in overflow, raise OverflowError */
if (!isfinite(z)) {
PyErr_SetString(PyExc_OverflowError,
"overflow occurred during round");
return NULL;
}
At last there are again some overflow checks to be sure that the output number is finite ( note that we have already handled the cases of infinite numbers before where we directly return infinite back to userland ).
Tests & Caveats
Tests with floats are hard to get right and there are number of Stack Overflow questions regarding round not behaving the way it should, for instance:
Round doesn’t do round(5.55,1)
correctly: https://stackoverflow.com/questions/56820/round-doesnt-seem-to-be-rounding-properly
We expect the rounding to even to handle correctly and in our case the condition that governs the even rounding is fabs(y-z) == 0.5
but what if the floating point numbers are not stored in correct precision.
Its an old old problem depicted in StackOverflows → this and this where it can happen that the number say 5.55
is stored as 5.4999999999
somthing and in that case the difference will never come out to be 0.5 and hence the rounding can turn to be wrong ( not the even rounding ).
Lets try with some tricky examples here (hoping floating point arithmetic takes our side here):
I really freaked out when some examples I tried didn’t work as intended and I couldn’t believe the code I read (also executed in isolation) and explained above doesn’t conform to its working , but then luckily realised the floating point arithmetic curse as explained abo)
I have seen that simple examples work but adding some large numbers with heavy decimals usually doesn’t (maybe a co-incidence on my machine ?)
# +ve round off
# 1.65 => shift to 16.5 => 17 ( diff = 0.5 ) => even_rounding => 16 => scale back to 1.6
# even_rounding => 2*round(16.5/2.0) => 2*round(8.25) = 2*8 = 16
>>> round(1.65,1)
1.6
# 1.55 => shift to 15.5 => 16 ( diff = 0.5 ) => even_rounding => 16 => scale back to 1.6
# even_rounding => 2*round(15.5/2.0) => 2*round(7.75) = 2*8 = 16
>>> round(1.55,1)
1.6
# -ve round off
# 25.0 => shift to 2.50 => 3.0 ( diff = 0.5 ) => even_rounding => 2.0 => scale back to 20
# even_rounding => 2*round(2.5/2.0) => 2*round(1.25) = 2*1 = 2
>>> round(25.0,-1)
20.0
# 15.0 => shift to 1.50 => 2.0 ( diff = 0.5 ) => even_rounding => 2.0 => scale back to 20
# even_rounding => 2*round(1.5/2.0) => 2*round(0.75) = 2*1 = 2
>>> round(15.0,-1)
20.0
To avoid such unknowns of floating point arithmetics, it is often advisable to use https://docs.python.org/3/library/decimal.html module for exact decimal values.
Just an opinion, didn’t go in details but maybe it uses strings for storing floats ? Need to explore!. Also the alternate version of double_round
mentioned above also seemed to be doing some string<->double
comparisons which can maybe also reduce the uncertainty ? Again , to be explored in another story!
—
Thanks for reading !
References:
cpython code: https://github.com/python/cpython