# Floating point arithmetic: what's wrong with equality comparisons, workarounds, and some links.

23 Jan 2022I started off writing this post wanting to do some brief discussion of floating point arithmetic, inspired by what I am reading, Lecture 13 of Trefethen and Bau’s **Numerical Linear Algebra.** ISBN 9780898713619 (1997).

However, floating point arithmetic is a big topic and this post is instead going to be about a common programming assumption, that comparing floating point numbers with an equality operator is something that will work consistently in the way it will for integers and Boolean values (spoiler, it doesn’t).

In brief, floating point arithmetic on a computer is not the same as infinite precision arithmetic because of several reasons:

- roundoff error,
- difficulties with representing some numbers in binary, such as 0.1 (for more explanation, see isocpp).

## What are needed values for types on my computer?

To even talk about this issue, we’ll need to get some values on the machine / compiler combination we are using.

There’s access to several needed values through libraries such as `std::numeric_limits`

within C++, docs.

Functions that I used are:

`std::numeric_limits<T>::epsilon()`

,`std::numeric_limits<T>::round_style`

,`std::numeric_limits<T>::digits10`

,`std::numeric_limits<T>::max`

.

## Machine epsilon

I’ll talk a bit about machine epsilon, or \(\epsilon_{machine}\) in the equations. The machine epsilon defines the gaps between floating point numbers.

From Trefethen and Bau, they define a function \(f : \mathbb{R} \mapsto \mathbf{F}\) (where \(\mathbf{F}\) is the set of numbers represented by a particular type),

be a function giving the closest floating point approximation to a real number, its

roundedequivalent in the floating point system. …

For all \(x \in \mathbb{R}\), there exists \(\epsilon\) with \(\lvert \epsilon \rvert \leq \epsilon_{machine}\) such that \(f(x) = x(1 + \epsilon)\).

And from the docs for `std::numeric_limits<T>::epsilon()`

,

Returns the machine epsilon, that is, the difference between 1.0 and the next value representable by the floating-point type T.

Ok, so back to the original problem, how does machine epsilon help compare two floating point numbers? Stated another way, in the worst case the machine epsilon (or some epsilon smaller than the machine epsilon) is the resolution of the set of floating point numbers representable by a certain type.

You’ll notice from the docs, that it explicitly mentions “1.0 and the next value representable by the floating-point type T”. In the Trefethen and Bau definition, it is more obvious that the larger the magnitude of the floating point number, the larger the gaps between numbers. So this resolution – or gap between numbers – is not uniform.

Now on to some examples from my machine. The `float`

type epsilon `std::numeric_limits<float>::epsilon()`

is 0.00000011920928955078125. (By the way, use `#include <iomanip>`

and `std::setprecision()`

to see all the digits you want.) The `std::numeric_limits<float>::round_style`

is 1, which means, “rounding toward nearest representable value”.

The `std::numeric_limits<T>::round_error`

is 0.5, and by piecing together some commentary elsewhere and tests on my machine, one can expect in the worst case that:

all the numbers in between \(\left[1, (1 + \frac{\epsilon_{machine}}{2} ) \right)\) will be rounded to 1.0.

all the numbers in between \(\left[10, 10(1 + \frac{\epsilon_{machine}}{2}) = 10.00000095367431640625 \right)\) will be rounded to 10.0.

The reality is quite slippery – and part of what makes this topic so hard to write about – is that machine epsilon is the worst case gap between floating point numbers. Using my machine / compiler combination, \(10(1 + \frac{\epsilon_{machine}}{2})\) is not rounded, but \(10(1 + \frac{3\epsilon_{machine}}{8})\) is.

*Note, I only used whole numbers in these examples because of laziness, and for no other reason.*

## Practical note: Comparing floating point numbers with an equality operator is a bad, bad idea

All of this is to conclude that writing the following (C++):

```
float f0;
float f1;
...
if (f0 == f1){
cout << "Equal" << endl;
} else {
cout << "Not equal." << endl;
}
```

is almost never going to work as intended, because of roundoff, the number of bits for that type, binary representation, etc.

There are a few different workarounds.

### Workaround 1: user-defined tolerance.

The workaround is to define a small tolerance and compare the absolute difference of the numbers to the tolerance value.

```
float f0;
float f1;
float tolerance = 0.00001;
...
if (std::abs(f0 - f1) < tolerance){
cout << "Equal" << endl;
} else {
cout << "Not equal." << endl;
}
```

Of course this approach has its own problems, how do you set the tolerance, meaning, how far away should two numbers be to be considered equal? As usual, considering the application usually helps with these decisions.

### Workaround 2: use machine epsilon as the tolerance.

This function is adapted from isocpp. Here, the machine epsilon is used as the tolerance, and the tolerance scales with the magnitude of the first argument. Because of this choice of scaling, it is possible that `isEqual(x, y) != isEqual(y, x)`

.

```
template<class T>
inline bool isEqual(T x, T y)
{
const double epsilon = std::numeric_limits<T>::epsilon();
/* some small number such as 1e-5 */;
return std::abs(x - y) <= epsilon * std::abs(x);
// see Knuth section 4.2.2 pages 217-218
}
```

### Workaround 3: another variation on machine epsilon as part of the tolerance.

This code snippet is from CppPreference’s epsilon page. Here, they aren’t committing to `isEqual`

, instead this function is called `almost_equal`

. They cover some of the edge cases that `isEqual`

doesn’t above, by creating a tolerance that is the sum of the two floating point numbers, multiplied by epsilon, **and** multiplied by `ulp`

, which is ‘units in the last place’. The code example uses 2 for `ulp`

.

```
template<class T>
typename std::enable_if<!std::numeric_limits<T>::is_integer, bool>::type
almost_equal(T x, T y, int ulp)
{
// the machine epsilon has to be scaled to the magnitude of the values used
// and multiplied by the desired precision in ULPs (units in the last place)
return std::fabs(x-y) <= std::numeric_limits<T>::epsilon() * std::fabs(x+y) * ulp
// unless the result is subnormal
|| std::fabs(x-y) < std::numeric_limits<T>::min();
}
```

### Workaround 4: use an infinite-precision / arbitrary-precision arithmetic library.

Supposing you have a need to compare for equality, and you have rational numbers, an alternate to any of the above is to use an infinite-precision data type in the way CGAL (Computational Geometry Algorithms Library) does for some portions of their library. Note: there is a tradeoff; accuracy for speed. An example library is the GNU Multiple Precision Arithmetic Library, Wikipedia, library site.

### Workaround 5: design around equality comparisons of floating point numbers.

Another way of dealing with the problem of equality comparisons of floating point numbers is to *not do them*. Rethink the problem and the program such that the program does not depend on equality comparisons of floating-point numbers.

I had been introduced to numerical precision and why arbitrary precision arithmetic was important for computational geometry during graduate school. When I started really writing code differently and incorporating those ideas strictly was when I ran up against this problem in my own work; a function was not evaluating to the value it should have. Why? I was using a floating-point data type, with all of the accompanying floating-point precision problems.

The input set of numbers in my problem was bounded, so instead of using rational numbers I multiplied by a constant and had signed integers instead. Integers don’t have roundoff error! Well, except if you overflow, but that’s a different issue.

## Final thoughts.

Burkhart Stubert wrote a similar post in 2019, “Comparing Two Floating-Point Numbers”, in which he quantifies how well or not well different methods perform comparisons. I was not aware of his post when I first wrote this one, but he also advocated for workaround 5:

The best solution is not to use floating-point numbers at all. Fixed-point numbers are an excellent alternative. A fixed-point number is roughly an integer with a scaling factor (e.g., 100 for 2 decimal places).

## More reading.

- David Goldberg, 1991. “What every computer scientist should know about floating pt arithmetic.” link.
- Doug Priest. Addendum to the above, “Differences Among IEEE 754 Implementations.” link.
- Burkhart Stubert, “Comparing Two Floating-Point Numbers.” link.

Listing of Numerical Linear Algebra posts.