1 | Comparing Floating Point Numbers, 2012 Edition |
---|
2 | Posted on February 25, 2012 by brucedawson |
---|
3 | This post is a more carefully thought out and peer reviewed version of a floating-point comparison article I wrote many years ago. This one gives solid advice and some surprising observations about the tricky subject of comparing floating-point numbers. A compilable source file with license is available. |
---|
4 | |
---|
5 | We’ve finally reached the point in this series that I’ve been waiting for. In this post I am going to share the most crucial piece of floating-point math knowledge that I have. Here it is: |
---|
6 | |
---|
7 | |
---|
8 | [Floating-point] math is hard. |
---|
9 | |
---|
10 | You just won’t believe how vastly, hugely, mind-bogglingly hard it is. I mean, you may think it’s difficult to calculate when trains from Chicago and Los Angeles will collide, but that’s just peanuts to floating-point math. |
---|
11 | |
---|
12 | Seriously. Each time I think that I’ve wrapped my head around the subtleties and implications of floating-point math I find that I’m wrong and that there is some extra confounding factor that I had failed to consider. So, the lesson to remember is that floating-point math is always more complex than you think it is. Keep that in mind through the rest of the post where we talk about the promised topic of comparing floats, and understand that this post gives some suggestions on techniques, but no silver bullets. |
---|
13 | |
---|
14 | Previously on this channel… |
---|
15 | This is the fifth chapter in a long series. The first couple of posts in the series are particularly important for understanding this one. A (mostly) complete list of the other posts includes: |
---|
16 | |
---|
17 | 1: Tricks With the Floating-Point Format – an overview of the float format |
---|
18 | 2: Stupid Float Tricks – incrementing the integer representation |
---|
19 | 3: Don’t Store That in a Float – a cautionary tale about time |
---|
20 | 3b: They sure look equal… – ranting about Visual Studio’s float failings |
---|
21 | 4: Comparing Floating Point Numbers, 2012 Edition (return *this;) |
---|
22 | 5: Float Precision–From Zero to 100+ Digits – non-obvious answers to how many digits of precision a float has |
---|
23 | 6: C++ 11 std::async for Fast Float Format Finding – running tests on all floats in just a few minutes |
---|
24 | 7: Intermediate Floating-Point Precision – the surprising complexities of how expressions can be evaluated |
---|
25 | 8: Floating-point complexities – some favorite quirks of floating-point math |
---|
26 | 9: Exceptional Floating Point – using floating point exceptions for fun and profit |
---|
27 | 10: That’s Not Normal–the Performance of Odd Floats – the performance implications of infinities, NaNs, and denormals |
---|
28 | 11: Doubles are not floats, so don’t compare them – a common type of float comparison mistake |
---|
29 | 12: Float Precision Revisited: Nine Digit Float Portability – moving floats between gcc and VC++ through text |
---|
30 | 13: Floating-Point Determinism – what does it take to get bit-identical results |
---|
31 | 14: There are Only Four Billion Floats–So Test Them All! – exhaustive testing to avoid embarrassing mistakes |
---|
32 | 15: Please Calculate This Circle’s Circumference – the intersection of C++, const, and floats |
---|
33 | 16: Intel Underestimates Error Bounds by 1.3 quintillion – the headline is not an exaggeration, but it’s not as bad as it sounds |
---|
34 | Comparing for equality |
---|
35 | Floating-point math is not exact. Simple values like 0.1 cannot be precisely represented using binary floating-point numbers, and the limited precision of floating-point numbers means that slight changes in the order of operations or the precision of intermediates can change the result. That means that comparing two floats to see if they are equal is usually not what you want. GCC even has a (well-intentioned but misguided) warning for this, enabled by -Wfloat-equal: “warning: comparing floating point with == or != is unsafe”. |
---|
36 | |
---|
37 | Here’s one example of the inexactness that can creep in: |
---|
38 | |
---|
39 | float f = 0.1f; |
---|
40 | float sum; |
---|
41 | sum = 0; |
---|
42 | |
---|
43 | for (int i = 0; i < 10; ++i) |
---|
44 |     sum += f; |
---|
45 | float product = f * 10; |
---|
46 | printf("sum = %1.15f, mul = %1.15f, mul2 = %1.15f\n", |
---|
47 |     sum, product, f * 10); |
---|
70 | This code tries to calculate ‘one’ in three different ways: repeated adding, and two slight variants of multiplication. Naturally we get three different results, and only one of them is 1.0: |
---|
139 | |
---|
140 | |
---|