C++ Floating Point Arithmetic

From WikiOD

Floating Point Numbers are Weird

The first mistake that nearly every single programmer makes is presuming that this code will work as intended:

float total = 0;
for(float a = 0; a != 2; a += 0.01f) {
    total += a;
}

The novice programmer assumes that this will sum every number in the sequence 0, 0.01, 0.02, 0.03, ..., 1.97, 1.98, 1.99 to yield 199, the mathematically correct answer.

Two things happen that make this untrue:

  1. The program as written never concludes. a never becomes equal to 2, and the loop never terminates.
  2. If we rewrite the loop logic to check a < 2 instead, the loop terminates, but the total ends up being something different from 199. On IEEE754-compliant machines, it will often sum up to about 201 instead.

The reason this happens is that floating-point numbers store approximations of their assigned values.
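One common way to sidestep both problems is to drive the loop with an integer counter and derive each floating-point value from it. The following is a minimal sketch under that assumption; the function names count_float_steps and sum_with_integer_counter are hypothetical and do not come from the original example.

```cpp
// Counts how many times the `a < 2` variant of the loop runs.
// The result depends on how rounding error accumulates in `a`,
// so it is not guaranteed to be the 200 a reader would expect.
int count_float_steps() {
    int steps = 0;
    for (float a = 0; a < 2; a += 0.01f) {
        ++steps;
    }
    return steps;
}

// Sums 0, 0.01, ..., 1.99 using an integer loop counter. The loop
// always runs exactly 200 times, and each term is computed directly
// from `i` instead of inheriting the error of the previous term.
// The total is still an approximation of 199, just a much closer one.
float sum_with_integer_counter() {
    float total = 0;
    for (int i = 0; i < 200; ++i) {
        total += i / 100.0f;
    }
    return total;
}
```

Printing both results with std::cout should show a step count that may differ from 200 and a total very close to, though not necessarily exactly, 199.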

The classical example is the following computation:

double a = 0.1;
double b = 0.2;
double c = 0.3;
if(a + b == c)
    //This never prints on IEEE754-compliant machines
    std::cout << "This Computer is Magic!" << std::endl;
else
    std::cout << "This Computer is pretty normal, all things considered." << std::endl;
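To see why the comparison fails, it can help to print the values with more digits than the default stream precision shows. The helper below is a sketch, and its name to_17_digits is hypothetical; it relies on the fact that 17 significant decimal digits are enough to tell apart any two different 64-bit IEEE 754 doubles.

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Formats a double with 17 significant decimal digits, enough to
// distinguish any two different IEEE 754 double-precision values.
std::string to_17_digits(double value) {
    std::ostringstream out;
    out << std::setprecision(17) << value;
    return out.str();
}
```

With the variables from the example above, on a typical IEEE 754 platform to_17_digits(a + b) gives 0.30000000000000004, while to_17_digits(c) gives 0.29999999999999999.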

Though what we, the programmers, see is three numbers written in base 10, what the compiler (and the underlying hardware) sees is binary numbers. Representing 0.1, 0.2, and 0.3 exactly requires division by 10, which is easy in a base-10 system but cannot be done exactly in a base-2 system. These numbers therefore have to be stored as approximations, much as 1/3 can only be written approximately as 0.333333333333333... in base 10.

//64-bit floats have 53 binary digits (bits) of precision, counting the implicit leading bit.
//The lines below show raw bit patterns for illustration; they are not valid C++.
double a =     0011111110111001100110011001100110011001100110011001100110011010; //imperfect representation of 0.1
double b =     0011111111001001100110011001100110011001100110011001100110011010; //imperfect representation of 0.2
double c =     0011111111010011001100110011001100110011001100110011001100110011; //imperfect representation of 0.3
double a + b = 0011111111010011001100110011001100110011001100110011001100110100; //Note that this is not quite equal to the "canonical" 0.3!
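The bit patterns above can be reproduced on your own machine. The following is a minimal sketch, assuming that double is a 64-bit IEEE 754 type with the same byte order as 64-bit integers (true on most current platforms); the function name bits_of is hypothetical.

```cpp
#include <bitset>
#include <cstdint>
#include <cstring>
#include <string>

// Returns the 64 bits of a double as a string of '0' and '1' characters,
// most significant bit (the sign bit) first.
// Assumption: double is a 64-bit IEEE 754 type on this platform.
std::string bits_of(double value) {
    static_assert(sizeof(double) == sizeof(std::uint64_t),
                  "this sketch assumes a 64-bit double");
    std::uint64_t raw = 0;
    // memcpy is a well-defined way to copy the object representation
    // of the double into an integer of the same size.
    std::memcpy(&raw, &value, sizeof raw);
    return std::bitset<64>(raw).to_string();
}
```

Calling bits_of(0.1 + 0.2) and bits_of(0.3) and comparing the two strings should show that they differ only in the last few bits.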