Practical numerical methods

Floating-point arithmetic

by Nikolai Shokhirev


 

Introduction

If we have calculus, why do we also need numerical methods for computer calculations? A common answer is that "computer calculations are inaccurate". This is not quite correct: analytical and computer calculations are two completely different types of calculation.

 

Floating-point computations

Calculus and higher analysis operate on the infinite set of objects called real numbers. Arithmetic operations on real numbers are governed by the following axioms:

  1. Closure Axiom. For real numbers a and b, (a op b) is a unique real number (op is any arithmetic operation: +, -, *, /; b ≠ 0 for / ).
  2. Commutative Axiom. For real numbers a and b, a op b = b op a , (op = *, + ).
  3. Associative Axiom. For real numbers a, b and c, (a op b) op c = a op (b op c), (op = *, +).
  4. Identity Axiom of Addition. For any real number a, a + 0 = 0 + a = a.
  5. Identity Axiom of Multiplication. For any real number a, a*1 = 1*a = a.
  6. Additive Inverse Axiom. For any real number a, there exists a unique real number -a such that a + (-a) = -a + a = 0. The number -a is known as the additive inverse (negative) of a.
  7. Multiplicative Inverse Axiom. For any nonzero real number a, there exists a unique real number (1/a) such that a*(1/a) = (1/a)*a = 1.
  8. Distributive Axiom. For any real numbers a, b and c, (a op1 b) op2 c = (a op2 c) op1 (b op2 c) (op1 = +, -; op2 = *, /; c ≠ 0 for /).

Computers operate with floating-point numbers. Each floating-point number x has the value

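    x = ± (d1/β + d2/β^2 + ... + dt/β^t) * β^e          (1)

where β is the base, t is the precision, the digits satisfy 0 ≤ di ≤ β - 1 (i = 1, ..., t), and the exponent e is an integer in the range L ≤ e ≤ U.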

Usually the non-zero numbers are normalized, i.e. d1 > 0. Operations on such numbers violate almost all of the above axioms. This is not at all the arithmetic we know. Let us illustrate this with a simplified example.
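Some of these violations can also be observed directly in the arithmetic of an ordinary computer. The following Python sketch is only an illustration: it assumes IEEE 754 double precision (what Python's float uses on common platforms), and the particular numbers are arbitrary choices that do not come from the text.

```python
# Assumes IEEE 754 double precision (Python float on common platforms).

# Associativity of addition fails: the grouping changes the rounded result.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
print(left == right)

# Closure fails: the product is too large to be a finite floating-point number.
big = 1e308 * 10.0
print(big)

# Multiplicative inverse fails: n * (1/n) is not always exactly 1.
bad_inverses = [n for n in range(1, 100) if n * (1.0 / n) != 1.0]
print(bad_inverses)
```

Commutativity of a single addition or multiplication still holds in this arithmetic; it is the multi-step identities that break.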

 

Example of a floating-point set

The set of numbers (1) with the base β = 2, the precision t = 2 and the exponent range L = -1, U = 2 can be presented in the following graphical form:

 

Figure: The set of floating-point numbers
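The members of this set can also be listed by brute force. The sketch below assumes that a normalized number has the form ±(d1/β + ... + dt/β^t) * β^e with d1 > 0 and L ≤ e ≤ U, and that zero is included as a special value; the function name toy_floats is hypothetical.

```python
from fractions import Fraction

def toy_floats(beta=2, t=2, L=-1, U=2):
    """Return zero and all positive normalized values, sorted (exact fractions)."""
    values = {Fraction(0)}
    for e in range(L, U + 1):
        # A t-digit mantissa m / beta**t is normalized when its leading digit is non-zero.
        for m in range(beta ** (t - 1), beta ** t):
            values.add(Fraction(m, beta ** t) * Fraction(beta) ** e)
    return sorted(values)

positive = toy_floats()
print(", ".join(str(v) for v in positive))
```

The negative members are the mirror images of the positive ones.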

You can visually check the correctness of the following general statements.

First, this set is not a continuum, or even an infinite set. The numbers are not equidistant.

Obviously the range of numbers is limited (to [-3,3] in our example). 

For any positive real number there always exists a smaller positive number. This property of real numbers is of fundamental significance in higher analysis. It does not hold for floating-point numbers: there is always a finite gap between zero and the smallest non-zero number (±1/4 in our example).
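The same properties hold for the double-precision numbers of a real machine. The following sketch assumes Python 3.9 or later (for math.ulp) and IEEE 754 doubles; the sample magnitudes are arbitrary.

```python
import math
import sys

# Distance from a double to the next larger one (unit in the last place).
gap_at_one = math.ulp(1.0)
gap_at_1e10 = math.ulp(1e10)
print(gap_at_one, gap_at_1e10)   # the gap grows with the magnitude

# Smallest positive normalized double, and the smallest positive double overall.
print(sys.float_info.min)
smallest = math.ulp(0.0)
print(smallest)

# Nothing representable lies between the smallest positive double and zero.
print(smallest / 2.0)
```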

 

Overflow and round-off errors

The result of an arithmetic operation does not necessarily belong to the set of floating-point numbers (e.g. 2 + 1/4, 1/3). The need to map the result to some floating-point number causes so-called round-off errors. However, the result cannot be mapped to any number if it lies outside the limits of the set. This is called an overflow error.

The order of operations matters. For example, (2 + 3/2) - 1 causes an overflow, because the intermediate result 7/2 lies outside the set, but (2 - 1) + 3/2 gives only a round-off error.
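This example can be replayed in a few lines of Python. The sketch assumes that the toy set consists of 0, ±1/4, ±3/8, ±1/2, ±3/4, ±1, ±3/2, ±2, ±3 and that every operation is rounded to the nearest member, with ties going toward zero; the rounding rule and the helper names fl, add and sub are assumptions, not part of the text.

```python
from fractions import Fraction

# Non-negative members of the toy set (base 2, precision 2, exponents -1..2).
TOY = [Fraction(0), Fraction(1, 4), Fraction(3, 8), Fraction(1, 2),
       Fraction(3, 4), Fraction(1), Fraction(3, 2), Fraction(2), Fraction(3)]

def fl(x):
    """Round an exact value to the toy set; ties go toward zero (an assumption)."""
    x = Fraction(x)
    if abs(x) > TOY[-1]:
        raise OverflowError(f"{x} is outside [-3, 3]")
    nearest = min(TOY, key=lambda v: abs(v - abs(x)))
    return nearest if x >= 0 else -nearest

def add(a, b):
    return fl(Fraction(a) + Fraction(b))

def sub(a, b):
    return fl(Fraction(a) - Fraction(b))

# (2 - 1) + 3/2: the exact value 5/2 is not in the set, so it is rounded.
second = add(sub(2, 1), Fraction(3, 2))
print(second)

# (2 + 3/2) - 1: the intermediate value 7/2 exceeds 3.
try:
    first = sub(add(2, Fraction(3, 2)), 1)
except OverflowError as err:
    first = None
    print("overflow:", err)
```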

Numbers smaller in magnitude than the smallest non-zero value are rounded to zero. This can cause a catastrophic loss of accuracy. This effect is called underflow.
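A small double-precision illustration of this loss of accuracy follows. It is a sketch assuming IEEE 754 doubles; the values of a and b and the rescaling trick are illustrative choices, not taken from the text.

```python
import math

a = b = 1e-200

# a * a is about 1e-400, below the smallest positive double, so it becomes 0.0
# and the computed length of the vector (a, b) is zero.
naive = math.sqrt(a * a + b * b)
print(naive)

# Rescaling by the larger magnitude keeps the intermediate values in range.
scale = max(abs(a), abs(b))
scaled = scale * math.sqrt((a / scale) ** 2 + (b / scale) ** 2)
print(scaled)
```

The true result, about 1.414e-200, is perfectly representable; only the intermediate squares underflow.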

 

Machine epsilon

The round-off errors are machine-dependent. More generally, the accuracy of floating-point arithmetic can be characterized by the machine epsilon, the smallest floating-point number ε0 such that, in floating-point arithmetic,

1 + ε0 > 1        (2)

Many numerical algorithms use the value of machine epsilon to tune their accuracy, for example when choosing a convergence tolerance.
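Machine epsilon can be estimated by repeated halving. The sketch below assumes IEEE 754 double precision and searches only over powers of two, so it finds the smallest power of two that satisfies (2); this is the value that Python reports as sys.float_info.epsilon.

```python
import sys

eps = 1.0
# Keep halving while the next smaller candidate still changes 1.0 when added to it.
while 1.0 + eps / 2.0 > 1.0:
    eps /= 2.0

print(eps)
print(eps == sys.float_info.epsilon)
```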

 

Discretization and truncation errors

In calculus and higher analysis a solution is often presented as the result of some infinite process (series, sequence, limit). Infinite processes cannot be implemented on computers because of the finite speed of calculations and the accumulation of round-off errors. Therefore, infinitely small objects have to be replaced with finite elements and/or processes must be terminated at some point. All of this is a source of errors as well.
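A derivative is a typical example: the limit h → 0 has to be replaced by a finite step h. The following sketch (the test function sin, the point x = 1 and the step sizes are arbitrary choices) shows that the error first shrinks as h decreases and then grows again, because round-off takes over from the discretization error.

```python
import math

def forward_difference(f, x, h):
    """Approximate f'(x) by the finite quotient (f(x + h) - f(x)) / h."""
    return (f(x + h) - f(x)) / h

x = 1.0
exact = math.cos(x)   # exact derivative of sin at x

errors = {}
for h in (1e-1, 1e-8, 1e-15):
    errors[h] = abs(forward_difference(math.sin, x, h) - exact)
    print(f"h = {h:g}   error = {errors[h]:.3e}")
```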

 

Numerical instability

As we see, there are various sources of errors. Once errors are generated, they propagate through the calculations. Some algorithms can amplify the errors, which causes numerical instability.
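A textbook illustration, not taken from this tutorial, is the integral I_n = ∫ x^n e^(x-1) dx over [0, 1], which satisfies I_n = 1 - n*I_(n-1) with I_0 = 1 - 1/e and lies between 0 and 1/(n+1). The sketch below compares running this recurrence upward, where every step multiplies the existing error by n, with running it downward from a crude starting guess, where every step divides the error by n. The function names and the starting index 40 are assumptions made for the example.

```python
import math

def forward(n):
    """Unstable: start from I_0 = 1 - 1/e and apply I_k = 1 - k * I_(k-1) upward."""
    value = 1.0 - math.exp(-1.0)
    for k in range(1, n + 1):
        value = 1.0 - k * value
    return value

def backward(n, start=40):
    """Stable: start from the crude guess I_start = 0 and apply
    I_(k-1) = (1 - I_k) / k downward until I_n is reached."""
    value = 0.0
    for k in range(start, n, -1):
        value = (1.0 - value) / k
    return value

n = 20
print("forward: ", forward(n))
print("backward:", backward(n))
print("bounds:   0 < I_n <", 1.0 / (n + 1))
```

Both functions use the same exact relation; only the direction in which the errors are propagated differs.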

 
