by Nikolai Shokhirev
All numerical experiments were performed on a PC with an Intel processor, in both single and double precision.
| Type | Range | Significant digits | Size in bytes |
| Single | 1.5E-45 .. 3.4E+38 | 7 - 8 | 4 |
| Double | 5.0E-324 .. 1.7E+308 | 15 - 16 | 8 |
Test projects for Borland's C++Builder 6 and Delphi 7 are available for download.
The solution by summation, which keeps adding 0.1 and stops as soon as the sum equals 1.0, is
| Pascal | C++ |
var i: integer;
    sum: double;
sum := 0.0;
for i := 1 to 100 do
begin
  sum := sum + 0.1;
  if sum = 1.0 then  { exact comparison of floating-point values }
    break;
end;
writeln(sum);
|
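double sum = 0.0;
for (int i = 1; i <= 100; i++)
{
    sum += 0.1;
    if (sum == 1.0)  // exact comparison of floating-point values
    {
        break;
    }
}
// setprecision (from <iomanip>) shows 15 significant digits; the default is 6
cout << setprecision(15) << sum << endl;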
|
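The result is sum = 9.99999999999998: the loop ran all 100 times, because the accumulated sum never became exactly 1.0 (0.1 has no exact binary representation, and every addition is rounded). On the other hand, the comparison 10.0*0.1 = 1.0 is true. Either way, you should never rely on the exact equality of floating-point numbers.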
The obvious way to find the machine epsilon, the smallest power of two ε for which 1.0 + ε is still greater than 1.0, is
| Pascal | C++ |
eps := 1.0;
while (1.0 + eps) > 1.0 do
begin
  writeln(eps);
  eps := eps/2.0;
end;
|
eps = 1.0;
while ((1.0 + eps) > 1.0)
{
    cout << eps << endl;
    eps /= 2.0;
}
|
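The last printed value should be the machine epsilon. However, on an Intel processor it gives 1.08420217248550E-0019 = $2^{-63}$ regardless of the declared precision of eps (see the console projects macheps). This is because this small piece of code was optimized: the intermediate value 1.0 + eps was kept in the processor's internal 80-bit extended precision (64-bit significand), whose machine epsilon is $2^{-63}$, instead of being rounded to the type of eps.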
| Pascal | C++ |
eps := 1.0;
repeat
  writeln(eps);
  eps := eps/2.0;
  sum := 1.0 + eps;
until sum <= 1.0;
|
eps = 1.0;
do {
    cout << eps << endl;
    eps /= 2.0;
    sum = 1.0 + eps;
} while (sum > 1.0);
|
gives $\varepsilon_0$ = 1.19209289550781E-7 ($2^{-23}$) for single/float and $\varepsilon_0$ = 2.22044604925031E-16 ($2^{-52}$) for double.
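It is proven that the harmonic series

$$\sum_{n=1}^{\infty} \frac{1}{n} = 1 + \frac{1}{2} + \frac{1}{3} + \cdots \qquad (3)$$

diverges: its partial sums grow without bound, like ln n. Let us check this with a computer. The summation terminates when the sum stops changing: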
| Pascal | C++ |
sum := 0.0;
n := 1.0;
repeat
  sum1 := sum;
  sum := sum + 1.0/n;
  n := n + 1.0;
until sum1 = sum;
|
sum = 0.0;
n = 1.0;
do {
    sum1 = sum;
    sum = sum + 1.0/n;
    n += 1.0;
} while (sum > sum1);  // stop when adding 1/n no longer changes sum
|
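For single/float numbers the summation stopped at n = 2097153 with S = 15.4036827087402. The result is finite, far from infinity, because once 1/n is smaller than roughly S·$\varepsilon_0$ (more precisely, no larger than half the spacing between adjacent floating-point numbers near S), adding it no longer changes the rounded sum. How long would it take to run in double precision?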
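It is proven that the series

$$e^{x} = \sum_{n=0}^{\infty} \frac{x^{n}}{n!} = 1 + x + \frac{x^{2}}{2!} + \frac{x^{3}}{3!} + \cdots \qquad (4)$$

converges to exp(x) for any finite x. In this experiment the summation was terminated when the absolute value of a term became less than the threshold ε.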
| Precision | x | Threshold ε | Sum | exp(x) | Valid digits |
| Single | -9.5 | 1.0e-10 | 2.13608618651051E-5 | 7.48518298877006E-5 | 0 |
| Double | -9.5 | 1.0e-15 | 7.48518299667056E-5 | 7.48518298877006E-5 | 9 |
| Double | -19.5 | 1.0e-15 | 5.54447786514606E-9 | 3.39826781949507E-9 | 0 |
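We can see that the single-precision result for x = -9.5 has no correct digits. The reason is cancellation: for negative x the terms of (4) alternate in sign, and the largest of them, $9.5^9/9! \approx 1.7 \cdot 10^3$, is about $2 \cdot 10^7$ times larger than the result, so the rounding errors of the large terms wipe out roughly seven significant digits, which is all that single precision has. Switching to double precision only moves the problem to larger |x|: for x = -19.5 no correct digits are left either.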
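For this problem there is a much better cure. Let us use the identity

$$e^{-x} = \frac{1}{e^{x}} \qquad (5)$$

that is, sum the series (4) for the positive argument x, where all terms are positive and nothing cancels, and take the reciprocal. The results have all possible correct digits: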
| Precision | x | Threshold ε | 1/Sum | exp(-x) | Valid digits |
| Single | 9.5 | 1.0e-10 | 7.48518368560357E-5 | 7.48518298877006E-5 | 7 |
| Double | 19.5 | 1.0e-15 | 3.39826781949507E-9 | 3.39826781949507E-9 | 15 |
This trick is discussed in the section "Error propagation".
To be continued ...