by Nikolai V. Shokhirev
| ABC tutorials |
Prev: Approaches, Platforms, Languages, Tools
Next: Numerical Methods
"Computer Methods for Mathematical Computations by George Forsythe, Michael Malcolm and Cleve Moler is one of the great classic textbooks of numerical methods for scientists and engineers" (Ralph Carmichael). I also used this book [1-3] as a reference for my numerical projects. I translated the code from this book into Pascal (as part of PasMatLib) and C++ (in preparation). The book inspired me to write a tutorial on numerical methods (see also the links below).
The book covers the following subjects:
The above subjects represent the bare minimum of programming tools. Nevertheless, these methods make it possible to solve a wide variety of computational problems. However, I would extend the list with:
Integer numbers are used for counting, indices, enumerations and sets. They are never used for numerical calculations (e.g. complex<int> makes little sense). There are several integer formats. I use 32-bit integers as universal integer numbers (in particular, to avoid conversion issues). They can store about 4·10⁹ distinct values: the signed range is −2³¹ to 2³¹ − 1, i.e. roughly ±2.1·10⁹.
Floating-point numbers model continuous real numbers in computers, and this is a rather poor representation of real numbers. In general, every arithmetic operation can produce a so-called round-off error. The relative effect of rounding errors is measured by the machine epsilon, macheps [1, 4-6]. It is defined as the smallest power of two such that, in floating-point arithmetic,
1.0 + macheps > 1.0
It sets the scale of the relative rounding error made when adding, subtracting, multiplying, or dividing two numbers. For single-precision floating-point numbers macheps ≈ 1.2·10⁻⁷, which is not enough for extensive numerical computations. For double-precision floating-point numbers macheps ≈ 2.2·10⁻¹⁶. With properly selected algorithms, this accuracy is sufficient even for extensive calculations (≥10⁶ operations). Moreover, if an algorithm requires extended (long double) precision, this should be taken as a warning signal.
| Type (size in bytes) | C++ VS 2005 | g++ MinGW | C++Builder 6 |
| int | 4 | 4 | 4 |
| long | 4 | 4 | 4 |
| long long | 8 | 8 | 8 |
| double | 8 | 8 | 8 |
| long double | 8 | 12 | 10 |
In my PasMatLib I use the following type definition:
type TFloat = double;
and in CppMatLib:
typedef double real;
Unlike Fortran, many languages (C++, Pascal, Java, C#) lack built-in vector, matrix and complex types. However, they allow the creation of objects with the necessary functionality.
We can formulate some requirements for such objects:
Indices. The object should support arbitrary index limits in order to model real objects naturally. Example 1: the angular momentum projection l, −L ≤ l ≤ L. Example 2: negative indices can label historical prices.
Compatibility with good old 1-based Fortran algorithms is another important reason.
Integer arrays. They should be used to store indices, for example permutations and pivot element indices. Arithmetic operations (e.g. multiplication) between integer arrays, or between arrays and scalars, have limited use and are not required. Comparison operators (e.g. > or ==) on entire objects do not make much sense either. However, fast search by value, swapping of elements and index shifting can be useful. Some specific functionality can be implemented in external functions or derived classes.
Float arrays. For float arrays, quantities such as the dot product and norm of vectors, and matrix-vector or tensor products, make sense. Some operations can be implemented by overloading arithmetic operators (C++, C#, Delphi.Net). This makes the code more compact and readable, but the approach should be used with caution: an overloaded binary operator typically returns a new temporary object, and creating temporaries can be expensive, especially for big matrices [7].
Comparison operators for real-valued arrays are useless, and overloading them is an example of speculative generality. The exact equality operator ==, which takes no tolerance parameter, should be replaced with a method that computes a distance between objects (in a specified norm) and compares it with a tolerance level.
Important note about the dot (scalar) product. It is defined as a sum of the products of vector components:
$$(\mathbf{x}, \mathbf{y}) = \sum_{i=1}^{n} x_i y_i$$
It was pointed out [4] that the accumulation of these products is a possible source of round-off errors and should be performed with higher accuracy. This can be especially important in the calculation of matrix rotations and reflections [4]. Usually double precision is sufficient, but one should consider accumulating the sum in long double (extended) precision.
Complex numbers can be considered a special case of two-component floating-point vectors, so the above considerations apply to them as well. In particular, integer-valued components and comparison operators make no sense.
© Nikolai V. Shokhirev, 2004-2008