Differences

This shows you the differences between two versions of the page.

--- developers:programmer_guide [2014/01/18 15:43] – external edit 127.0.0.1
+++ developers:programmer_guide [2014/01/22 15:52] (current) – Yann Pouillon
@@ Line 1: / Line 1: @@
+===== Programmer's Guide =====
+NOTE: this file has NOT yet been updated for the response function features...
+NOTE: this file has NOT yet been updated for structured datatypes ...
+This is a brief programmer guide describing the structure of the main program (abinit) of the ABINIT package.
+It is intended to provide some introductory guidance to programmers who may want to modify parts of the code.
+You will find the code fairly well commented and should explore it to get more details than provided below.
+The reader is assumed to have already gone through the latest version of the following files:
+  * ~abinit/doc/users/new_user_guide.html
+  * ~abinit/doc/users/abinit_help.html
+  * ~abinit/doc/users/context
+It is important that the reader know how to compile the code, and how to run tests.
+This is described in detail in the installation notes on the Web, that can also be found in the ~abinit/doc/install_notes directory.
+From now on, we assume that you are sufficiently familiarized with these different points, and that you have sufficient experience in the use of ABINIT.
+The ABINIT group rules for coding in Fortran 90 are detailed in the ~abinit/doc/developers/rules_coding file.
+These rules are mostly code-independent.
+Here we describe facts related specifically to the ABINIT package.
+In order to allow programmers to develop different parts of the code at the same time, while avoiding synchronisation problems, a few rules have been sketched.
+See the ~abinit/doc/developers/contributing.html file.
+Specific facts related to parallelism in abinit are explained in ~abinit/doc/developers/rules_paral
+Structure of the present file:
+  (A) A few facts
+  (B) The skeleton of the code.
+  (C) Debugging, timing and statistics facilities.
+  (D) Utility subroutines.
+  (E) Libraries.
+===== (A) A few facts, useful to know =====
+The main routine is called abinit.F90 and is present in the directory ~abinit/src/main.
+The rest of the subroutines called by abinit.F90 are in the other ~abinit/src or ~abinit/lib directories.
+Other main routines (mrgddb.F90, anaddb.F90, chi.F90, sigma.F90 ...) are present in ~abinit/src/main.
+The subroutines are splitted in two parts: those that come from
+other packages, like Blas, Lapack, and other numerical routines or IO routines;
+and those that have been written directly for ABINIT.
+The first ones are found in the ~abinit/lib directories,
+are often written in Fortran77 or C, and do not follow the coding rules of the ABINIT
+project. The second ones are written in Fortran90 (there are two C routines,
+for timing purposes), follow the coding rules, and are found in the
+different ~abinit/src directories. At the time of writing, there are more
+than 300 subroutines, and about 100000 lines of code, including the library routines.
+Machine dependency is accomodated by using the C preprocessor (CPP) on ALL files in ~abinit/src
+so the fortran compilations are conducted by first preprocessing every file then passing the
+result to the compiler. See the ~abinit/doc/install_notes
+directory and the ~abinit/doc/developers/use_cpp file.
+The sequential and parallel versions are also produced by c preprocessing
+a unique source file. The routines that differ in the sequential and
+parallel versions of the code are found in the Src_seqpar
+and Src_basis directory.
+Other Src_* directories contains separately the routines for XC treatment,
+pseudopotential input, parsing of input file, those for the anaddb code,
+and all the remaining (common) sources.
+===== (B) The skeleton of the code =====
+(to be updated for RF features)
+One can distinguish 10 important routines, called levels :
+  (1) abinit
+  (2) driver
+  (3) gstate
+  (4) mover
+  (5) scfcv
+  (6) vtorho
+  (7) vtowfk
+  (8) cgwf
+  (9) getghc
+The routine abinit.F90 calls driver.F90, driver.F90 calls gstate.F90, ...
+==== B.1. abinit.F90 ====
+The main routine, abinit, level 1, has the aim of reading completely
+the input files, and checking whether the input variables
+are sensible, and whether the available memory is sufficient.
+These operations should be very fast, so that the user is
+quickly warned whether his/her input are incorrect.
+No big array is allocated in level 1, except for testing purposes.
+In detail, abinit.F90 performs, or calls routines that perform:
+  - Eventually initialize MPI (for parallel runs)
+  - Initialize overall timing of run
+  - Print greeting for interactive user
+  - Read names of files (input, output, rootinput, rootoutput, roottemporaries),
+  - Create the name of the status file, initialize the status subroutine.
+  - Open output file and print herald at top of output and log files
+  - Read the input file, and store the information in a long string of characters
+  - Take ndtset and ntypat from the input string, then allocate the arrays whose dimensions depends only on ndtset, ntypat and msym.
+  - Finish to read the "files" file completely, and also initialize mproj and mpsang
+  - Continue to analyze the input string, and allocate the arrays needed for input.
+  - Provide defaults for the variables that have not yet been initialized.
+  - Call the main input routine, and finish the input variable initialisation.
+  - Echo input data to output file and log file
+  - Perform additional checks on input data
+At this stage, all the information from the "files" file and "input" file
+have been read and checked.
+  - Perform main calculation  (call gstate)
+  - Give final echo of coordinates, etc.
+  - Timing analysis
+  - Delete the status file, and, for build-in tests, analyse the correctness of results
+  - Write the final timing, close the output file, and write a final line to the log file
+  - Eventual cleaning of MPI run
+==== B.2. driver.F90 ====
+In driver, level 2, a loop on the data sets is present.
+For each data set, either the ground state subroutine (gstate.F90) or
+the response function subroutine (respfn.F90) is called (to be described
+in a future version of this file).
+A few big arrays are allocated at that level.
+==== B.3. gstate.F90 ====
+The routine gstate.F90 , level 3, performs a variety of initialisation tasks
+and result analysis, for which different routines are called.
+A variety of arrays are computed or initialized in subroutine
+setup1.F90 .  Then, based on the geometrical data input and the
+values of k points and ecut, the basis sphere of planewaves is computed
+by kpgio.F90 (that calls kpgsph.F90 and boundy.F90).
+Then header information is written to a file, for
+use in constructing output wavefunction files, see headwr.F90 . These wf files
+contain a description of various input settings which were used to create
+them. Other routines related to the header, and used later,
+are headcopy.F90 , headck.F90 , and headlv.F90 , as well as pspini.F90 and clnup1.F90 .
+Next, all the pseudopotential files needed for the calculation are read.
+Subroutine pspini.F90 controls this part, and calls, for each atom type, the
+routine pspatm.F90 , that will call different routines
+(psp1in.F90, psp2in.F90, psp3in.F90, psp5in.F90, psp6in.F90), according
+to the pseudopotential file format.
+Various transforms of the input psp data are taken
+(bessel function transforms) relevant to the local and nonlocal parts of
+the potential (psp1lo.F90, psp1nl.F90, psp2lo.F90, psp2nl.F90, psp3lo.F90, psp3nl.F90,
+psp5lo.F90, psp5nl.F90) and the non-linear XC core-correction (psp1cc.F90,
+psp4cc.F90, and psp6cc.F90) .
+The wavefunctions are initialized (read or set to random numbers) in inwffil.F90 .
+If they are to be initialized at random,
+or if the simple reading of an existing wf file is needed, then the
+routine initwf.F90 is called by inwffil.F90 . If some work
+has to be done of the existing wavefunctions,
+the operation is more delicate, and inwffil.F90 needs to call the routine
+newkpt.F90, that calls different other routines :
+  - listkk.F90 (to find the closest k point)
+  - kpgsph.F90 (generate list of plane waves)
+  - sphere.F90 (to translate plane wave coefficients from one cut-off sphere to another)
+  - envlop.F90 (multiply by an envelope the random coefficient, to reduce their kinetic energy)
+  - orthon.F90 (to orthonormalize the wavefunctions)
+The symmetries are initialized in setsym.F90,
+occupation numbers might be computed in newocc.F90 if needed.
+Then, the code computes a starting density and screening potential,
+either from the existing wavefunctions (mkrho.F90), or from the characteristics
+of the pseudopotential (initro.F90), or by reading a file (ioarr.F90).
+At this point, the code either pursues a fixed atom calculation (level 6)
+or a moving atom calculation (levels 4 or 5).
+After these calls, when the big calculations are done, gstate.F90 continues
+by printing results, closing files, deallocating arrays, and return the
+control to driver.F90
+==== B.4. mover.F90 ====
+For fixed atoms (ionmov=0), gstate.F90 calls directly scfcv.F90
+(self-consistent field convergence) ;
+For movement of ions (ionmov>0) gstate will call mover.F90
+The routine mover.F90 receives an initial set of atom positions
+and cell parameters and finish providing a new set after (ntime)
+iterations (if convergence is needed and never reached)
+The table of contents of the routine follows:
+. Initialization of indexes and allocations of arrays
+. Particularities of each predictor
+. Set the number of iterations ntime
+. Try to read history of previous calculations
+. Allocate the hist structure
+. First output before any itime or icycle
+. Compute xcart and fill the history of the first SCFCV
+. Loop for itime (From 1 to ntime)
+. Loop for icycle (From 1 to ncycles)
+. Output for each icycle (and itime)
+. Symmetrize atomic coordinates over space group elements
+. => Call to SCFCV routine and fill history with forces
+. Write the history into the _HIST file
+. Output after SCFCV
+. => Test Convergence of forces and stresses
+. => Precondition forces, stress and energy
+. => Call to each predictor
+. Use the history  to extract the new values
+. End loop icycle
+. End loop itime
+. Set the final values of xcart and xred
+. XML Output at the end
+. Deallocate hist and ab_mover datatypes
+The main predictors are described with ionmov value:
+.  Molecular dynamics without viscosity (vis=0)
+.  Molecular dynamics with viscosity (vis/=0)
+.  Broyden-Fletcher-Goldfard-Shanno method (forces)
+.  Broyden-Fletcher-Goldfard-Shanno method (forces,Tot energy)
+.  Conjugate gradient of potential and ionic degrees of freedom
+.  Simple relaxation of ionic positions
+.  Verlet algorithm for molecular dynamics
+.  Verlet algorithm blocking every atom where dot(vel,force)<0
+.  Verlet algorithm with a nose-hoover thermostat
+.  Langevin molecular dynamics
+. BFGS with delocalized internal coordinates
+. Conjugate gradient algorithm
+. Isokinetic ensemble molecular dynamics
+. Isothermal/isenthalpic ensemble molecular dynamics
+. Symplectic algorithm Runge-Kutta-Nyström SRKNa14
+. Ionic positions relaxation using DIIS
+. Steepest descent algorithm
+. Self consistent phonon structure using a supercell
+In all three cases, the routines call many times scfcv.F90, which
+controls update and mixing of the density and potential and generates
+forces for a given arrangement of atoms. With these data, the
+molecular dynamics or the geometry optimization can be performed.
+==== B.5. scfcv.F90 ====
+This routine performs the SCF loop.
+A few arrays, needed for that purpose, are allocated there.
+Inside the loop, scfcv.F90 calls:
+  - setvtr.F90, usually only at the initialisation step, to set a first trial potential
+  - vtorho.F90, level 7, to get the density from the trial potential
+  - vresfo.F90, to get the potential residual, the forces, and components of the energy
+  - newvtr.F90, to precondition the potential residual, and compute the new trial potential
+The computation of Hartree and XC potential is done inside setvtr.F90 and
+newvtr.F90, by calling the routine rhohxc.F90 .
+That routine, in turn, calls hartre.F90 , for the Hartree potential, and
+many different routines for the XC potential, depending on the
+different XC functionals, and the intxc option (xcden.F90, xchelu.F90, xcpbe.F90,
+xcpot.F90, xcpzca.F90, xcspol.F90, xctetr.F90, xcwign.F90, xcxalp.F90)
+After the loop, scfcv.F90 computes the stress, by calling stress.F90 ,
+and also eventually print density, potential, or other files.
+==== B.6. vtorho.F90 ====
+Subroutine vtorho.F90 (potential -v- to density -rho-, level 7)
+produces the density in a fixed potential,
+by summing all contributions of different k points and eventually
+different spins.  Forces are recomputed after each pass in all k points.
+Parallelism is implemented at the level of concurrent
+treatment of each k-point separately, in vtorho.F90 .
+==== B.7. vtowfk.F90 ====
+Subroutine vtowfk (potential -v- to k-point wavefunctions, level 8)
+is called to improve the wavefunctions
+over all bands at a single k point at a time. It gives also
+the contributions of each band to kinetic energy and non-local
+energy. In the case of fixed occupancies, it gives the
+contribution of each k point to the density.
+Subspace diagonalization, and orthogonalisation
+is done within vtowfk, and might be time-consuming.
+==== B.8. cgwf.F90 and getghc.F90 ====
+Subroutine cgwf (Conjugate-gradient on the wavefunctions, level 9),
+runs the iterative optimization of wavefunction
+for a single band and k point, in a fixed potential.
+It start from an existing wavefunction, either in central memory or
+on a temporary file on disk, and refine it, finally writing in central
+memory or on another temporary file on disk.
+Deep within cgwf is a call to getghc (level 10),
+which computes <G|H|C> where |C> is the
+wavefunction.  This subroutine is the guts of the method.  Its time
+is presently dominated by fft calls (about 50-60%), with the next
+bottleneck being the nonlocal operator (20-35%).
+You will find the code fairly well commented and may explore it further
+to get more details than provided above.
+===== C. Debugging, timing and statistics facilities =====
+The abinit code has been equipped with a set of tools for the developers.
+These include:
+) The log file.
+) The prtvol input variable.
+) The status file.
+) The memory subroutine.
+) The time analysis backbone.
+) The statistics provided in the make.
+As mentioned in the new_user_guide or in the abinit_help,
+there are two general files for output: the "output" file and
+the "log" file. When something goes wrong in the code, without
+causing the code to crash, the log file will mention the name
+of the routine where something went wrong, and what went wrong.
+Usually, corrective actions are suggested.
+The output of messages is handled thanks to wrtout.F90 , called when
+the message has been packed in a character string (usually called 'message').
+When something has gone wrong, the exit is to be done by a call to the
+leave_new.F90 subroutine.
+The use of the prtvol input variable in conjunction with
+the log file is the most important tool for debugging.
+As indicated by its name, prtvol controls the print volume in the
+output file and in the log file.
+When equal to 0, the information in the log file is kept at the minimum.
+When equal to 1, the information is already much more complete.
+Even much more flexibility is gained when prtvol is used with negative values.
+These negative values each refer to one the levels of the code
+(i.e. prtvol=-10 refer to debugging of getghc, prtvol=-7 to debugging
+of vtorho). When debugging some level through the use of the corresponding
+negative prtvol value, the amount of data written on the log file,
+coming from this level of the code, will increase dramatically. Moreover,
+after the first execution of the complete level, the code will automatically
+stop (except for the levels 1 and 3, that stop BEFORE entering
+the next level).
+The status file is another important tool for debugging, especially
+because of the UNIX pecularity (when running from a script)
+that the outputs are not immediately
+written in a file, but kept in a buffer, unless this file is closed.
+When the code crashes (for example with a message "segmentation fault"),
+it is difficult to know at which place the segmentation fault happened,
+from the log file.
+The status file is a very short file that, depending on the
+value of the input parameter istatr, can be opened, rewound, written,
+and closed very frequently. This is done by calls to the status.F90
+subroutine. Due to its frequent closing,
+it can indicates precisely where a crash just happened.
+( Note : if the status file is situated on a disk that is "local" to
+the cpu where the job is run,
+the whole operation is usually less than 0.2 msec. On a remote disk
+(NFS), the operation is 10 times more consuming. These data may differ
+from machine to machine : on a Cray-T3E the I/O operations are
+relatively slow. In order not to cause troubles, in the default mode
+the value of istatr is relatively large, causing the file not to be
+often updated. On some machines and depending on the disk access,
+using a small value of istatr -under 5- will cause the code to crash)
+The memory.F90 subroutine is a place where the memory space needed for the
+code is estimated, shortly after reading the input.
+The subroutine will immediately try to allocate as much as memory space
+as estimated, and send an error message if not possible
+(on the P6, this operation makes the job crash in case there is not enough
+memory, but it is not difficult to understand what is the problem, thanks
+to the status file). This memory estimation is hand-coded (this is very boring !).
+The precise description of the allocation in the most critical
+subroutines can constitute a help for the optimization of memory usage.
+Another help for the optimisation is provided by the time analysis backbone.
+Many important routines are timed internally, thanks to two
+calls to the timer routine timepw.F90 (one call at the entrance,
+one call at the exit). A final call to the
+routine timanalys.F90 provide a detailed analysis of the repartition of the
+CPU and Wall clock time, in the critical subroutines, or in the
+different levels of the code. Thanks to this tool, it is rather
+obvious what parts of the code should be optimized, and also
+what is going wrong when the code is ported to a new machine.
+The last feature useful for developing the code is provided by the
+statistics of the make command, in the directory ~abinit.
+This allows to make sure that no file is getting too big to be
+easily manipulated, and indicate when a file is to be splitted
+(see ~abinit/doc/developers/rules_coding).
+Of course, it is important that adequate care is taken to implement
+these features in newly developed parts of the code.
+It is thus expected that the developer read the
+subroutines wrtout.F90, leave_new.F90, timepw.F90, and eventually
+timanalys.F90, status.F90, and memory.F90. It is advised also to read
+the subroutine getghc.F90 (the latter, for an idea of the usage of
+the prtvol=-level debugging option, there for prtvol=-10).
+===== D. Utility subroutines =====
+Beyond the big main routines presented in section B, and the different
+routines for timing, debugging and statistics of section C, other
+routines in ~abinit/src may be worth to learn about...
+There is a whole set of routines for the treatment of strings
+of characters (they should be described shortly in a next version):
+appdig.F90, fappnd.F90, inarray.F90, incomprs.F90, inread.F90, inreplsp.F90,
+inupper.F90, subchr.F90
+Routines for numerical derivation and/or integration :
+ctrap.F90, der_int.F90
+Some Numerical functions
+  - besjm.F90 : half-integer bessel functions
+  - derfc.F90 : complementary error function
+  - invcb.F90 : fast computation of a series of inverse cubic roots
+  - sincos.F90 : fast computation of a series of sine and cosine
+Routines related to symmetries and brillouin zone (should be described) :
+chkgrp.F90, chkibz.F90, cnstti.F90, fixsym.F90, irrzg.F90, setsym.F90, strsym.F90,
+sygrad.F90, symatm.F90, symchk.F90, symdet.F90, symg.F90, symrhg.F90, symzat.F90
+Vector operations :
+  - norm.F90 : normalize a vector
+  - normev.F90 : normalize a set of vectors, and fix the phases
+  - fxphas.F90 : fix the phase of a vector
+  - orthon.F90 : orthonormalize a set of vectors
+  - projbd.F90 : orthogonalize one vector to a set of other vectors
+  - sdirot.F90 : rotate a set of vectors by a unitary transformation
+x3 matrix inversion (integer and real) :
+mati3inv.F90, matr3inv.F90
+Other (should be described) :
+clsopn.F90, fxphas.F90, hermit.F90, iseq.F90, isfile.F90, mkkin.F90,
+mkrdim.F90, prmat.F90, randac.F90, xredxcart.F90
+===== E. Libraries =====
+As for the utility subroutines, the developer should be aware of the
+routines available from the libraries, and use them instead of
+coding something with the same purpose.
+The Lapack library contains a matrix diagonalizer, zhpev.F90, that
+is needed many times in the code. Presently, this is the only
+entry point in the Lapack library. The Blas routines are used
+by Lapack, but are not directly called by ABINIT.
+Only the specific subset of Lapack and Blas, needed to support
+zhpev.F90, is present in ABINIT.
+The Numerical Recipes library contains:
+  - sorting routines (insort.F90, isort2.F90, sort2.F90)
+  - a routine that computes the julian day number (julday.F90)
+  - a function that returns a uniform random deviate between 0.0 and 1.0 (ran1.F90)
+  - spline fitting routines (splfit.F90 and spline.F90)
+  - (to be updated)