User Tools

Site Tools


developers:programmer_guide

Differences

This shows you the differences between two versions of the page.

Link to this comparison view

developers:programmer_guide [2014/01/18 15:43] – external edit 127.0.0.1developers:programmer_guide [2014/01/22 15:52] (current) Yann Pouillon
Line 1: Line 1:
 +===== Programmer's Guide =====
  
 +NOTE: this file has NOT yet been updated for the response function features...
 +
 +NOTE: this file has NOT yet been updated for structured datatypes ...
 +
 +This is a brief programmer guide describing the structure of the main program (abinit) of the ABINIT package.
 +It is intended to provide some introductory guidance to programmers who may want to modify parts of the code.
 +You will find the code fairly well commented and should explore it to get more details than provided below.
 +
 +The reader is assumed to have already gone through the latest version of the following files:
 +
 +  * ~abinit/doc/users/new_user_guide.html
 +  * ~abinit/doc/users/abinit_help.html
 +  * ~abinit/doc/users/context
 +
 +It is important that the reader know how to compile the code, and how to run tests. 
 +This is described in detail in the installation notes on the Web, that can also be found in the ~abinit/doc/install_notes directory.
 +From now on, we assume that you are sufficiently familiarized with these different points, and that you have sufficient experience in the use of ABINIT.
 +
 +The ABINIT group rules for coding in Fortran 90 are detailed in the ~abinit/doc/developers/rules_coding file.
 +These rules are mostly code-independent.
 +Here we describe facts related specifically to the ABINIT package.
 +
 +In order to allow programmers to develop different parts of the code at the same time, while avoiding synchronisation problems, a few rules have been sketched. 
 +See the ~abinit/doc/developers/contributing.html file.
 +
 +Specific facts related to parallelism in abinit are explained in ~abinit/doc/developers/rules_paral
 +
 +Structure of the present file:
 +
 +  (A) A few facts
 +  (B) The skeleton of the code.
 +  (C) Debugging, timing and statistics facilities.
 +  (D) Utility subroutines.
 +  (E) Libraries.
 +
 +===== (A) A few facts, useful to know =====
 +
 +The main routine is called abinit.F90 and is present in the directory ~abinit/src/main.
 +The rest of the subroutines called by abinit.F90 are in the other ~abinit/src or ~abinit/lib directories.
 +Other main routines (mrgddb.F90, anaddb.F90, chi.F90, sigma.F90 ...) are present in ~abinit/src/main.
 +
 +The subroutines are splitted in two parts: those that come from
 +other packages, like Blas, Lapack, and other numerical routines or IO routines; 
 +and those that have been written directly for ABINIT. 
 +The first ones are found in the ~abinit/lib directories,
 +are often written in Fortran77 or C, and do not follow the coding rules of the ABINIT
 +project. The second ones are written in Fortran90 (there are two C routines,
 +for timing purposes), follow the coding rules, and are found in the
 +different ~abinit/src directories. At the time of writing, there are more
 +than 300 subroutines, and about 100000 lines of code, including the library routines.
 +
 +Machine dependency is accomodated by using the C preprocessor (CPP) on ALL files in ~abinit/src 
 +so the fortran compilations are conducted by first preprocessing every file then passing the
 +result to the compiler. See the ~abinit/doc/install_notes
 +directory and the ~abinit/doc/developers/use_cpp file.
 +The sequential and parallel versions are also produced by c preprocessing
 +a unique source file. The routines that differ in the sequential and
 +parallel versions of the code are found in the Src_seqpar
 +and Src_basis directory.
 +Other Src_* directories contains separately the routines for XC treatment,
 +pseudopotential input, parsing of input file, those for the anaddb code,
 +and all the remaining (common) sources.
 +
 +===== (B) The skeleton of the code ===== 
 +
 +(to be updated for RF features)
 +One can distinguish 10 important routines, called levels :
 +
 +  (1) abinit
 +  (2) driver
 +  (3) gstate
 +  (4) mover
 +  (5) scfcv
 +  (6) vtorho
 +  (7) vtowfk
 +  (8) cgwf
 +  (9) getghc
 +
 +The routine abinit.F90 calls driver.F90, driver.F90 calls gstate.F90, ...
 +
 +
 +==== B.1. abinit.F90 ====
 +
 +The main routine, abinit, level 1, has the aim of reading completely
 +the input files, and checking whether the input variables
 +are sensible, and whether the available memory is sufficient.
 +These operations should be very fast, so that the user is
 +quickly warned whether his/her input are incorrect.
 +No big array is allocated in level 1, except for testing purposes.
 +In detail, abinit.F90 performs, or calls routines that perform:
 +
 +  - Eventually initialize MPI (for parallel runs)
 +  - Initialize overall timing of run
 +  - Print greeting for interactive user
 +  - Read names of files (input, output, rootinput, rootoutput, roottemporaries),
 +  - Create the name of the status file, initialize the status subroutine.
 +  - Open output file and print herald at top of output and log files
 +  - Read the input file, and store the information in a long string of characters
 +  - Take ndtset and ntypat from the input string, then allocate the arrays whose dimensions depends only on ndtset, ntypat and msym.
 +  - Finish to read the "files" file completely, and also initialize mproj and mpsang
 +  - Continue to analyze the input string, and allocate the arrays needed for input.
 +  - Provide defaults for the variables that have not yet been initialized.
 +  - Call the main input routine, and finish the input variable initialisation.
 +  - Echo input data to output file and log file
 +  - Perform additional checks on input data
 +
 +At this stage, all the information from the "files" file and "input" file
 +have been read and checked.
 +
 +  - Perform main calculation  (call gstate)
 +  - Give final echo of coordinates, etc.
 +  - Timing analysis
 +  - Delete the status file, and, for build-in tests, analyse the correctness of results
 +  - Write the final timing, close the output file, and write a final line to the log file
 +  - Eventual cleaning of MPI run
 +
 +
 +==== B.2. driver.F90 ====
 +
 +In driver, level 2, a loop on the data sets is present. 
 +For each data set, either the ground state subroutine (gstate.F90) or
 +the response function subroutine (respfn.F90) is called (to be described
 +in a future version of this file).
 +A few big arrays are allocated at that level.
 +
 +
 +==== B.3. gstate.F90 ====
 +
 +The routine gstate.F90 , level 3, performs a variety of initialisation tasks
 +and result analysis, for which different routines are called.
 +
 +A variety of arrays are computed or initialized in subroutine
 +setup1.F90 .  Then, based on the geometrical data input and the
 +values of k points and ecut, the basis sphere of planewaves is computed
 +by kpgio.F90 (that calls kpgsph.F90 and boundy.F90).
 +Then header information is written to a file, for
 +use in constructing output wavefunction files, see headwr.F90 . These wf files
 +contain a description of various input settings which were used to create
 +them. Other routines related to the header, and used later,
 +are headcopy.F90 , headck.F90 , and headlv.F90 , as well as pspini.F90 and clnup1.F90 .
 +
 +Next, all the pseudopotential files needed for the calculation are read.
 +Subroutine pspini.F90 controls this part, and calls, for each atom type, the
 +routine pspatm.F90 , that will call different routines
 +(psp1in.F90, psp2in.F90, psp3in.F90, psp5in.F90, psp6in.F90), according
 +to the pseudopotential file format.
 +Various transforms of the input psp data are taken
 +(bessel function transforms) relevant to the local and nonlocal parts of
 +the potential (psp1lo.F90, psp1nl.F90, psp2lo.F90, psp2nl.F90, psp3lo.F90, psp3nl.F90,
 +psp5lo.F90, psp5nl.F90) and the non-linear XC core-correction (psp1cc.F90,
 +psp4cc.F90, and psp6cc.F90) .
 +
 +The wavefunctions are initialized (read or set to random numbers) in inwffil.F90 .
 +If they are to be initialized at random,
 +or if the simple reading of an existing wf file is needed, then the
 +routine initwf.F90 is called by inwffil.F90 . If some work
 +has to be done of the existing wavefunctions,
 +the operation is more delicate, and inwffil.F90 needs to call the routine
 +newkpt.F90, that calls different other routines :
 +
 +  - listkk.F90 (to find the closest k point)
 +  - kpgsph.F90 (generate list of plane waves)
 +  - sphere.F90 (to translate plane wave coefficients from one cut-off sphere to another)
 +  - envlop.F90 (multiply by an envelope the random coefficient, to reduce their kinetic energy)
 +  - orthon.F90 (to orthonormalize the wavefunctions)
 +
 +The symmetries are initialized in setsym.F90,
 +occupation numbers might be computed in newocc.F90 if needed.
 +Then, the code computes a starting density and screening potential,
 +either from the existing wavefunctions (mkrho.F90), or from the characteristics
 +of the pseudopotential (initro.F90), or by reading a file (ioarr.F90).
 +At this point, the code either pursues a fixed atom calculation (level 6)
 +or a moving atom calculation (levels 4 or 5).
 +
 +After these calls, when the big calculations are done, gstate.F90 continues
 +by printing results, closing files, deallocating arrays, and return the
 +control to driver.F90
 +
 +==== B.4. mover.F90 ====
 +
 +For fixed atoms (ionmov=0), gstate.F90 calls directly scfcv.F90
 +(self-consistent field convergence) ;
 +For movement of ions (ionmov>0) gstate will call mover.F90
 +The routine mover.F90 receives an initial set of atom positions
 +and cell parameters and finish providing a new set after (ntime)
 +iterations (if convergence is needed and never reached)
 +The table of contents of the routine follows:
 +
 +  01. Initialization of indexes and allocations of arrays
 +  02. Particularities of each predictor
 +  03. Set the number of iterations ntime
 +  04. Try to read history of previous calculations
 +  05. Allocate the hist structure
 +  06. First output before any itime or icycle
 +  07. Compute xcart and fill the history of the first SCFCV
 +  08. Loop for itime (From 1 to ntime)
 +  09. Loop for icycle (From 1 to ncycles)
 +  10. Output for each icycle (and itime)
 +  11. Symmetrize atomic coordinates over space group elements
 +  12. => Call to SCFCV routine and fill history with forces
 +  13. Write the history into the _HIST file
 +  14. Output after SCFCV
 +  15. => Test Convergence of forces and stresses
 +  16. => Precondition forces, stress and energy
 +  17. => Call to each predictor
 +  18. Use the history  to extract the new values
 +  19. End loop icycle
 +  20. End loop itime
 +  21. Set the final values of xcart and xred
 +  22. XML Output at the end
 +  23. Deallocate hist and ab_mover datatypes
 +
 +The main predictors are described with ionmov value:
 +
 +  1.  Molecular dynamics without viscosity (vis=0)
 +  1.  Molecular dynamics with viscosity (vis/=0)
 +  2.  Broyden-Fletcher-Goldfard-Shanno method (forces)
 +  3.  Broyden-Fletcher-Goldfard-Shanno method (forces,Tot energy)
 +  4.  Conjugate gradient of potential and ionic degrees of freedom
 +  5.  Simple relaxation of ionic positions
 +  6.  Verlet algorithm for molecular dynamics
 +  7.  Verlet algorithm blocking every atom where dot(vel,force)<0
 +  8.  Verlet algorithm with a nose-hoover thermostat
 +  9.  Langevin molecular dynamics
 +  10. BFGS with delocalized internal coordinates
 +  11. Conjugate gradient algorithm
 +  12. Isokinetic ensemble molecular dynamics
 +  13. Isothermal/isenthalpic ensemble molecular dynamics
 +  14. Symplectic algorithm Runge-Kutta-Nyström SRKNa14
 +  20. Ionic positions relaxation using DIIS
 +  21. Steepest descent algorithm
 +  30. Self consistent phonon structure using a supercell
 +
 +In all three cases, the routines call many times scfcv.F90, which
 +controls update and mixing of the density and potential and generates
 +forces for a given arrangement of atoms. With these data, the
 +molecular dynamics or the geometry optimization can be performed.
 +
 +==== B.5. scfcv.F90 ====
 +
 +This routine performs the SCF loop. 
 +A few arrays, needed for that purpose, are allocated there.
 +Inside the loop, scfcv.F90 calls:
 +
 +  - setvtr.F90, usually only at the initialisation step, to set a first trial potential
 +  - vtorho.F90, level 7, to get the density from the trial potential
 +  - vresfo.F90, to get the potential residual, the forces, and components of the energy
 +  - newvtr.F90, to precondition the potential residual, and compute the new trial potential
 +
 +The computation of Hartree and XC potential is done inside setvtr.F90 and
 +newvtr.F90, by calling the routine rhohxc.F90 .
 +That routine, in turn, calls hartre.F90 , for the Hartree potential, and
 +many different routines for the XC potential, depending on the
 +different XC functionals, and the intxc option (xcden.F90, xchelu.F90, xcpbe.F90,
 +xcpot.F90, xcpzca.F90, xcspol.F90, xctetr.F90, xcwign.F90, xcxalp.F90)
 +
 +After the loop, scfcv.F90 computes the stress, by calling stress.F90 ,
 +and also eventually print density, potential, or other files.
 +
 +==== B.6. vtorho.F90 ====
 +
 +Subroutine vtorho.F90 (potential -v- to density -rho-, level 7)
 +produces the density in a fixed potential,
 +by summing all contributions of different k points and eventually
 +different spins.  Forces are recomputed after each pass in all k points.
 +Parallelism is implemented at the level of concurrent
 +treatment of each k-point separately, in vtorho.F90 .
 +
 +==== B.7. vtowfk.F90 ====
 +
 +Subroutine vtowfk (potential -v- to k-point wavefunctions, level 8)
 +is called to improve the wavefunctions
 +over all bands at a single k point at a time. It gives also
 +the contributions of each band to kinetic energy and non-local
 +energy. In the case of fixed occupancies, it gives the
 +contribution of each k point to the density.
 +Subspace diagonalization, and orthogonalisation
 +is done within vtowfk, and might be time-consuming.
 +
 +==== B.8. cgwf.F90 and getghc.F90 ====
 +
 +Subroutine cgwf (Conjugate-gradient on the wavefunctions, level 9),
 +runs the iterative optimization of wavefunction
 +for a single band and k point, in a fixed potential.
 +It start from an existing wavefunction, either in central memory or
 +on a temporary file on disk, and refine it, finally writing in central
 +memory or on another temporary file on disk.
 +Deep within cgwf is a call to getghc (level 10),
 +which computes <G|H|C> where |C> is the
 +wavefunction.  This subroutine is the guts of the method.  Its time
 +is presently dominated by fft calls (about 50-60%), with the next
 +bottleneck being the nonlocal operator (20-35%).
 +
 +You will find the code fairly well commented and may explore it further
 +to get more details than provided above.
 +
 +===== C. Debugging, timing and statistics facilities =====
 +
 +The abinit code has been equipped with a set of tools for the developers. 
 +These include:
 +
 +  1) The log file.
 +  2) The prtvol input variable.
 +  3) The status file.
 +  4) The memory subroutine.
 +  5) The time analysis backbone.
 +  6) The statistics provided in the make.
 +
 +As mentioned in the new_user_guide or in the abinit_help,
 +there are two general files for output: the "output" file and
 +the "log" file. When something goes wrong in the code, without
 +causing the code to crash, the log file will mention the name
 +of the routine where something went wrong, and what went wrong.
 +Usually, corrective actions are suggested.
 +The output of messages is handled thanks to wrtout.F90 , called when
 +the message has been packed in a character string (usually called 'message').
 +When something has gone wrong, the exit is to be done by a call to the
 +leave_new.F90 subroutine.
 +
 +The use of the prtvol input variable in conjunction with
 +the log file is the most important tool for debugging.
 +As indicated by its name, prtvol controls the print volume in the
 +output file and in the log file.
 +When equal to 0, the information in the log file is kept at the minimum.
 +When equal to 1, the information is already much more complete.
 +Even much more flexibility is gained when prtvol is used with negative values.
 +These negative values each refer to one the levels of the code
 +(i.e. prtvol=-10 refer to debugging of getghc, prtvol=-7 to debugging
 +of vtorho). When debugging some level through the use of the corresponding
 +negative prtvol value, the amount of data written on the log file,
 +coming from this level of the code, will increase dramatically. Moreover,
 +after the first execution of the complete level, the code will automatically
 +stop (except for the levels 1 and 3, that stop BEFORE entering
 +the next level).
 +
 +The status file is another important tool for debugging, especially
 +because of the UNIX pecularity (when running from a script)
 +that the outputs are not immediately
 +written in a file, but kept in a buffer, unless this file is closed.
 +When the code crashes (for example with a message "segmentation fault"),
 +it is difficult to know at which place the segmentation fault happened,
 +from the log file.
 +The status file is a very short file that, depending on the
 +value of the input parameter istatr, can be opened, rewound, written,
 +and closed very frequently. This is done by calls to the status.F90
 +subroutine. Due to its frequent closing,
 +it can indicates precisely where a crash just happened.
 +( Note : if the status file is situated on a disk that is "local" to
 +the cpu where the job is run,
 +the whole operation is usually less than 0.2 msec. On a remote disk
 +(NFS), the operation is 10 times more consuming. These data may differ
 +from machine to machine : on a Cray-T3E the I/O operations are
 +relatively slow. In order not to cause troubles, in the default mode
 +the value of istatr is relatively large, causing the file not to be
 +often updated. On some machines and depending on the disk access,
 +using a small value of istatr -under 5- will cause the code to crash)
 +
 +The memory.F90 subroutine is a place where the memory space needed for the
 +code is estimated, shortly after reading the input.
 +The subroutine will immediately try to allocate as much as memory space
 +as estimated, and send an error message if not possible
 +(on the P6, this operation makes the job crash in case there is not enough
 +memory, but it is not difficult to understand what is the problem, thanks
 +to the status file). This memory estimation is hand-coded (this is very boring !).
 +The precise description of the allocation in the most critical
 +subroutines can constitute a help for the optimization of memory usage.
 +
 +Another help for the optimisation is provided by the time analysis backbone.
 +Many important routines are timed internally, thanks to two
 +calls to the timer routine timepw.F90 (one call at the entrance,
 +one call at the exit). A final call to the
 +routine timanalys.F90 provide a detailed analysis of the repartition of the
 +CPU and Wall clock time, in the critical subroutines, or in the
 +different levels of the code. Thanks to this tool, it is rather
 +obvious what parts of the code should be optimized, and also
 +what is going wrong when the code is ported to a new machine.
 +
 +The last feature useful for developing the code is provided by the
 +statistics of the make command, in the directory ~abinit.
 +This allows to make sure that no file is getting too big to be
 +easily manipulated, and indicate when a file is to be splitted
 +(see ~abinit/doc/developers/rules_coding).
 +
 +Of course, it is important that adequate care is taken to implement
 +these features in newly developed parts of the code.
 +It is thus expected that the developer read the
 +subroutines wrtout.F90, leave_new.F90, timepw.F90, and eventually
 +timanalys.F90, status.F90, and memory.F90. It is advised also to read
 +the subroutine getghc.F90 (the latter, for an idea of the usage of
 +the prtvol=-level debugging option, there for prtvol=-10).
 +
 +===== D. Utility subroutines =====
 +
 +Beyond the big main routines presented in section B, and the different
 +routines for timing, debugging and statistics of section C, other
 +routines in ~abinit/src may be worth to learn about...
 +
 +There is a whole set of routines for the treatment of strings
 +of characters (they should be described shortly in a next version):
 +appdig.F90, fappnd.F90, inarray.F90, incomprs.F90, inread.F90, inreplsp.F90,
 +inupper.F90, subchr.F90
 +
 +Routines for numerical derivation and/or integration :
 +ctrap.F90, der_int.F90
 +
 +Some Numerical functions
 +
 +  - besjm.F90 : half-integer bessel functions
 +  - derfc.F90 : complementary error function
 +  - invcb.F90 : fast computation of a series of inverse cubic roots
 +  - sincos.F90 : fast computation of a series of sine and cosine
 +
 +Routines related to symmetries and brillouin zone (should be described) :
 +chkgrp.F90, chkibz.F90, cnstti.F90, fixsym.F90, irrzg.F90, setsym.F90, strsym.F90,
 +sygrad.F90, symatm.F90, symchk.F90, symdet.F90, symg.F90, symrhg.F90, symzat.F90
 +
 +
 +Vector operations :
 +
 +  - norm.F90 : normalize a vector
 +  - normev.F90 : normalize a set of vectors, and fix the phases
 +  - fxphas.F90 : fix the phase of a vector
 +  - orthon.F90 : orthonormalize a set of vectors
 +  - projbd.F90 : orthogonalize one vector to a set of other vectors
 +  - sdirot.F90 : rotate a set of vectors by a unitary transformation
 +
 +3x3 matrix inversion (integer and real) :
 +mati3inv.F90, matr3inv.F90
 +
 +Other (should be described) :
 +clsopn.F90, fxphas.F90, hermit.F90, iseq.F90, isfile.F90, mkkin.F90,
 +mkrdim.F90, prmat.F90, randac.F90, xredxcart.F90
 +
 +===== E. Libraries =====
 +
 +As for the utility subroutines, the developer should be aware of the
 +routines available from the libraries, and use them instead of
 +coding something with the same purpose.
 +
 +The Lapack library contains a matrix diagonalizer, zhpev.F90, that
 +is needed many times in the code. Presently, this is the only
 +entry point in the Lapack library. The Blas routines are used
 +by Lapack, but are not directly called by ABINIT.
 +Only the specific subset of Lapack and Blas, needed to support
 +zhpev.F90, is present in ABINIT.
 +
 +The Numerical Recipes library contains:
 +
 +  - sorting routines (insort.F90, isort2.F90, sort2.F90)
 +  - a routine that computes the julian day number (julday.F90)
 +  - a function that returns a uniform random deviate between 0.0 and 1.0 (ran1.F90)
 +  - spline fitting routines (splfit.F90 and spline.F90)
 +  - (to be updated)