developers:programmer_guide
Differences
This shows you the differences between two versions of the page.
Both sides previous revisionPrevious revision | |||
developers:programmer_guide [2014/01/22 15:52] – Yann Pouillon | developers:programmer_guide [2024/09/02 16:21] (current) – Maryam Azizi | ||
---|---|---|---|
Line 1: | Line 1: | ||
+ | <WRAP important> | ||
+ | ===== Programmer' | ||
+ | |||
+ | NOTE: this file has NOT yet been updated for the response function features... | ||
+ | |||
+ | NOTE: this file has NOT yet been updated for structured datatypes ... | ||
+ | |||
+ | This is a brief programmer guide describing the structure of the main program (abinit) of the ABINIT package. | ||
+ | It is intended to provide some introductory guidance to programmers who may want to modify parts of the code. | ||
+ | You will find the code fairly well commented and should explore it to get more details than provided below. | ||
+ | |||
+ | The reader is assumed to have already gone through the latest version of the following files: | ||
+ | |||
+ | * ~abinit/ | ||
+ | * ~abinit/ | ||
+ | * ~abinit/ | ||
+ | |||
+ | It is important that the reader know how to compile the code, and how to run tests. | ||
+ | This is described in detail in the installation notes on the Web, that can also be found in the ~abinit/ | ||
+ | From now on, we assume that you are sufficiently familiarized with these different points, and that you have sufficient experience in the use of ABINIT. | ||
+ | |||
+ | The ABINIT group rules for coding in Fortran 90 are detailed in the ~abinit/ | ||
+ | These rules are mostly code-independent. | ||
+ | Here we describe facts related specifically to the ABINIT package. | ||
+ | |||
+ | In order to allow programmers to develop different parts of the code at the same time, while avoiding synchronisation problems, a few rules have been sketched. | ||
+ | See the ~abinit/ | ||
+ | |||
+ | Specific facts related to parallelism in abinit are explained in ~abinit/ | ||
+ | |||
+ | Structure of the present file: | ||
+ | |||
+ | (A) A few facts | ||
+ | (B) The skeleton of the code. | ||
+ | (C) Debugging, timing and statistics facilities. | ||
+ | (D) Utility subroutines. | ||
+ | (E) Libraries. | ||
+ | |||
+ | ===== (A) A few facts, useful to know ===== | ||
+ | |||
+ | The main routine is called abinit.F90 and is present in the directory ~abinit/ | ||
+ | The rest of the subroutines called by abinit.F90 are in the other ~abinit/src or ~abinit/lib directories. | ||
+ | Other main routines (mrgddb.F90, | ||
+ | |||
+ | The subroutines are splitted in two parts: those that come from | ||
+ | other packages, like Blas, Lapack, and other numerical routines or IO routines; | ||
+ | and those that have been written directly for ABINIT. | ||
+ | The first ones are found in the ~abinit/lib directories, | ||
+ | are often written in Fortran77 or C, and do not follow the coding rules of the ABINIT | ||
+ | project. The second ones are written in Fortran90 (there are two C routines, | ||
+ | for timing purposes), follow the coding rules, and are found in the | ||
+ | different ~abinit/src directories. At the time of writing, there are more | ||
+ | than 300 subroutines, | ||
+ | |||
+ | Machine dependency is accomodated by using the C preprocessor (CPP) on ALL files in ~abinit/ | ||
+ | so the fortran compilations are conducted by first preprocessing every file then passing the | ||
+ | result to the compiler. See the ~abinit/ | ||
+ | directory and the ~abinit/ | ||
+ | The sequential and parallel versions are also produced by c preprocessing | ||
+ | a unique source file. The routines that differ in the sequential and | ||
+ | parallel versions of the code are found in the Src_seqpar | ||
+ | and Src_basis directory. | ||
+ | Other Src_* directories contains separately the routines for XC treatment, | ||
+ | pseudopotential input, parsing of input file, those for the anaddb code, | ||
+ | and all the remaining (common) sources. | ||
+ | |||
+ | ===== (B) The skeleton of the code ===== | ||
+ | |||
+ | (to be updated for RF features) | ||
+ | One can distinguish 10 important routines, called levels : | ||
+ | |||
+ | (1) abinit | ||
+ | (2) driver | ||
+ | (3) gstate | ||
+ | (4) mover | ||
+ | (5) scfcv | ||
+ | (6) vtorho | ||
+ | (7) vtowfk | ||
+ | (8) cgwf | ||
+ | (9) getghc | ||
+ | |||
+ | The routine abinit.F90 calls driver.F90, driver.F90 calls gstate.F90, ... | ||
+ | |||
+ | |||
+ | ==== B.1. abinit.F90 ==== | ||
+ | |||
+ | The main routine, abinit, level 1, has the aim of reading completely | ||
+ | the input files, and checking whether the input variables | ||
+ | are sensible, and whether the available memory is sufficient. | ||
+ | These operations should be very fast, so that the user is | ||
+ | quickly warned whether his/her input are incorrect. | ||
+ | No big array is allocated in level 1, except for testing purposes. | ||
+ | In detail, abinit.F90 performs, or calls routines that perform: | ||
+ | |||
+ | - Eventually initialize MPI (for parallel runs) | ||
+ | - Initialize overall timing of run | ||
+ | - Print greeting for interactive user | ||
+ | - Read names of files (input, output, rootinput, rootoutput, roottemporaries), | ||
+ | - Create the name of the status file, initialize the status subroutine. | ||
+ | - Open output file and print herald at top of output and log files | ||
+ | - Read the input file, and store the information in a long string of characters | ||
+ | - Take ndtset and ntypat from the input string, then allocate the arrays whose dimensions depends only on ndtset, ntypat and msym. | ||
+ | - Finish to read the " | ||
+ | - Continue to analyze the input string, and allocate the arrays needed for input. | ||
+ | - Provide defaults for the variables that have not yet been initialized. | ||
+ | - Call the main input routine, and finish the input variable initialisation. | ||
+ | - Echo input data to output file and log file | ||
+ | - Perform additional checks on input data | ||
+ | |||
+ | At this stage, all the information from the " | ||
+ | have been read and checked. | ||
+ | |||
+ | - Perform main calculation | ||
+ | - Give final echo of coordinates, | ||
+ | - Timing analysis | ||
+ | - Delete the status file, and, for build-in tests, analyse the correctness of results | ||
+ | - Write the final timing, close the output file, and write a final line to the log file | ||
+ | - Eventual cleaning of MPI run | ||
+ | |||
+ | |||
+ | ==== B.2. driver.F90 ==== | ||
+ | |||
+ | In driver, level 2, a loop on the data sets is present. | ||
+ | For each data set, either the ground state subroutine (gstate.F90) or | ||
+ | the response function subroutine (respfn.F90) is called (to be described | ||
+ | in a future version of this file). | ||
+ | A few big arrays are allocated at that level. | ||
+ | |||
+ | |||
+ | ==== B.3. gstate.F90 ==== | ||
+ | |||
+ | The routine gstate.F90 , level 3, performs a variety of initialisation tasks | ||
+ | and result analysis, for which different routines are called. | ||
+ | |||
+ | A variety of arrays are computed or initialized in subroutine | ||
+ | setup1.F90 . Then, based on the geometrical data input and the | ||
+ | values of k points and ecut, the basis sphere of planewaves is computed | ||
+ | by kpgio.F90 (that calls kpgsph.F90 and boundy.F90). | ||
+ | Then header information is written to a file, for | ||
+ | use in constructing output wavefunction files, see headwr.F90 . These wf files | ||
+ | contain a description of various input settings which were used to create | ||
+ | them. Other routines related to the header, and used later, | ||
+ | are headcopy.F90 , headck.F90 , and headlv.F90 , as well as pspini.F90 and clnup1.F90 . | ||
+ | |||
+ | Next, all the pseudopotential files needed for the calculation are read. | ||
+ | Subroutine pspini.F90 controls this part, and calls, for each atom type, the | ||
+ | routine pspatm.F90 , that will call different routines | ||
+ | (psp1in.F90, | ||
+ | to the pseudopotential file format. | ||
+ | Various transforms of the input psp data are taken | ||
+ | (bessel function transforms) relevant to the local and nonlocal parts of | ||
+ | the potential (psp1lo.F90, | ||
+ | psp5lo.F90, psp5nl.F90) and the non-linear XC core-correction (psp1cc.F90, | ||
+ | psp4cc.F90, and psp6cc.F90) . | ||
+ | |||
+ | The wavefunctions are initialized (read or set to random numbers) in inwffil.F90 . | ||
+ | If they are to be initialized at random, | ||
+ | or if the simple reading of an existing wf file is needed, then the | ||
+ | routine initwf.F90 is called by inwffil.F90 . If some work | ||
+ | has to be done of the existing wavefunctions, | ||
+ | the operation is more delicate, and inwffil.F90 needs to call the routine | ||
+ | newkpt.F90, that calls different other routines : | ||
+ | |||
+ | - listkk.F90 (to find the closest k point) | ||
+ | - kpgsph.F90 (generate list of plane waves) | ||
+ | - sphere.F90 (to translate plane wave coefficients from one cut-off sphere to another) | ||
+ | - envlop.F90 (multiply by an envelope the random coefficient, | ||
+ | - orthon.F90 (to orthonormalize the wavefunctions) | ||
+ | |||
+ | The symmetries are initialized in setsym.F90, | ||
+ | occupation numbers might be computed in newocc.F90 if needed. | ||
+ | Then, the code computes a starting density and screening potential, | ||
+ | either from the existing wavefunctions (mkrho.F90), | ||
+ | of the pseudopotential (initro.F90), | ||
+ | At this point, the code either pursues a fixed atom calculation (level 6) | ||
+ | or a moving atom calculation (levels 4 or 5). | ||
+ | |||
+ | After these calls, when the big calculations are done, gstate.F90 continues | ||
+ | by printing results, closing files, deallocating arrays, and return the | ||
+ | control to driver.F90 | ||
+ | |||
+ | ==== B.4. mover.F90 ==== | ||
+ | |||
+ | For fixed atoms (ionmov=0), gstate.F90 calls directly scfcv.F90 | ||
+ | (self-consistent field convergence) ; | ||
+ | For movement of ions (ionmov> | ||
+ | The routine mover.F90 receives an initial set of atom positions | ||
+ | and cell parameters and finish providing a new set after (ntime) | ||
+ | iterations (if convergence is needed and never reached) | ||
+ | The table of contents of the routine follows: | ||
+ | |||
+ | 01. Initialization of indexes and allocations of arrays | ||
+ | 02. Particularities of each predictor | ||
+ | 03. Set the number of iterations ntime | ||
+ | 04. Try to read history of previous calculations | ||
+ | 05. Allocate the hist structure | ||
+ | 06. First output before any itime or icycle | ||
+ | 07. Compute xcart and fill the history of the first SCFCV | ||
+ | 08. Loop for itime (From 1 to ntime) | ||
+ | 09. Loop for icycle (From 1 to ncycles) | ||
+ | 10. Output for each icycle (and itime) | ||
+ | 11. Symmetrize atomic coordinates over space group elements | ||
+ | 12. => Call to SCFCV routine and fill history with forces | ||
+ | 13. Write the history into the _HIST file | ||
+ | 14. Output after SCFCV | ||
+ | 15. => Test Convergence of forces and stresses | ||
+ | 16. => Precondition forces, stress and energy | ||
+ | 17. => Call to each predictor | ||
+ | 18. Use the history | ||
+ | 19. End loop icycle | ||
+ | 20. End loop itime | ||
+ | 21. Set the final values of xcart and xred | ||
+ | 22. XML Output at the end | ||
+ | 23. Deallocate hist and ab_mover datatypes | ||
+ | |||
+ | The main predictors are described with ionmov value: | ||
+ | |||
+ | 1. Molecular dynamics without viscosity (vis=0) | ||
+ | 1. Molecular dynamics with viscosity (vis/=0) | ||
+ | 2. Broyden-Fletcher-Goldfard-Shanno method (forces) | ||
+ | 3. Broyden-Fletcher-Goldfard-Shanno method (forces,Tot energy) | ||
+ | 4. Conjugate gradient of potential and ionic degrees of freedom | ||
+ | 5. Simple relaxation of ionic positions | ||
+ | 6. Verlet algorithm for molecular dynamics | ||
+ | 7. Verlet algorithm blocking every atom where dot(vel, | ||
+ | 8. Verlet algorithm with a nose-hoover thermostat | ||
+ | 9. Langevin molecular dynamics | ||
+ | 10. BFGS with delocalized internal coordinates | ||
+ | 11. Conjugate gradient algorithm | ||
+ | 12. Isokinetic ensemble molecular dynamics | ||
+ | 13. Isothermal/ | ||
+ | 14. Symplectic algorithm Runge-Kutta-Nyström SRKNa14 | ||
+ | 20. Ionic positions relaxation using DIIS | ||
+ | 21. Steepest descent algorithm | ||
+ | 30. Self consistent phonon structure using a supercell | ||
+ | |||
+ | In all three cases, the routines call many times scfcv.F90, which | ||
+ | controls update and mixing of the density and potential and generates | ||
+ | forces for a given arrangement of atoms. With these data, the | ||
+ | molecular dynamics or the geometry optimization can be performed. | ||
+ | |||
+ | ==== B.5. scfcv.F90 ==== | ||
+ | |||
+ | This routine performs the SCF loop. | ||
+ | A few arrays, needed for that purpose, are allocated there. | ||
+ | Inside the loop, scfcv.F90 calls: | ||
+ | |||
+ | - setvtr.F90, usually only at the initialisation step, to set a first trial potential | ||
+ | - vtorho.F90, level 7, to get the density from the trial potential | ||
+ | - vresfo.F90, to get the potential residual, the forces, and components of the energy | ||
+ | - newvtr.F90, to precondition the potential residual, and compute the new trial potential | ||
+ | |||
+ | The computation of Hartree and XC potential is done inside setvtr.F90 and | ||
+ | newvtr.F90, by calling the routine rhohxc.F90 . | ||
+ | That routine, in turn, calls hartre.F90 , for the Hartree potential, and | ||
+ | many different routines for the XC potential, depending on the | ||
+ | different XC functionals, | ||
+ | xcpot.F90, xcpzca.F90, xcspol.F90, xctetr.F90, xcwign.F90, xcxalp.F90) | ||
+ | |||
+ | After the loop, scfcv.F90 computes the stress, by calling stress.F90 , | ||
+ | and also eventually print density, potential, or other files. | ||
+ | |||
+ | ==== B.6. vtorho.F90 ==== | ||
+ | |||
+ | Subroutine vtorho.F90 (potential -v- to density -rho-, level 7) | ||
+ | produces the density in a fixed potential, | ||
+ | by summing all contributions of different k points and eventually | ||
+ | different spins. | ||
+ | Parallelism is implemented at the level of concurrent | ||
+ | treatment of each k-point separately, in vtorho.F90 . | ||
+ | |||
+ | ==== B.7. vtowfk.F90 ==== | ||
+ | |||
+ | Subroutine vtowfk (potential -v- to k-point wavefunctions, | ||
+ | is called to improve the wavefunctions | ||
+ | over all bands at a single k point at a time. It gives also | ||
+ | the contributions of each band to kinetic energy and non-local | ||
+ | energy. In the case of fixed occupancies, | ||
+ | contribution of each k point to the density. | ||
+ | Subspace diagonalization, | ||
+ | is done within vtowfk, and might be time-consuming. | ||
+ | |||
+ | ==== B.8. cgwf.F90 and getghc.F90 ==== | ||
+ | |||
+ | Subroutine cgwf (Conjugate-gradient on the wavefunctions, | ||
+ | runs the iterative optimization of wavefunction | ||
+ | for a single band and k point, in a fixed potential. | ||
+ | It start from an existing wavefunction, | ||
+ | on a temporary file on disk, and refine it, finally writing in central | ||
+ | memory or on another temporary file on disk. | ||
+ | Deep within cgwf is a call to getghc (level 10), | ||
+ | which computes < | ||
+ | wavefunction. | ||
+ | is presently dominated by fft calls (about 50-60%), with the next | ||
+ | bottleneck being the nonlocal operator (20-35%). | ||
+ | |||
+ | You will find the code fairly well commented and may explore it further | ||
+ | to get more details than provided above. | ||
+ | |||
+ | ===== C. Debugging, timing and statistics facilities ===== | ||
+ | |||
+ | The abinit code has been equipped with a set of tools for the developers. | ||
+ | These include: | ||
+ | |||
+ | 1) The log file. | ||
+ | 2) The prtvol input variable. | ||
+ | 3) The status file. | ||
+ | 4) The memory subroutine. | ||
+ | 5) The time analysis backbone. | ||
+ | 6) The statistics provided in the make. | ||
+ | |||
+ | As mentioned in the new_user_guide or in the abinit_help, | ||
+ | there are two general files for output: the " | ||
+ | the " | ||
+ | causing the code to crash, the log file will mention the name | ||
+ | of the routine where something went wrong, and what went wrong. | ||
+ | Usually, corrective actions are suggested. | ||
+ | The output of messages is handled thanks to wrtout.F90 , called when | ||
+ | the message has been packed in a character string (usually called ' | ||
+ | When something has gone wrong, the exit is to be done by a call to the | ||
+ | leave_new.F90 subroutine. | ||
+ | |||
+ | The use of the prtvol input variable in conjunction with | ||
+ | the log file is the most important tool for debugging. | ||
+ | As indicated by its name, prtvol controls the print volume in the | ||
+ | output file and in the log file. | ||
+ | When equal to 0, the information in the log file is kept at the minimum. | ||
+ | When equal to 1, the information is already much more complete. | ||
+ | Even much more flexibility is gained when prtvol is used with negative values. | ||
+ | These negative values each refer to one the levels of the code | ||
+ | (i.e. prtvol=-10 refer to debugging of getghc, prtvol=-7 to debugging | ||
+ | of vtorho). When debugging some level through the use of the corresponding | ||
+ | negative prtvol value, the amount of data written on the log file, | ||
+ | coming from this level of the code, will increase dramatically. Moreover, | ||
+ | after the first execution of the complete level, the code will automatically | ||
+ | stop (except for the levels 1 and 3, that stop BEFORE entering | ||
+ | the next level). | ||
+ | |||
+ | The status file is another important tool for debugging, especially | ||
+ | because of the UNIX pecularity (when running from a script) | ||
+ | that the outputs are not immediately | ||
+ | written in a file, but kept in a buffer, unless this file is closed. | ||
+ | When the code crashes (for example with a message " | ||
+ | it is difficult to know at which place the segmentation fault happened, | ||
+ | from the log file. | ||
+ | The status file is a very short file that, depending on the | ||
+ | value of the input parameter istatr, can be opened, rewound, written, | ||
+ | and closed very frequently. This is done by calls to the status.F90 | ||
+ | subroutine. Due to its frequent closing, | ||
+ | it can indicates precisely where a crash just happened. | ||
+ | ( Note : if the status file is situated on a disk that is " | ||
+ | the cpu where the job is run, | ||
+ | the whole operation is usually less than 0.2 msec. On a remote disk | ||
+ | (NFS), the operation is 10 times more consuming. These data may differ | ||
+ | from machine to machine : on a Cray-T3E the I/O operations are | ||
+ | relatively slow. In order not to cause troubles, in the default mode | ||
+ | the value of istatr is relatively large, causing the file not to be | ||
+ | often updated. On some machines and depending on the disk access, | ||
+ | using a small value of istatr -under 5- will cause the code to crash) | ||
+ | |||
+ | The memory.F90 subroutine is a place where the memory space needed for the | ||
+ | code is estimated, shortly after reading the input. | ||
+ | The subroutine will immediately try to allocate as much as memory space | ||
+ | as estimated, and send an error message if not possible | ||
+ | (on the P6, this operation makes the job crash in case there is not enough | ||
+ | memory, but it is not difficult to understand what is the problem, thanks | ||
+ | to the status file). This memory estimation is hand-coded (this is very boring !). | ||
+ | The precise description of the allocation in the most critical | ||
+ | subroutines can constitute a help for the optimization of memory usage. | ||
+ | |||
+ | Another help for the optimisation is provided by the time analysis backbone. | ||
+ | Many important routines are timed internally, thanks to two | ||
+ | calls to the timer routine timepw.F90 (one call at the entrance, | ||
+ | one call at the exit). A final call to the | ||
+ | routine timanalys.F90 provide a detailed analysis of the repartition of the | ||
+ | CPU and Wall clock time, in the critical subroutines, | ||
+ | different levels of the code. Thanks to this tool, it is rather | ||
+ | obvious what parts of the code should be optimized, and also | ||
+ | what is going wrong when the code is ported to a new machine. | ||
+ | |||
+ | The last feature useful for developing the code is provided by the | ||
+ | statistics of the make command, in the directory ~abinit. | ||
+ | This allows to make sure that no file is getting too big to be | ||
+ | easily manipulated, | ||
+ | (see ~abinit/ | ||
+ | |||
+ | Of course, it is important that adequate care is taken to implement | ||
+ | these features in newly developed parts of the code. | ||
+ | It is thus expected that the developer read the | ||
+ | subroutines wrtout.F90, leave_new.F90, | ||
+ | timanalys.F90, | ||
+ | the subroutine getghc.F90 (the latter, for an idea of the usage of | ||
+ | the prtvol=-level debugging option, there for prtvol=-10). | ||
+ | |||
+ | ===== D. Utility subroutines ===== | ||
+ | |||
+ | Beyond the big main routines presented in section B, and the different | ||
+ | routines for timing, debugging and statistics of section C, other | ||
+ | routines in ~abinit/src may be worth to learn about... | ||
+ | |||
+ | There is a whole set of routines for the treatment of strings | ||
+ | of characters (they should be described shortly in a next version): | ||
+ | appdig.F90, fappnd.F90, inarray.F90, | ||
+ | inupper.F90, | ||
+ | |||
+ | Routines for numerical derivation and/or integration : | ||
+ | ctrap.F90, der_int.F90 | ||
+ | |||
+ | Some Numerical functions | ||
+ | |||
+ | - besjm.F90 : half-integer bessel functions | ||
+ | - derfc.F90 : complementary error function | ||
+ | - invcb.F90 : fast computation of a series of inverse cubic roots | ||
+ | - sincos.F90 : fast computation of a series of sine and cosine | ||
+ | |||
+ | Routines related to symmetries and brillouin zone (should be described) : | ||
+ | chkgrp.F90, chkibz.F90, cnstti.F90, fixsym.F90, irrzg.F90, setsym.F90, strsym.F90, | ||
+ | sygrad.F90, symatm.F90, symchk.F90, symdet.F90, symg.F90, symrhg.F90, symzat.F90 | ||
+ | |||
+ | |||
+ | Vector operations : | ||
+ | |||
+ | - norm.F90 : normalize a vector | ||
+ | - normev.F90 : normalize a set of vectors, and fix the phases | ||
+ | - fxphas.F90 : fix the phase of a vector | ||
+ | - orthon.F90 : orthonormalize a set of vectors | ||
+ | - projbd.F90 : orthogonalize one vector to a set of other vectors | ||
+ | - sdirot.F90 : rotate a set of vectors by a unitary transformation | ||
+ | |||
+ | 3x3 matrix inversion (integer and real) : | ||
+ | mati3inv.F90, | ||
+ | |||
+ | Other (should be described) : | ||
+ | clsopn.F90, fxphas.F90, hermit.F90, iseq.F90, isfile.F90, mkkin.F90, | ||
+ | mkrdim.F90, prmat.F90, randac.F90, xredxcart.F90 | ||
+ | |||
+ | ===== E. Libraries ===== | ||
+ | |||
+ | As for the utility subroutines, | ||
+ | routines available from the libraries, and use them instead of | ||
+ | coding something with the same purpose. | ||
+ | |||
+ | The Lapack library contains a matrix diagonalizer, | ||
+ | is needed many times in the code. Presently, this is the only | ||
+ | entry point in the Lapack library. The Blas routines are used | ||
+ | by Lapack, but are not directly called by ABINIT. | ||
+ | Only the specific subset of Lapack and Blas, needed to support | ||
+ | zhpev.F90, is present in ABINIT. | ||
+ | |||
+ | The Numerical Recipes library contains: | ||
+ | |||
+ | - sorting routines (insort.F90, | ||
+ | - a routine that computes the julian day number (julday.F90) | ||
+ | - a function that returns a uniform random deviate between 0.0 and 1.0 (ran1.F90) | ||
+ | - spline fitting routines (splfit.F90 and spline.F90) | ||
+ | - (to be updated) |
developers/programmer_guide.txt · Last modified: 2024/09/02 16:21 by Maryam Azizi