Introduction
Skeleton codes are bare-bones but fully functional PIC codes containing all the crucial elements but not the diagnostics and initial conditions typical of production codes. These are sometimes also called mini-apps. We are providing a hierarchy of skeleton PIC codes, from very basic serial codes for beginning students to very sophisticated codes with 3 levels of parallelism for HPC experts. The codes are designed for high performance, but not at the expense of code obscurity. They illustrate the use of a variety of parallel programming techniques, such as MPI, OpenMP, and CUDA, in both Fortran and C. For students new to parallel processing, we also provide some Tutorial and Quickstart codes.
The codes are available for download on the subpages here and on the GitHub Repository of the UCLA Plasma Simulation Group.
These codes have multiple purposes. The first is educational, by providing example codes for computational physics researchers of what a PIC code is and how to parallelize it with different parallel strategies and languages. The second is to provide benchmark codes for computer science researchers to evaluate new computer hardware and software with a non-trivial computation. Third, for researchers with already existing codes, these skeleton codes can be mined for useful ideas and procedures that can be incorporated into their own codes. Finally, the skeleton codes can be expanded and customized by adding diagnostics, initial conditions, and so forth, to turn them into production codes.
Types of available PIC codes
There are a variety of PIC codes in common use. They differ in the forces used, time scales supported, and physical and numerical approximations used. We currently provide skeleton codes for 3 different types of PIC codes. The electromagnetic codes contain the most complete physics, using the full set of Maxwell’s equations and relativity. The highest frequency waves in the model are light waves and this model requires the shortest time step to resolve them. The Darwin model uses the Darwin subset of Maxwell’s equations which neglects the transverse displacement current in Ampere’s law. This approximation permits transverse electric and magnetic fields generated by plasma currents but neglects retardation and light waves. The highest frequency modes in this model are plasma waves (or upper hybrid waves if an external magnetic field is present). Darwin models are sometimes called magneto-static, near-field. or radiation-less electromagnetic models. The time step used can be much larger than that used by the electromagnetic codes, so the codes can run faster. The electrostatic codes retain only the longitudinal electric field from Poisson’s equation and neglects all transverse electric and magnetic fields. The highest frequency and time step used is the same as in the Darwin code, but the code is much simpler and executes faster. Which code is appropriate depends on the problem being studied. Descriptions of the mathematics behind all three models are available here as PDF. Other types of skeleton codes will be developed in the future and we solicit contributions.
There are examples for 1, 2, and 3 dimensional codes. For simplicity, all codes are designed for uniform plasmas. They are divided into layers with increasing levels of parallelism. There are three types of codes in each layer, simple electrostatic codes as well as more complex electromagnetic and Darwin codes. All codes use the same algorithms, so if one’s interest is primarily in the algorithms, the electrostatic codes are recommended. The codes are executing the same calculation for each level of parallelism, but the execution is done differently. Each level of parallelism incrementally increases the complexity, and enables the program to execute faster. This hierarchy of codes could be used in a semester course on parallel PIC algorithms, so that a beginning student could become an HPC expert by the end. Spectral field solvers based on Fast Fourier Transforms (FFTs) are used, but the main focus of these skeleton codes is on processing particles, which normally dominates the CPU time.
Software Design
Most of the 2D and 3D codes are written in two languages, Fortran and C. The 1D code does not have C procedure libraries, but it does have C main programs that can call the Fortran procedure libraries. Additionally, some of the codes can be run interactively with an included Python script. Language interoperability is also maintained, so that Fortran codes can call C libraries and vice-versa. Fortran2003 has built-in support for interoperability with C. C procedures can be called by simply writing a file containing interface statements (similar to a header file in C). To allow C to call Fortran procedures, a simple wrapper library must be built in Fortran to allow interoperability with Fortran arrays (which is functionally similar to array objects in C++). Interoperability is important for a number of reasons. The first is to give Fortran access to features available only in C, for example, to permit a Fortran program to call a CUDA C procedure on a GPU. The second is to allow a C program access to procedures already available in Fortran. Our goal is to support multi-lingual codes, where researchers write in whatever language they are fluent in, and the pieces can be integrated seamlessly. Because the features of Fortran2003 are not yet well known in the scientific community, an older style of interoperability is currently provided in most cases. The details of this approach are discussed in Ref. [1]. Most of the libraries in the skeleton codes have interface functions to make them interoperable.
The computationally intensive procedures in Fortran are written in a Fortran77 style subset of Fortran90. This is primarily because programs written in such a style tend to execute faster. But an additional reason is that Fortran has a large legacy of codes written in this style and it is useful that students be aware of such codes. All libraries written in such a style have a corresponding interface library to enable type checking when used with Fortran90. The C codes adhere to the C99 standard, primarily to make use of float complex types.
Most of the codes maintain a simple two level hierarchy, with a main code and an encapsulated library. In a some cases, additional layers are needed. One example is a library which makes use of MPI communications. In such a case, the MPI features are usually completely contained within the library and the MPI details (or MPI header files) are not exposed to other layers. This kind of isolation more easily enables integrating different programming layers, such as GPU and MPI programming.
A flat data structure is used, using only basic integer and floating point types, without the use of derived types or structs. This is primarily to expose as much as possible without hiding features in a multi-layered hierarchy. We believe this makes it easier to understand the changes made when increasing the levels of parallelism. But another reason is that the PIC community has not yet agreed on standard objects and it would be more difficult to reuse software from the skeleton codes if the types or structs we chose conflicted.
Future Work
There are 3 different topics we plan to study. One is to create advanced versions of some of these skeleton codes to include features such as dynamic load balancing. Another is to implement additional 3D versions of some of the skeleton codes. Finally, we plan to explore alternative programming paradigms, such as OpenCL, OpenACC, or co-array Fortran, perhaps with other partners.
Support
This work is also supported in part by the National Science Foundation, Department of Energy SciDAC program and the UCLA Institute for Digital Research and Innovation (IDRE). It is also closely co-ordinated with the activities of the NSF funded Particle-in-Cell and Kinetic Simulation Center (PICKSC) at UCLA.
References
[1] Viktor K. Decyk, “A Method for Passing Data Between C and Opaque Fortran90 Pointers,” ACM Fortran Forum, vol. 27, no. 2, p. 2 (2008). doi link
[2] P. C. Liewer and V. K. Decyk, “A General Concurrent Algorithm for Plasma Particle-in-Cell Codes,” J. Computational Phys. 85, 302 (1989). doi link
[3] V. K. Decyk, “Skeleton PIC Codes for Parallel Computers,” Computer Physics Communications, 87, 87 (1995). doi link
[4] Viktor K. Decyk and Tajendra V. Singh, “Particle-in-Cell algorithms for emerging computer architectures,” Computer Physics Communications 185, 708 (2014). doi link, PDF