The Game of Life
Here we've designed and implemented Conway's Game of Life, game algorithm; invented by John Conway in 1970. The rules of this 0-player game (a video game playing without user input) are:
- Any live cell with fewer than two live neighbors dies, as if by under-population.
- Any live cell with two or three live neighbors lives on to the next generation.
- Any live cell with more than three live neighbors dies, as if by overpopulation.
- Any dead cell with exactly three live neighbors becomes a live cell, as if by reproduction.
The motivation for this project was to become familiar with parallel computing primarily on the CPU, but also on the GPU which utilizes the stream processing paradigm.
Usage
There are 6 different implementations:
- Serial
- Serial with OpenMP
- MPI
- MPI with OpenMP (aka Hybrid)
- NVidia CUDA using GPU's global memory
- NVidia CUDA using GPU's shared memory
C and C++ are the programming languages used throughout. Out of those 6 only the last two are GPU accelerated. The first 4 are using the CPU in its entirety.
There lies complete documentation with esoteric programming details in Greek in Game_ofLife_M1485 pdf document. If there is a demand I will think about translating the project in English. However, I will mention enough key points in this document in my attempt to cover most, if not all, of the critical stuff. Furthermore, the code is thoroughly documented. Of course, if there's anything else you wish feel free to leave your comments below, or contact me.
I have first implemented the basic algorithm using the naive - serial CPU implementation. Then moved on to parallel CPU code. It was more intuitive this way and I was learning the OpenMP, MPI and CUDA APIs as I was going along. It was my first excursion into MPI and GPU CUDA territories.
Use the makefile either on *nix or Windows to build whichever version you want. Open up a command terminal and:
- make all : builds everything
- make serial : builds serial version
- make serial_display : builds serial version with results display
- make serial_openmp : builds openmp version
- make serial_openmp_display : builds openmp version with results display
- make mpi : builds mpi version
- make mpi_openmp : builds mpi with openmp
- make mpi_openmp_display
- make cuda : builds CUDA
- make cuda_shared : builds CUDA shared memory version
- make clean : clean built files
The *_display variants show the results during game play, i.e. display the evolution of the grid every generation, while the game is running.
Example screenshot:
For reference we use the ASCII .
to indicate a dead cell and the #
to indicate a live cell.
Each line is a line of the grid and each column is the column of the 2x2 grid. Ghost rows and columns are not shown of course, they are taken in consideration when evaluating a cell (i.e. a grid/array element) on the boundaries of the grid. In those ghost cells the respective elements from the opposite side of the array are copied to evaluate the status of the current cell. It works intuitively - exactly as you think it does:
Mathematically this is the Moore neighborhood of a point (@ C) and its 8 neighbors. For example when we're considering C to be grid[2][1023] and the grid has 1024 columns then NE, E and SE will be ghost cells formed by a copy of the contents of grid[1][0], grid[2][0] and grid[3][0] respectively.
Platform
Bear note that the platform I used was my computer running Windows 8.1 x86_64 Dual Booted with Linux 16.04.
For the MPI specification I used MPICH 3 which is *nix only. The serial and CUDA versions work on either platform, provided you've set up the environment. For windows there is a Visual Studio 2017 solution provided in the /cuda folder.
The computer used a single Intel Core i7 3770 CPU (4 physical cores, 8 if you count Intel's hyperthreading) and the NVidia GeForce GTX 660 GPU (Cuda Compute Capability 3.0). I have included datasheets with analytical details of the hardware, which you can find in the respective files.
Later I used the same programs to test against the clustered Parallel Computer provided by the University (Athens, Department of Informatics & Telecommunications - DIT) - its specs were not disclosed (??).
Measurements
You can find detailed analysis and instrumentation results on the file “Game_of_Life_M1485_metrics.xlsx”.
I compare and contrast the results between my own computer and the University's computer. Measurements are taken regarding parallel speedup and efficiency (%) given number of assigned CPU cores - computer resources, number of elements (rows & columns) - problem size. We have used 10,000 generations. A generation is an instance of the grid as it develops given an input randomly populated grid (for all intents and purposes it is the field of the game at a certain point in time).
We perform these measurements twice, each time from the perspective of a different law.
Firstly using Amdahl's law and secondly using Gustafson's law.
- Amdahl's Law aka Strong scaling maintains the size of the input steady, while increasing the amount of computing units. Thus we compare performance while increasing the CPUs (or Cores) assigned to process the problem
- Gustafson's Law aka Weak scaling “spreads” an N-times larger problem using N computing units.
Strong Scaling performance graph using the University's computer:
Github
Github repository link.
Acknowledgments
University of Athens, Greece
Department of Informatics and Telecommunications
Parallel Computing Systems Class of 2017