Compiling Zeus-MP under Linux

General Notes

  • As always, try a 'make clean' before 'make compile'.
  • This is particularly important for this complex system.
  • Note that the default target is NOT compile, so make sure you say 'make compile'.

Automatic Fixes

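Fix makefile

The perl edits strip SGI-specific (IRIX/MIPSpro) compiler options, such as -64, -r10000 and the -OPT: flags, that the Linux compilers do not accept:
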
  • perl -pi.bak -e 's/-64//' src/Makefile
  • perl -pi.bak -e 's/-r10000//' src/Makefile
  • perl -pi.bak -e 's/-OPT:IEEE_arithmetic=3//' src/Makefile
  • perl -pi.bak -e 's/-OPT:roundoff=3//' src/Makefile
  • perl -pi.bak -e 's/ips4//' src/Makefile
  • ./addMpiInclude.pl src/Makefile

Fix quotes in comments

  • ./removeQuotesFromComments.pl src/zeusmp.F
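The page does not show what removeQuotesFromComments.pl does internally. A plausible reason it is needed is that cpp scans Fortran comments as ordinary text, so an apostrophe in a comment such as "don't" looks like the start of an unterminated C character constant. The function below is a minimal C sketch of that idea under our own assumptions, not a port of the script: strip_comment_quotes is a hypothetical name, and it treats a line as a fixed-form comment when column 1 holds C, c, * or !, deleting every ' and " on such lines while leaving code lines alone.

```c
#include <assert.h>
#include <string.h>

/* Sketch (assumption): delete ' and " from fixed-form Fortran comment lines
   so cpp does not see unbalanced character constants. Column 1 'C', 'c',
   '*' or '!' marks a comment line. Modifies `line` in place and returns 1
   if anything was removed, 0 otherwise. */
static int strip_comment_quotes(char *line) {
    assert(line != NULL);
    if (line[0] == '\0' || strchr("Cc*!", line[0]) == NULL)
        return 0;                       /* code or empty line: untouched */
    char *w = line;
    int changed = 0;
    for (const char *r = line; *r != '\0'; r++) {
        if (*r == '\'' || *r == '"') {  /* drop quote characters */
            changed = 1;
            continue;
        }
        *w++ = *r;                      /* w never overtakes r */
    }
    *w = '\0';
    return changed;
}
```

Applied line by line (for example after reading each line with fgets), it leaves cpp directives and quoted strings in code statements untouched.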

Manual Changes

The build fails at a syntax error in zeusmp.f (generated from zeusmp.F).

  • add -traditional to ZMP_CPP in the Makefile

CPP

cpp assumes a .cpp file is C++ source, but mgmpi.cpp is meant as C pre-processor input, so rename the file and update all references to it:

  • perl -pi.bak -e 's/mgmpi.cpp/mgmpi.cpreprocess/g' *
  • mv mgmpi.cpp mgmpi.cpreprocess

Compile error re mpi_bcast

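The MPICH FAQ entry for Fortran 77 (http://www-unix.mcs.anl.gov/mpi/mpich/docs/faq.htm#f77global) says to add -Wno-globals to prevent the error "Argument #1 of `mpi_bcast' is one type at (2) but is some other type at (1)". g77 checks that every call to the same external routine passes consistent argument types, and MPI routines such as MPI_BCAST are legitimately called with buffers of different types.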

  • add -Wno-globals to Makefile ZMP_OPTS

Implicit None

mgmpi.f doesn't compile because its source, mgmpi.F, has IMPLICIT NONE somewhere other than as the first statement in some FUNCTIONs and SUBROUTINEs, while g77 requires it to come before the other declarations. Manually move each instance up (this could be automated).
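Finding the misplaced statements can be automated. Below is a heuristic C sketch, assuming fixed-form source without tab formatting (find_late_implicit_none and contains_ci are hypothetical helpers, not part of Zeus-MP). It reports every IMPLICIT NONE that is not the first statement after a SUBROUTINE or FUNCTION header, skipping comments, cpp directives, blank lines and continuation lines. Plain substring matching means a statement that merely mentions a name like MYFUNCTION would also be taken as a header, so treat the output as a list of places to inspect.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Case-insensitive substring test. */
static int contains_ci(const char *hay, const char *needle) {
    size_t n = strlen(needle);
    for (; *hay != '\0'; hay++) {
        size_t i = 0;
        while (i < n && hay[i] != '\0' &&
               toupper((unsigned char)hay[i]) == toupper((unsigned char)needle[i]))
            i++;
        if (i == n)
            return 1;
    }
    return 0;
}

/* Heuristic sketch for fixed-form source (no tab formatting): report each
   IMPLICIT NONE that is not the first statement after a SUBROUTINE or
   FUNCTION header. Comments (column 1 C, c, * or !), cpp directives (#),
   blank lines and continuation lines (column 6 not blank or 0) are skipped.
   Up to max_out 1-based line numbers go into out; returns the total found. */
static int find_late_implicit_none(const char **lines, int nlines,
                                   int *out, int max_out) {
    int found = 0, in_unit = 0, stmts = 0;
    assert(lines != NULL && out != NULL);
    for (int i = 0; i < nlines; i++) {
        const char *l = lines[i];
        size_t len = strlen(l);
        if (len == 0 || strchr("Cc*!#", l[0]) != NULL)
            continue;                         /* comment or cpp directive */
        if (len >= 6 && l[5] != ' ' && l[5] != '0')
            continue;                         /* continuation line */
        const char *stmt = len > 6 ? l + 6 : "";
        if (stmt[strspn(stmt, " ")] == '\0')
            continue;                         /* blank or label-only line */
        if (contains_ci(stmt, "SUBROUTINE") || contains_ci(stmt, "FUNCTION")) {
            in_unit = 1;                      /* start of a new program unit */
            stmts = 0;
            continue;
        }
        if (in_unit && stmts > 0 && contains_ci(stmt, "IMPLICIT NONE")) {
            if (found < max_out)
                out[found] = i + 1;
            found++;
        }
        stmts++;
    }
    return found;
}
```

Fed the lines of mgmpi.F, each reported line number is an IMPLICIT NONE to move up to just below its header.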

Declarations

nudt.f, line 201 vs. line 386: Invalid declaration of or reference to symbol `dtcs' at (2) [initially seen at (1)]

  • dtcs is declared twice, so comment out the two duplicate lines in this file.

Better, though, to comment out lines 131 and 133 in nudt.F instead, i.e. REAL dtcs, dtv1, dtv2, ... and REAL dtnew, because nudt.f is regenerated from nudt.F.
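The same kind of duplicate declaration turns up again in hsplice.F, so a rough checker may help. The sketch below (find_duplicate_decls is a hypothetical helper) scans one program unit's statements, with comments removed and continuation lines already joined, for INTEGER, REAL, DOUBLE PRECISION, LOGICAL and CHARACTER declarations and reports names declared more than once. It is deliberately simple: array bounds are skipped, but forms such as CHARACTER*(*) are not parsed, and include files are not expanded, so run it on the preprocessed .f file when declarations may come from an include.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

#define MAX_NAMES 512
#define NAME_LEN 32

/* Assumed subset of the type keywords that start a declaration. */
static const char *const TYPES[] = {
    "INTEGER", "REAL", "DOUBLE PRECISION", "LOGICAL", "CHARACTER", NULL};

/* Does s begin with the upper-case keyword kw, ignoring case in s? */
static int starts_with_ci(const char *s, const char *kw) {
    for (; *kw != '\0'; s++, kw++)
        if (toupper((unsigned char)*s) != *kw)
            return 0;
    return 1;
}

/* Heuristic sketch: report names declared more than once in ONE program
   unit. Up to max_dups duplicates are copied (upper-cased) into dups;
   the return value is the total number found. */
static int find_duplicate_decls(const char **stmts, int n,
                                char dups[][NAME_LEN], int max_dups) {
    char seen[MAX_NAMES][NAME_LEN];
    int nseen = 0, ndup = 0;
    assert(stmts != NULL);
    for (int i = 0; i < n; i++) {
        const char *p = stmts[i];
        const char *kw = NULL;
        while (*p == ' ')
            p++;
        for (int t = 0; TYPES[t] != NULL && kw == NULL; t++)
            if (starts_with_ci(p, TYPES[t]))
                kw = TYPES[t];
        if (kw == NULL)
            continue;                       /* not a type declaration */
        p += strlen(kw);
        if (*p != ' ' && *p != '*')
            continue;                       /* e.g. REALX = 1.0 is an assignment */
        while (*p == ' ')
            p++;
        if (*p == '*') {                    /* length spec: REAL*8, CHARACTER*16 */
            p++;
            while (isdigit((unsigned char)*p))
                p++;
        }
        while (*p != '\0') {                /* walk the comma-separated names */
            char name[NAME_LEN];
            int len = 0, depth = 0;
            while (*p == ' ' || *p == ',')
                p++;
            while (isalnum((unsigned char)*p) || *p == '_') {
                if (len < NAME_LEN - 1)
                    name[len++] = (char)toupper((unsigned char)*p);
                p++;
            }
            name[len] = '\0';
            /* Skip array bounds such as (IN,JN,KN) up to the next top-level comma. */
            while (*p != '\0' && !(*p == ',' && depth == 0)) {
                if (*p == '(') depth++;
                else if (*p == ')') depth--;
                p++;
            }
            if (len == 0)
                continue;
            int dup = 0;
            for (int s = 0; s < nseen && !dup; s++)
                dup = strcmp(seen[s], name) == 0;
            if (dup) {
                if (ndup < max_dups)
                    strcpy(dups[ndup], name);
                ndup++;
            } else if (nseen < MAX_NAMES) {
                strcpy(seen[nseen++], name);
            }
        }
    }
    return ndup;
}
```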

ZMP_CC and the .c.o target: ZMP_CC already contains the compiler arguments, so use only these and comment out the other ones.

  • manually edit the Makefile to remove the second set of arguments

FFTW (Fastest Fourier Transform in da West)

  • Download from http://www.fftw.org/
  • Compile with MPI support (this is required)
  • Manually change makefile to point to it

MPICH & FFTW installation - cchung added 20/5/04

  • MPICH should be installed before building FFTW.
  • Install MPICH in a different directory from where its source code is, i.e. configure it with ./configure --prefix=/PATHNAME/
  • You must use FFTW 2.1.5, because version 3 has no MPI support.
  • When building FFTW, type ./configure --enable-mpi --prefix=/PATHNAME/ (the --prefix is optional).
  • Before running make, do:
    • setenv MPILIBS .../mpich-x.x.x/lib (the source code directory, not the directory in which you just installed it)
    • setenv MPICC ../mpich-x.x.x/bin/mpicc

-m in Makefile

Need to remove -m from the compile line because f77 says "Options starting with -g, -f, -m, -O or -W are automatically passed on to the various sub-processes invoked by f77", which means the -o that follows is not treated correctly.

  • remove -m from ZMP_CC
  • also remove -O3 and -m from ZMP_OPTS

MPI and MPICH

Manually change Makefile to use mpif77 (not f77) which will include correct libraries.

  • Got mpich version 1.2.5 and built it.
  • Change ZMP_LDR to ${HOME}/programs/mpich/bin/mpif77 (which is my local copy)
  • Change ZMP_LIB to use -lmpich (not -lmpi)
  • Install HDF4.1r5-linux.tar from the binary distribution.
  • Modify Makefile to include it.

MSDOS is crap

MSDOS 'cr' character in gpbv.F. This file has #include "zeusmp.F" as its first line, but instead of a plain newline (LF, ASCII 10) that line ends with a DOS-style carriage return (CR, ASCII 13). cpp cannot handle this and fails to include the file. Fix it with:

  • dos2unix gpbv.F
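Where dos2unix is not available, tr -d '\r' < gpbv.F > gpbv.tmp followed by mv gpbv.tmp gpbv.F also works, although it deletes every CR rather than only those ending a line. For reference, the C function below is a minimal sketch of the CR LF to LF conversion (crlf_to_lf is a hypothetical name); in this sketch each CR LF pair becomes a single LF and a lone CR is kept.

```c
#include <assert.h>
#include <stddef.h>

/* Convert DOS line endings (CR LF) to Unix line endings (LF) in place.
   A CR that is not followed by LF is kept. `buf` must be NUL-terminated;
   returns the new length. */
static size_t crlf_to_lf(char *buf) {
    assert(buf != NULL);
    size_t r = 0, w = 0;
    while (buf[r] != '\0') {
        if (buf[r] == '\r' && buf[r + 1] == '\n') {
            r++;                 /* drop the CR; the LF is copied next */
            continue;
        }
        buf[w++] = buf[r++];
    }
    buf[w] = '\0';
    return w;
}
```

To fix a file, read it whole into a NUL-terminated buffer, convert it, and write back the returned number of bytes.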

Now it compiles!

... or does it?

cchung - ran into some problems that Brett didn't mention (20/5/04)

undefined reference to poisson_solver_, create_plan_

If these errors occur:

    • edit fftw_ps.c and add an extra underscore to the poisson_solver_ function declaration, i.e. void poisson_solver__(...)
    • edit fftwplan.c and do the same with create_plan_, i.e. void create_plan__(...)

zeusmp.def

Also, before compiling, make sure to put a proper zeusmp.def in src/ and a zmp_inp in exe/ (test versions of both are in test/). The sample zeusmp.def initially in src/ cannot be used.

There are still some warnings

Modified checkin.c to pass fd_set arguments to select(), as the function requires.
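checkin.c itself is not shown here; presumably the warnings came from passing something other than an fd_set pointer. For reference, select() takes pointers to fd_set objects built with the FD_ZERO and FD_SET macros. The sketch below uses a hypothetical helper, wait_readable (not a function from checkin.c), to wait up to a timeout for a descriptor to become readable.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical helper: wait up to timeout_ms for fd to become readable.
   Returns 1 if readable, 0 on timeout, -1 on error. */
static int wait_readable(int fd, long timeout_ms) {
    fd_set readfds;                  /* select() wants fd_set *, not int * */
    struct timeval tv;
    assert(fd >= 0 && fd < FD_SETSIZE);
    FD_ZERO(&readfds);
    FD_SET(fd, &readfds);
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;
    int rc = select(fd + 1, &readfds, NULL, NULL, &tv);
    if (rc < 0)
        return -1;
    return rc > 0 && FD_ISSET(fd, &readfds);
}
```

select() modifies both the fd_set and (on Linux) the timeval, so they are rebuilt on every call rather than reused.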

Running a blast test

Follow http://zeus.ncsa.uiuc.edu/zmp_guide/zmp_user_guide4.html

  • cd zeus
  • cp test/zmp_inp.blast.xyz exe/zmp_inp
  • cp test/zeusmp.def.blast.xyz src/zeusmp.def
  • cd src
  • make clean
  • make compile
  • cd ../exe
  • mpirun -np 4 zeusmp.x

It runs! If it hangs at this point, check mpirun AND mpif77, and check that ssh/rsh is working.

Processing blast test

Yes, there's more. We need to postprocess the data to stick it all together. In zeus/pp:

Create makefile

Create a makefile using

  • ./Make_zmp_pp

Use DEC as the template (quite close to linux)

Hand-modify makefile

(We COULD modify the Make_zmp_pp script and add a linux machine type to make these changes automatically.)

  • Add -traditional to /lib/cpp (see above)
  • Remove flags that f77 does not recognise from ZMP_OPTS, such as -mips and -r10000 (these are specific to SGI's MIPS platform; the R10000 is a MIPS processor)
  • Add -ljpeg -lz, which HDF's -ldf needs (see above for HDF -ldf)

Alternatively, if that doesn't work, do

  • setenv ZMP_MACH dec
  • setenv ZMP_LIB ".../hdf/lib -ldf -ljpeg -lz"

Fix source code

  • Removed duplicate declarations (e.g. of i, j, k) from hsplice.F

Now it compiles

To use the newly compiled zmp_pp.x, we follow http://lca.ncsa.uiuc.edu/zmp_guide/zmp_user_guide5.html to produce hdfaa.0??. Rename this to aa005.hdf and view it as an HDF file.

Success! (for now...)

(Note this file DOES NOT contain range (max/min) data for its datasets.)

Fixing gravity problems

The gravitational collapse test does indeed collapse, but via a SEGFAULT!

Using the gcol.xyz test, the run initially SEGFAULTs.

Made it a single-node run by modifying the input and compile settings to create a single 64x64x64 box. This runs OK and gives reasonable results.

Try to fix segfault:

Maybe a different version of FFTW is required? DavidB thinks this is not likely (trying 2.1.2 failed to fix it). Maybe gravity.f:418 is to blame, as this is the only test with gravity (yes, it is gravity.f). Using rtp instead of xyz is no help.

Try FFTW libs

Tried using the FFTW Poisson solver for self-gravity (by defining GRAV_FFTW in zeusmp.def instead of GRAV). Had to change a few underscores in gravity.f, etc. The output said:

  • self-gravity ON (normal)
  • Poisson solver FFTW (newly added)

The run still segfaults in gravity.f after a few seconds, but differently from before these changes.

Investigate more!

Adding gravity to the jet problem makes it segfault too, so gravity is almost certainly the problem.

Increased NTIMER in mgmpi.F (no change)

Doubled the #define'd array sizes (i, j, k) in case of a minor bug (no change).

Decreased the #define'd precision to float values (1.0e29) (no change).

Tried debug mode: the last message is "GRAVITY: Calling mmMgSolve". This is in gravity.f, as you might bloody well expect.

Used gdb and print statements to find that the segfault is in mg3p, called from gravity.f.

The segfault is at the MPI_BCAST of the ioutput variable (which is 0 anyway, unless you ask for verbose output). Commenting out this line fixes the SEGFAULT with (I think) no bad effects. gcollapse now runs OK.

Fix SEGFAULT

Can run jet with self-gravity if zmp_inp is modified to set i, j, k to 31, 31, 127 (i.e. each reduced by 1), since zeus says 'i,j,k must be divisible by blah blah blah'. The output is very turbulent and clearly DIFFERENT with self-gravity: density collects together.

Although this fix clearly does not address the root cause, it works, so we stop here. The SEGFAULT is probably in the MPI_BCAST code (since a single node doesn't fault).

-- BrettBeeson - 05 Sep 2003