Purpose
This recipe describes a step-by-step process for obtaining, building, and running the NAMD (Scalable Molecular Dynamics) code on the Intel® Xeon Phi™ processor and Intel® Xeon® E5 processors for better performance.
Introduction
NAMD is a parallel molecular dynamics code designed for high-performance simulation of large biomolecule systems. Based on Charm++ parallel objects, NAMD scales to hundreds of cores for typical simulations and beyond 500,000 cores for the largest simulations. NAMD uses the popular molecular graphics program VMD for simulation setup and trajectory analysis, but is also file-compatible with AMBER, CHARMM, and X-PLOR.
NAMD is distributed free of charge with source code. You can build NAMD yourself or download binaries for a wide variety of platforms. The details below describe how to build and run NAMD on the Intel® Xeon Phi™ processor and Intel® Xeon® E5 processors; learn more about NAMD at http://www.ks.uiuc.edu/Research/namd/
Building and running NAMD on Intel® Xeon® Processor E5-2697 v4 (BDW) and Intel® Xeon Phi™ Processor 7250 (KNL)
Download the Code:
- Download the latest “Source Code” of NAMD from this site: http://www.ks.uiuc.edu/Development/Download/download.cgi?PackageName=NAMD
- Download Charm++ version 6.7.1
- Charm++ is included in the NAMD "Nightly Build" source code
- Or download it separately from http://charmplusplus.org/download/
- Download fftw3 (http://www.fftw.org/download.html)
- Version 3.3.4 is used in this run
- Download the apoa1 and stmv workloads from here: http://www.ks.uiuc.edu/Research/namd/utilities/
Build the Binaries:
- Recommended steps to build fftw3:
- cd <path>/fftw3.3.4
- ./configure --prefix=$base/fftw3 --enable-single --disable-fortran CC=icc
- make CFLAGS="-O3 -xMIC-AVX512 -fp-model fast=2 -no-prec-div -qoverride-limits" clean install (use -xMIC-AVX512 for KNL or -xCORE-AVX2 for BDW)
- Build multicore version of charm++:
- cd <path>/charm-6.7.1
- ./build charm++ multicore-linux64 iccstatic --with-production "-O3 -ip"
- Build BDW:
- Modify the arch/Linux-x86_64-icc.arch to look like the following (for BDW, -xCORE-AVX2 replaces -xMIC-AVX512, per the FFTW note above, and the KNL-specific defines are dropped):
NAMD_ARCH = Linux-x86_64
CHARMARCH = multicore-linux64-iccstatic
FLOATOPTS = -ip -xCORE-AVX2 -O3 -g -fp-model fast=2 -no-prec-div -qoverride-limits
CXX = icpc -std=c++11
CXXOPTS = -static-intel -O2 $(FLOATOPTS)
CXXNOALIASOPTS = -O3 -fno-alias $(FLOATOPTS) -qopt-report-phase=loop,vec -qopt-report=4
CXXCOLVAROPTS = -O2 -ip
CC = icc
COPTS = -static-intel -O2 $(FLOATOPTS)
- ./config Linux-x86_64-icc --charm-base <charm_path> --charm-arch multicore-linux64-iccstatic --with-fftw3 --fftw-prefix <fftw_path> --without-tcl --charm-opts -verbose
- gmake -j
- Build KNL:
- Modify the arch/Linux-KNL-icc.arch to look like the following:
NAMD_ARCH = Linux-KNL
CHARMARCH = multicore-linux64-iccstatic
FLOATOPTS = -ip -xMIC-AVX512 -O3 -g -fp-model fast=2 -no-prec-div -qoverride-limits -DNAMD_DISABLE_SSE
CXX = icpc -std=c++11 -DNAMD_KNL
CXXOPTS = -static-intel -O2 $(FLOATOPTS)
CXXNOALIASOPTS = -O3 -fno-alias $(FLOATOPTS) -qopt-report-phase=loop,vec -qopt-report=4
CXXCOLVAROPTS = -O2 -ip
CC = icc
COPTS = -static-intel -O2 $(FLOATOPTS)
- ./config Linux-KNL-icc --charm-base <charm_path> --charm-arch multicore-linux64-iccstatic --with-fftw3 --fftw-prefix <fftw_path> --without-tcl --charm-opts -verbose
- gmake -j (a consolidated sketch of this build flow is shown after this list)
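For reference, the single-node KNL build steps above can be collected into one script. This is a minimal sketch, assuming the FFTW, Charm++, and NAMD sources are unpacked under a hypothetical $base directory and that arch/Linux-KNL-icc.arch has already been edited as shown; for a BDW build, substitute -xCORE-AVX2 and the Linux-x86_64-icc configuration.
# Minimal sketch of the single-node KNL build flow; $base is a hypothetical working directory
base=$HOME/namd_build
# 1. FFTW3, single precision, built with the Intel compiler
cd $base/fftw3.3.4
./configure --prefix=$base/fftw3 --enable-single --disable-fortran CC=icc
make CFLAGS="-O3 -xMIC-AVX512 -fp-model fast=2 -no-prec-div -qoverride-limits" clean install
# 2. Multicore Charm++
cd $base/charm-6.7.1
./build charm++ multicore-linux64 iccstatic --with-production "-O3 -ip"
# 3. NAMD against the Charm++ and FFTW builds above
cd $base/NAMD_2.11_Source
./config Linux-KNL-icc --charm-base $base/charm-6.7.1 --charm-arch multicore-linux64-iccstatic --with-fftw3 --fftw-prefix $base/fftw3 --without-tcl --charm-opts -verbose
cd Linux-KNL-icc
gmake -j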
Other system setup:
- Change the kernel setting for KNL:
“nmi_watchdog=0 rcu_nocbs=2-271 nohz_full=2-271”
One way to change the settings (this could be different for every system):
- First save your original grub.cfg to be safe:
cp /boot/grub2/grub.cfg /boot/grub2/grub.cfg.ORIG
- In /etc/default/grub, append the parameters above to the GRUB_CMDLINE_LINUX line:
nmi_watchdog=0 rcu_nocbs=2-271 nohz_full=2-271
- Save your new configuration:
grub2-mkconfig -o /boot/grub2/grub.cfg
- Reboot the system. After logging in, verify the settings with 'cat /proc/cmdline'
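A scripted version of these grub steps could look like the sketch below; it assumes an RHEL 7 style /etc/default/grub whose GRUB_CMDLINE_LINUX line ends with a closing double quote, so review the resulting file before rebooting.
# Back up the current grub configuration
cp /boot/grub2/grub.cfg /boot/grub2/grub.cfg.ORIG
# Append the KNL kernel parameters to the GRUB_CMDLINE_LINUX line
sed -i 's/^GRUB_CMDLINE_LINUX="\(.*\)"/GRUB_CMDLINE_LINUX="\1 nmi_watchdog=0 rcu_nocbs=2-271 nohz_full=2-271"/' /etc/default/grub
# Regenerate grub.cfg, then reboot and verify with: cat /proc/cmdline
grub2-mkconfig -o /boot/grub2/grub.cfg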
- Change the following lines in the *.namd file for both workloads:
numsteps 1000
outputtiming 20
outputenergies 600
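These three settings can also be applied with a short script; the sketch below assumes GNU sed and that the keywords already appear in apoa1.namd and stmv.namd (add the lines manually if they do not).
# Set the benchmark parameters in both workload files (in-place edit, case-insensitive match)
for f in apoa1/apoa1.namd stmv/stmv.namd; do
    sed -i -e 's/^numsteps .*/numsteps 1000/I' \
           -e 's/^outputtiming .*/outputtiming 20/I' \
           -e 's/^outputenergies .*/outputenergies 600/I' "$f"
done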
Run NAMD:
- Run BDW (ppn = 72):
$BIN +p $ppn apoa1/apoa1.namd +pemap 0-$(($ppn-1))
- Run KNL (ppn = 136; MCDRAM in flat mode, with similar performance in cache mode):
numactl -m 1 $BIN +p $ppn apoa1/apoa1.namd +pemap 0-$(($ppn-1))
Example: numactl -m 1 /NAMD_2.11_Source/Linux-KNL-icc/namd2 +p 136 apoa1/apoa1.namd +pemap 0-135
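The run commands above can be wrapped in a small script that derives the +pemap range from $ppn; this is only an illustrative sketch, with the binary path taken from the example and $ppn set per platform.
#!/bin/bash
# Illustrative single-node run wrapper; adjust BIN, ppn, and the workload to your setup
BIN=/NAMD_2.11_Source/Linux-KNL-icc/namd2
ppn=136                       # 72 on BDW, 136 on KNL
WORKLOAD=apoa1/apoa1.namd
if [ "$ppn" -eq 136 ]; then
    # KNL: place NAMD memory in MCDRAM (NUMA node 1 in flat mode)
    numactl -m 1 $BIN +p $ppn $WORKLOAD +pemap 0-$(($ppn-1))
else
    # BDW
    $BIN +p $ppn $WORKLOAD +pemap 0-$(($ppn-1))
fi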
Performance results reported in the Intel Salesforce repository (ns/day; higher is better):
Workload | 2S BDW 18c 2.3 GHz (ns/day) | KNL bin1 (ns/day) | KNL vs. 2S BDW (speedup) |
---|---|---|---|
stmv | 0.45 | 0.55 | 1.22x |
apoa1 | 5.5 | 6.18 | 1.12x |
Systems configuration:
Processor | Intel® Xeon® Processor E5-2697 v4 (BDW) | Intel® Xeon Phi™ Processor 7250 (KNL) |
---|---|---|
Stepping | 1 (B0) | 1 (B0) Bin1 |
Sockets / TDP | 2S / 290W | 1S / 215W |
Frequency / Cores / Threads | 2.3 GHz / 36 / 72 | 1.4 GHz / 68 / 272 |
DDR4 | 8x16 GB 2400 MHz (128 GB) | 6x16 GB 2400 MHz (96 GB) |
MCDRAM | N/A | 16 GB Flat |
Cluster/Snoop Mode/Mem Mode | Home | Quadrant/flat |
Turbo | On | On |
BIOS | GRRFSDP1.86B0271.R00.1510301446 | GVPRCRB1.86B.0010.R02.1608040407 |
Compiler | ICC-2017.0.098 | ICC-2017.0.098 |
Operating System | Red Hat Enterprise Linux* 7.2 (3.10.0-327.el7.x86_64) | Red Hat Enterprise Linux* 7.2 (3.10.0-327.22.2.el7.xppsl_1.4.1.3272.x86_64) |
Building and running NAMD for Cluster on Intel® Xeon® Processor E5-2697 v4 (BDW) and Intel® Xeon Phi™ Processor 7250 (KNL)
Build the Binaries:
- Set Intel tools for compilation:
I_MPI_CC=icc; I_MPI_CXX=icpc; I_MPI_F90=ifort; I_MPI_F77=ifort
export I_MPI_CC I_MPI_CXX I_MPI_F90 I_MPI_F77
CC=icc; CXX=icpc; F90=ifort; F77=ifort
export CC CXX F90 F77
export I_MPI_LINK=opt_mt
- Recommended steps to build fftw3:
- cd <path>/fftw3.3.4
- ./configure --prefix=$base/fftw3 --enable-single --disable-fortran CC=icc
- Use -xMIC-AVX512 for KNL or -xCORE-AVX2 for BDW
- make CFLAGS="-O3 -xMIC-AVX512 -fp-model fast=2 -no-prec-div -qoverride-limits" clean install
- Recommended steps to build multicore version of charm++:
- cd <path>/charm-6.7.1
- chmod -R 777 *
- source /opt/intel/compiler/<version>/compilervars.sh intel64
- source /opt/intel/impi/<version>/bin/mpivars.sh
- ./build charm++ mpi-linux-x86_64 smp mpicxx ifort --with-production $base_charm_opts -DCMK_OPTIMIZE -DMPICH_IGNORE_CXX_SEEK
- Build on KNL:
- ./config Linux-KNL-icc --charm-base <fullPath>/charm-6.7.1 --charm-arch mpi-linux-x86_64-ifort-smp-mpicxx --with-fftw3 --fftw-prefix <fullPath>/fftw3 --without-tcl --charm-opts -verbose
- cd “Linux-KNL-icc”
- gmake -j
- Build on BDW:
- ./config Linux-KNL-icc --charm-base $FULLPATH/charm-6.7.1 --charm-arch mpi-linux-x86_64-ifort-smp-mpicxx --with-fftw3 --fftw-prefix $FULLPATH/fftw3 --without-tcl --charm-opts -verbose
- cd Linux-KNL-icc
- make clean
- gmake -j
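As with the single-node case, the cluster build steps above can be scripted; the sketch below is illustrative, assuming the Intel compiler and Intel MPI live under /opt/intel and that $FULLPATH holds charm-6.7.1, fftw3, and the NAMD source.
# Illustrative cluster (MPI/SMP) build flow; paths and <version> placeholders are assumptions
source /opt/intel/compiler/<version>/compilervars.sh intel64
source /opt/intel/impi/<version>/bin/mpivars.sh
export I_MPI_CC=icc I_MPI_CXX=icpc I_MPI_F90=ifort I_MPI_F77=ifort
export CC=icc CXX=icpc F90=ifort F77=ifort
export I_MPI_LINK=opt_mt
# Charm++ on the MPI/SMP machine layer
cd $FULLPATH/charm-6.7.1
./build charm++ mpi-linux-x86_64 smp mpicxx ifort --with-production -DCMK_OPTIMIZE -DMPICH_IGNORE_CXX_SEEK
# NAMD (KNL configure line shown; the BDW line in the list above is analogous)
cd $FULLPATH/NAMD_2.11_Source
./config Linux-KNL-icc --charm-base $FULLPATH/charm-6.7.1 --charm-arch mpi-linux-x86_64-ifort-smp-mpicxx --with-fftw3 --fftw-prefix $FULLPATH/fftw3 --without-tcl --charm-opts -verbose
cd Linux-KNL-icc
gmake -j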
Run the Binaries (note: "hosts" is the file that contains the host names to run on):
- BDW run on single node:
export I_MPI_PROVIDER=psm2
export I_MPI_FALLBACK=no
export I_MPI_FABRICS=tmi
source /opt/intel/compiler/<version>/compilervars.sh intel64
source /opt/intel/impi/<version>/intel64/bin/mpivars.sh
NTASKS_PER_NODE=1
export MPPEXEC="time -p mpiexec.hydra -perhost $NTASKS_PER_NODE -f ./hosts "
$MPPEXEC -n $node $BINPATH/$BINNAME +ppn 71 $FULLPATH/$WORKLOAD +pemap 1-71 +commap 0
Example:
$MPPEXEC -n 1 $FULLPATH/namd2 +ppn 71 $FULLPATH/stmv/stmv.namd +pemap 1-71 +commap 0
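The ./hosts file passed to mpiexec.hydra is simply a list of node names, one per line; the hostnames below are placeholders.
# ./hosts - one hostname per line, consumed by mpiexec.hydra -f ./hosts
knl-node01
knl-node02
knl-node03
knl-node04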
- KNL Run on single node:
export I_MPI_PROVIDER=psm2
export I_MPI_FALLBACK=0
export I_MPI_FABRICS=tmi
export PSM2_IDENTIFY=1
export PSM2_RCVTHREAD=0
export TMI_PSM2_TEST_POLL=1
NTASKS_PER_NODE=1
export MPPEXEC="mpiexec.hydra -perhost $NTASKS_PER_NODE -f ./hosts "
numactl -m 1 $MPPEXEC $BINPATH/$BINNAME +ppn 135 $FULLPATH/$WORKLOAD +pemap 1-135 +commap 0
Example:
numactl -m 1 $MPPEXEC $FULLPATH/namd2 +ppn 135 $FULLPATH/stmv/stmv.namd +pemap 1-135 +commap 0
- KNL Run on multi-node (node = number of nodes to run on):
export MPPEXEC="mpiexec.hydra -perhost 1 -f ./hosts " numactl -m 1 $MPPEXEC -n $node numactl -m 1 $BINPATH/$BINNAME +ppn 134 $FULLPATH/$WORKLOAD +pemap 0-($ppn-1) +commap 67
Example:
numactl -m 1 $MPPEXEC -n 8 numactl -m 1 $FULLPATH/namd2 +ppn 134 $FULLPATH/stmv/stmv.namd +pemap 0-66+68 +commap 67
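To collect a scaling curve over several node counts, the multi-node command can be placed in a loop; the sketch below is illustrative, reusing the binary, workload, and core mapping from the example above and writing logs to a hypothetical results directory.
# Illustrative KNL multi-node scaling sweep with one communication thread per node
export MPPEXEC="mpiexec.hydra -perhost 1 -f ./hosts "
mkdir -p results
for node in 1 2 4 8 16; do
    numactl -m 1 $MPPEXEC -n $node numactl -m 1 $FULLPATH/namd2 +ppn 134 $FULLPATH/stmv/stmv.namd +pemap 0-66+68 +commap 67 > results/stmv.$node.log
done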
Remark:
For better scaling on multi-node runs, increase the number of communication threads per node (for example 1, 2, 4, 8, 13, 17). Example of a command that can be used:
export MPPEXEC="mpiexec.hydra -perhost 17 -f ./hosts " numactl -m 1 $MPPEXEC -n $(($node*17)) numactl -m 1 $BINPATH/$BINNAME +ppn 7 $FULLPATH/$WORKLOAD +pemap 0-67,68-135:4.3 +commap 71-135:4 > ${WKL}_cluster_commapN/${WKL}.$node.$
One usage example:
nodes="16 8 4 2 1" for node in ${nodes} do export MPPEXEC="mpiexec.hydra -perhost 17 -f ./hosts numactl -m 1 $MPPEXEC -n $(($node*17)) numactl -m 1 $FullPath.namd2 +ppn 8 $WorkloadPath/$WKL/$WKL.namd +pemap 0-67+68 +commap 71-135:4 > $ResultFile.$node.$BINNAME.68c2t.commap_8th_from2cx4t done
Best performance results reported on clusters of up to 128 Intel® Xeon Phi™ processor nodes (ns/day; higher is better):
Workload\node (2HT) | 1 | 2 | 4 | 8 | 16 |
---|---|---|---|---|---|
stmv (ns/day) | 0.55 | 1.05 | 1.86 | 3.31 | 5.31 |
Workload\node (2HT) | 8 | 16 | 32 | 64 | 128 |
---|---|---|---|---|---|
stmv.28M (ns/day) | 0.152 | 0.310 | 0.596 | 1.03 | 1.91 |