Mailing List picongpu-users@hzdr.de Message #209
From: Khikhlukha Danila <Danila.Khikhlukha@eli-beams.eu>
Subject: Restart failure
Date: Mon, 13 Feb 2017 14:11:10 +0000
To: picongpu-users@hzdr.de <picongpu-users@hzdr.de>
Dear all,
I have been trying to set up PoG on the Jureca machine. Everything worked fine for the LWFA example; however, when I tried to restart the simulation, I received a segfault almost immediately.
My toolchain is as follows:

GCC/5.4.0
CUDA/8.0.44
MVAPICH2/2.2-GDR
HDF5/1.8.17
Boost/1.61.0

The first run didn't have any problems -- pictures, save points and data dumps were created. When I launch the restart, however, it crashes, although I explicitly specify the save point directory.

test$ diff -r 0002/submit/ 0002_restart/submit/
diff -r 0002/submit/0008gpus.cfg 0002_restart/submit/0008gpus.cfg
39c39
< TBG_steps="-s 1024"
---
> TBG_steps="-s 2048"
41a42
> TBG_restart="--restart --restart-directory /work/hhh20/hhh20z/run_0002/simOutput/checkpoints"
67a69
>                    !TBG_restart      \

I also checked that the directory exists and is accessible. I tried to switch on some debug information with the following command:

$PICSRC/configure -c"-DCMAKE_VERBOSE_MAKEFILE=ON -DPIC_VERBOSE_LVL=29 -DPMACC_VERBOSE_LVL=7"

However, I didn't find any information except the standard message:
[jrc0007:mpi_rank_4][error_sighandler] Caught error: Segmentation fault (signal 11)
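
For reference, the existence and accessibility check on the save point directory mentioned above can be sketched as a small shell function. This is only an illustration: check_restart_dir is a hypothetical helper name, not something shipped with PIConGPU, and it only tests that the directory exists, can be read and entered, and is not empty. It says nothing about whether the files inside are valid.

```shell
#!/bin/sh
# Hypothetical helper (not part of PIConGPU): checks that a restart
# directory exists, can be read and entered, and is not empty.
check_restart_dir() {
    dir=$1
    if [ ! -d "$dir" ]; then
        echo "missing: $dir"
        return 1
    fi
    if [ ! -r "$dir" ] || [ ! -x "$dir" ]; then
        echo "not accessible: $dir"
        return 1
    fi
    # ls -A lists every entry except . and ..; empty output means no files
    if [ -z "$(ls -A "$dir")" ]; then
        echo "empty: $dir"
        return 1
    fi
    echo "ok: $dir"
}
```

It would be called as check_restart_dir /work/hhh20/hhh20z/run_0002/simOutput/checkpoints. Running it from inside the batch job rather than on the login node may also be worthwhile, on the assumption that the compute nodes could see the file system differently.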

Could you please advise me whether there is another way to diagnose the problem (other than launching gdb)? Maybe I'm doing something wrong? However, restart used to work on other machines...
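
One cheap diagnostic that does not need gdb is to see which hosts and ranks report the segfault, for example whether it is every rank or only some. The sketch below assumes the error lines have exactly the format of the standard message quoted above; failed_ranks is a hypothetical helper name, and the job output is read from standard input because the output file name depends on the batch setup.

```shell
#!/bin/sh
# Hypothetical helper: reads job output on stdin and prints "host rank"
# once for each distinct line of the form
#   [host:mpi_rank_N][error_sighandler] Caught error: Segmentation fault (signal 11)
failed_ranks() {
    sed -n 's/^\[\([^:]*\):mpi_rank_\([0-9][0-9]*\)]\[error_sighandler] Caught error: Segmentation fault.*/\1 \2/p' |
        sort -u
}
```

The function is fed the job's error output, e.g. failed_ranks < job_error_output, where job_error_output is a placeholder for whatever file the batch system writes. If only a subset of ranks shows up, that may hint at a problem tied to particular nodes or to particular parts of the domain rather than to the restart options themselves.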


Thank you in advance,
Danila.
