Mailing list picongpu-users@hzdr.de Message #209
From: Khikhlukha Danila <Danila.Khikhlukha@eli-beams.eu>
Subject: Restart failure
Date: Mon, 13 Feb 2017 14:11:10 +0000
To: picongpu-users@hzdr.de <picongpu-users@hzdr.de>
Dear all,
I have been trying to set up PoG on the Jureca machine. Everything worked fine for the LWFA example; however, when I tried to restart the simulation I got a segfault almost immediately.
My toolchain is as follows:

GCC/5.4.0
CUDA/8.0.44
MVAPICH2/2.2-GDR
HDF5/1.8.17
Boost/1.61.0

The first run had no problems: pictures, save points and data dumps were all created. When I launch the restart, however, it crashes, even though I explicitly specify the savepoint directory.

test$ diff -r 0002/submit/ 0002_restart/submit/
diff -r 0002/submit/0008gpus.cfg 0002_restart/submit/0008gpus.cfg
39c39
< TBG_steps="-s 1024"
---
> TBG_steps="-s 2048"
41a42
> TBG_restart="--restart --restart-directory /work/hhh20/hhh20z/run_0002/simOutput/checkpoints"
67a69
>                    !TBG_restart      \

I also checked that the savepoint directory exists and is accessible. I tried to switch on some debug output with the following command:

$PICSRC/configure -c"-DCMAKE_VERBOSE_MAKEFILE=ON -DPIC_VERBOSE_LVL=29 -DPMACC_VERBOSE_LVL=7"

However, I did not find any information beyond the standard message:
[jrc0007:mpi_rank_4][error_sighandler] Caught error: Segmentation fault (signal 11)

Could you please advise me whether there is another way to diagnose the problem (other than running gdb)? Maybe I'm doing something wrong? Restart used to work on other machines, though...


Thank you in advance,
Danila.
