Message #209 of the picongpu-users@hzdr.de mailing list
From: Khikhlukha Danila <Danila.Khikhlukha@eli-beams.eu>
Subject: Restart failure
Date: Mon, 13 Feb 2017 14:11:10 +0000
To: picongpu-users@hzdr.de <picongpu-users@hzdr.de>
Dear all,
I have recently been trying to set up PoG on the Jureca machine. Everything worked fine for the LWFA example; however, when I tried to restart the simulation, I received a segfault almost immediately.
My toolchain is as follows:

GCC/5.4.0
CUDA/8.0.44
MVAPICH2/2.2-GDR
HDF5/1.8.17
Boost/1.61.0

So the first run didn't have any problems: pictures, save points and data dumps were created. When I tried to launch the restart, it crashed, although I had explicitly specified the save point directory.

test$ diff -r 0002/submit/ 0002_restart/submit/
diff -r 0002/submit/0008gpus.cfg 0002_restart/submit/0008gpus.cfg
39c39
< TBG_steps="-s 1024"
---
> TBG_steps="-s 2048"
41a42
> TBG_restart="--restart --restart-directory /work/hhh20/hhh20z/run_0002/simOutput/checkpoints"
67a69
>                    !TBG_restart      \

I also checked that the directory exists and is accessible. I tried to switch on some debug information with the following command:

$PICSRC/configure -c"-DCMAKE_VERBOSE_MAKEFILE=ON -DPIC_VERBOSE_LVL=29 -DPMACC_VERBOSE_LVL=7"

However, I didn't find any information apart from the standard message:
[jrc0007:mpi_rank_4][error_sighandler] Caught error: Segmentation fault (signal 11)

Could you please advise me whether there is another way to diagnose the problem (other than launching gdb)? Maybe I'm doing something wrong? However, restart used to work on other machines...
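
The existence and accessibility check mentioned above could be sketched as follows. This is only an illustration assuming a POSIX shell; the function name check_restart_dir is hypothetical and not part of PoG, and the check only reflects what is visible from the node where it is run.

```shell
#!/bin/sh
# Hypothetical helper (not part of PIConGPU): verify that a restart
# directory exists and can be read and traversed by the current user.
check_restart_dir() {
    dir="$1"
    if [ ! -d "$dir" ]; then
        echo "missing: $dir"
        return 1
    fi
    # -r: directory entries can be listed, -x: directory can be entered
    if [ ! -r "$dir" ] || [ ! -x "$dir" ]; then
        echo "not accessible: $dir"
        return 1
    fi
    echo "ok: $dir"
}

# Example call with the path taken from the diff above:
# check_restart_dir /work/hhh20/hhh20z/run_0002/simOutput/checkpoints
```

Running the same check inside a batch job would show whether the compute nodes see the directory as well.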


Thank you in advance,
Danila.
