Subject: Re: [PIConGPU-Users] Restart failure
To: picongpu-users@hzdr.de
From: René Widera
Date: Mon, 13 Feb 2017 15:39:30 +0100

Dear Danila,

could you please send us the `stdout`, `stderr` and the files from the `tbg` folder?

best,
René

On 02/13/2017 03:11 PM, Khikhlukha Danila wrote:
> Dear all,
> currently I am trying to set up PoG on the Jureca machine. It all worked
> fine for the LWFA example; however, when I tried to restart the
> simulation I received a segfault almost immediately.
> My tool chain is as follows:
>
> GCC/5.4.0
> CUDA/8.0.44
> MVAPICH2/2.2-GDR
> HDF5/1.8.17
> Boost/1.61.0
>
> So, the first run didn't have any problems -- pictures, save points and
> data dumps were created. When I tried to launch the restart, it crashed
> although I explicitly specified the savepoint directory.
>
> test$ diff -r 0002/submit/ 0002_restart/submit/
> diff -r 0002/submit/0008gpus.cfg 0002_restart/submit/0008gpus.cfg
> 39c39
> < TBG_steps="-s 1024"
> ---
> > TBG_steps="-s 2048"
> 41a42
> > TBG_restart="--restart --restart-directory /work/hhh20/hhh20z/run_0002/simOutput/checkpoints"
> 67a69
> > !TBG_restart \
>
> I also checked that it exists and is accessible.
> I tried to switch on some debug information with the following command:
>
> $PICSRC/configure -c"-DCMAKE_VERBOSE_MAKEFILE=ON -DPIC_VERBOSE_LVL=29 -DPMACC_VERBOSE_LVL=7"
>
> however, I didn't find any information except a standard message:
> [jrc0007:mpi_rank_4][error_sighandler] Caught error: Segmentation fault (signal 11)
>
> Could you please advise me if there is another way to diagnose the
> problem (other than launching gdb)? Maybe I'm doing something wrong?
> However, restart used to work on other machines...
>
>
> Thank you in advance,
> Danila.
>

-- 
René Widera
Department of Laser Particle Acceleration (FWKT)
Helmholtz-Zentrum Dresden-Rossendorf
Tel: +49 (0351) 260 3543
r.widera@hzdr.de
http://www.hzdr.de
Board of Directors: Prof. Dr. Dr. h. c. Roland Sauerbrey, Prof. Dr. Dr. h. c. Peter Joehnk
Register of Associations: VR 1693, Amtsgericht Dresden