Subject: Re: Restart failure
To: picongpu-users@hzdr.de
From: Axel Huebl
Organization: HZDR
Message-ID: <46269833-df59-b4ee-1e09-5652574d4b9b@hzdr.de>
Date: Tue, 14 Feb 2017 23:02:20 +0100

Hi Danila,

so far that looks normal. Can you try specifying your restart step
explicitly, e.g. with --restart-step 1024?

Which exact version of PIConGPU did you use? We are currently not aware
of any problems with restarts.

If we can narrow down the source of your segfault, that would be
wonderful (although I suspect it might be in a third-party library). So
if you are able to set up a small one-node example and attach gdb to it,
as described in

  https://github.com/ComputationalRadiationPhysics/picongpu/wiki/Debugging

we might get a better understanding.

That said, you can also add ADIOS to your compile chain, which will
automatically use ADIOS .bp files for checkpointing and restarting. You
can still use HDF5 for regular output (or use ADIOS for both), in case
HDF5 misbehaves too badly and you need to move fast.

Axel

On 13.02.2017 15:47, Khikhlukha Danila wrote:
> Hi René,
> sure, please see the attachment. Please let me know if more information is needed.
>
> D.
> ________________________________________
> From: picongpu-users@hzdr.de [picongpu-users@hzdr.de] on behalf of René Widera [r.widera@hzdr.de]
> Sent: Monday, February 13, 2017 3:39 PM
> To: picongpu-users@hzdr.de
> Subject: Re: [PIConGPU-Users] [PIConGPU-Users] Restart failure
>
> Dear Danila,
>
> could you please send us the `stdout`, `stderr` and the files from the
> `tbg` folder?
>
> best,
>
> René
>
> On 02/13/2017 03:11 PM, Khikhlukha Danila wrote:
>> Dear all,
>> I have been trying to set up PoG on the Jureca machine. It all worked
>> fine for the LWFA example, however when I tried to restart the
>> simulation I received a segfault almost immediately.
>> My tool chain is as follows:
>>
>> GCC/5.4.0
>> CUDA/8.0.44
>> MVAPICH2/2.2-GDR
>> HDF5/1.8.17
>> Boost/1.61.0
>>
>> So, the first run didn't have any problems: pictures, save points and
>> data dumps were created. When I tried to launch the restart, it crashed
>> although I explicitly specified the savepoint directory.
>>
>> test$ diff -r 0002/submit/ 0002_restart/submit/
>> diff -r 0002/submit/0008gpus.cfg 0002_restart/submit/0008gpus.cfg
>> 39c39
>> < TBG_steps="-s 1024"
>> ---
>> > TBG_steps="-s 2048"
>> 41a42
>> > TBG_restart="--restart --restart-directory /work/hhh20/hhh20z/run_0002/simOutput/checkpoints"
>> 67a69
>> > !TBG_restart \
>>
>> I also checked that it exists and is accessible. I tried to switch on some
>> debug information with the following command:
>>
>> $PICSRC/configure -c"-DCMAKE_VERBOSE_MAKEFILE=ON -DPIC_VERBOSE_LVL=29 -DPMACC_VERBOSE_LVL=7"
>>
>> however I didn't find any information except a standard message:
>> [jrc0007:mpi_rank_4][error_sighandler] Caught error: Segmentation fault (signal 11)
>>
>> Could you please advise me whether there is another way to diagnose the
>> problem (other than launching gdb)? Maybe I'm doing something wrong?
>> However, restart used to work on other machines...
>>
>>
>> Thank you in advance,
>> Danila.
>>
>
> --
> René Widera
> Abteilung Laser-Teilchenbeschleunigung (FWKT)
> Helmholtz-Zentrum Dresden-Rossendorf
> Tel: +49 (0351) 260 3543
> r.widera@hzdr.de
> http://www.hzdr.de
>
> Vorstand: Prof. Dr. Dr. h. c. Roland Sauerbrey,
> Prof. Dr. Dr. h. c. Peter Joehnk
> Vereinsregister: VR 1693 beim Amtsgericht Dresden
>
> #############################################################
> This message is sent to you because you are subscribed to
> the mailing list .
> To unsubscribe, E-mail to:
> To switch to the DIGEST mode, E-mail to
> To switch to the INDEX mode, E-mail to
> Send administrative queries to
>

--
Axel Huebl
Phone +49 351 260 3582
https://www.hzdr.de/crp
Computational Radiation Physics
Laser Particle Acceleration Division
Helmholtz-Zentrum Dresden - Rossendorf e.V.

Bautzner Landstrasse 400, 01328 Dresden
POB 510119, D-01314 Dresden
Vorstand: Prof. Dr.Dr.h.c. R. Sauerbrey
          Prof. Dr.Dr.h.c. P. Joehnk
VR 1693 beim Amtsgericht Dresden
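
For reference, the explicit restart step suggested in the reply could be combined with the restart options from the quoted diff roughly as follows. This is only a sketch: the step value 1024 is the example value from the reply and assumes a checkpoint was actually written at that step, and the directory is copied unchanged from the quoted 0008gpus.cfg diff.

```shell
# Sketch of the changed lines in 0002_restart/submit/0008gpus.cfg.
# Assumption: a checkpoint for step 1024 exists in the checkpoint directory.

# Run up to step 2048 in total, as in the quoted diff.
TBG_steps="-s 2048"

# Restart options from the quoted diff, plus the explicit restart step.
TBG_restart="--restart --restart-directory /work/hhh20/hhh20z/run_0002/simOutput/checkpoints --restart-step 1024"

# TBG_restart still has to be referenced as `!TBG_restart \` in the program
# parameter list further down in the .cfg file, as the quoted diff already does.
```

If the segfault persists with the step given explicitly, that would at least rule out a problem with picking the restart step automatically.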