From: Khikhlukha Danila
To: "picongpu-users@hzdr.de"
Subject: Restart failure
Date: Mon, 13 Feb 2017 14:11:10 +0000

Dear all,
I have been trying to set up PIConGPU (PoG) on the Jureca machine. Everything worked fine for the LWFA example, but when I tried to restart the simulation I received a segfault almost immediately.
My toolchain is as follows:

GCC/5.4.0
CUDA/8.0.44
MVAPICH2/2.2-GDR
HDF5/1.8.17
Boost/1.61.0

The first run did not have any problems: pictures, checkpoints and data dumps were created. When I launch the restart, however, it crashes even though I explicitly specify the checkpoint directory.

test$ diff -r 0002/submit/ 0002_restart/submit/
diff -r 0002/submit/0008gpus.cfg 0002_restart/submit/0008gpus.cfg
39c39
< TBG_steps="-s 1024"
---
> TBG_steps="-s 2048"
41a42
> TBG_restart="--restart --restart-directory /work/hhh20/hhh20z/run_0002/simOutput/checkpoints"
67a69
>                     !TBG_restart \

I also checked that the restart directory exists and is accessible. I tried to switch on some debug output by configuring with the following command:

$PICSRC/configure -c"-DCMAKE_VERBOSE_MAKEFILE=ON -DPIC_VERBOSE_LVL=29 -DPMACC_VERBOSE_LVL=7"

However, I did not find any information apart from the standard message:
[jrc0007:mpi_rank_4][error_sighandler] Caught error: Segmentation fault (signal 11)
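
For reference, the check of the restart directory mentioned above amounts to roughly the following. This is only a minimal sketch: the helper name check_checkpoint_dir is hypothetical (it is not part of PIConGPU or TBG), and it only tests that the directory exists, can be read and entered, and how many files sit directly inside it. It says nothing about whether the checkpoint files themselves are valid.

```shell
# Hypothetical helper (not part of PIConGPU): basic sanity check of a
# checkpoint directory before passing it to --restart-directory.
check_checkpoint_dir() {
    dir="$1"
    # The path must exist and be a directory.
    if [ ! -d "$dir" ]; then
        echo "missing: $dir"
        return 1
    fi
    # The current user must be able to list and enter the directory.
    if [ ! -r "$dir" ] || [ ! -x "$dir" ]; then
        echo "not accessible: $dir"
        return 1
    fi
    # Count regular files directly inside the directory.
    count=$(find "$dir" -maxdepth 1 -type f | wc -l)
    echo "ok: $dir ($((count)) files)"
    return 0
}
```

On the cluster it would be called with the directory from the cfg file, e.g. check_checkpoint_dir /work/hhh20/hhh20z/run_0002/simOutput/checkpoints, ideally from a compute node as well as from the login node, since the file systems mounted on the two may differ.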

Could you please advise me whether there is another way to diagnose the problem (other than launching gdb)? Maybe I am doing something wrong? Restart used to work on other machines, though...


Thank you in advance,
Danila.
