From: Axel Huebl <a.huebl@hzdr.de>
To: picongpu-users@hzdr.de
Subject: OpenMPI: Use ROMIO for IO
Organization: HZDR
Date: Mon, 21 Jan 2019 12:26:38 +0100

Dear Users,

we rely on several dependencies from our community to develop PIConGPU. As you might know, one of those dependencies is "MPI", which is used for multi-node message passing in HPC and is implemented independently in projects such as MPICH and OpenMPI. MPI is also used for I/O operations and is therefore a dependency of HDF5 and ADIOS, which we use in plugins for parallel I/O.

Unfortunately, all recent releases of the OpenMPI implementation have an issue that you might want to mitigate.

Starting with OpenMPI 2.x, OpenMPI's default I/O backend is OMPIO. That backend contains severe bugs that lead to data corruption and sporadic crashes (as of the latest releases, 3.1.3 and 4.0.0). This is most visible with our parallel HDF5 plugin, but ADIOS is potentially affected as well. Please see
https://github.com/open-mpi/ompi/issues/6285
for details.

For all system templates (`.tpl` files for `tbg`) that rely on OpenMPI (and its derivatives, such as BullMPI), we now disable the default "OMPIO" I/O backend and fall back to the existing ROMIO backend for MPI-I/O until bugfix releases are shipped:
https://github.com/ComputationalRadiationPhysics/picongpu/pull/2857

Please apply those changes manually to your `etc/picongpu//.tpl` now. We recommend mitigating this issue right away, since the data corruption it causes might go unnoticed even if you don't see crashes.

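For readers patching their own templates, here is a minimal sketch of one way to exclude OMPIO. It assumes OpenMPI's standard MCA mechanism: the `io` framework parameter, set here through the `OMPI_MCA_io` environment variable. The pull request linked above remains the authoritative change; this line is an illustration, not a copy of it, and `./my_program` in the comment is a hypothetical placeholder.

```shell
# Sketch only (assumption: set before the MPI launcher is invoked, e.g. near
# the top of a .tpl batch script).
# OMPI_MCA_io is the environment-variable form of OpenMPI's "io" MCA parameter;
# the leading "^" excludes the listed component, so MPI-I/O falls back to ROMIO.
export OMPI_MCA_io="^ompio"

# Equivalent per-run form on the launcher command line
# (./my_program is a hypothetical placeholder, shown as a comment only):
#   mpirun --mca io ^ompio ./my_program

# Print the setting so it shows up in the job's stdout log.
echo "OMPI_MCA_io=${OMPI_MCA_io}"
```

Excluding OMPIO, rather than naming a ROMIO component explicitly, avoids depending on the ROMIO component name, which to our knowledge differs between OpenMPI releases.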

Other MPI implementations such as MPICH, and [MPICH-based flavors](https://www.mpich.org/about/collaborators/) such as IntelMPI, use ROMIO by default (ROMIO is developed within the MPICH project) and are not affected.

Best regards,
Axel

-- 
Axel Huebl
Phone: +49 351 260 3582
Institute of Radiation Physics
http://www.hzdr.de/crp
Helmholtz-Zentrum Dresden - Rossendorf (HZDR)
Bautzner Landstr. 400 | 01328 Dresden | Germany
Board of Directors: Prof. Dr. Dr. h. c. Roland Sauerbrey, Dr. Ulrich Breuer
Company Registration Number VR 1693, Amtsgericht Dresden