From: Axel Huebl <a.huebl@hzdr.de>
To: picongpu-users@hzdr.de
Subject: Re: DGX-1
Organization: HZDR
Date: Mon, 14 May 2018 14:04:37 +0200

Just to add to the arguments: the additional (non-tensor-core) Flop/s of the V100 compared to the two-year-old P100 justify the additional cost, which is why the V100 is a fine choice.

Also, to my knowledge the 32 GB V100 is available for the same price as the "first" V100 (16 GB). So ideally, use the 32 GB variant for larger problem sizes!

Regarding NVLink & NVSwitch in newer stations: we can enable RDMA over those via GPUDirect in PIConGPU, although it's not yet mainline. In most settings outside of heavy strong-scaling, though, we hide latency well enough that a lower bandwidth and a longer latency won't slow down your simulation. (Read: the intra- and interconnect is not too important for PIConGPU since we assume the worst.)

However, if you plan things like in-node global FFTs, e.g. as an in situ plugin to get the envelope of a laser via a Hankel transform, NVLink and NVSwitch will pay off.

Cheers,
Axel

On 5/14/18 1:18 PM, René Widera r.widera@hzdr.de wrote:
> Dear Andrei,
>
>> My question is, would PIConGPU run on the DGX-1 and can it make use of NVLink [2] v2.0?
>
> Currently we are not using NVLink. But it is planned to add support for
> MPI GPU-Direct, which should then use NVLink.
>
>> Also, I'm guessing it can't use the tensor cores in the V100 version of DGX-1?
>
> Currently we are not using tensor cores. It is not fully clear whether tensor
> cores will give an advantage. One drawback of the tensor cores is that
> they use fp16.
> In PIConGPU we use at least fp32 unless you activate fp64 support.
>
> Nevertheless, I think a DGX-1 with V100 is the right system.
>
> René (psychocoderHPC)
>
> On 05/14/2018 12:36 PM, Andrei Berceanu berceanu@runbox.com wrote:
>> Hi,
>>
>> First of all, let me provide some context: we are considering
>> purchasing a DGX-1 system [1] from Nvidia for PIConGPU and are trying
>> to decide between the P100 and V100 versions.
>>
>> My question is, would PIConGPU run on the DGX-1 and can it make use of
>> NVLink [2] v2.0?
>>
>> Also, I'm guessing it can't use the tensor cores in the V100 version
>> of DGX-1?
>>
>> Regards,
>> Andrei
>>
>> [1] https://en.wikipedia.org/wiki/Nvidia_DGX-1
>> [2] https://en.wikipedia.org/wiki/NVLink

--
Axel Huebl
Phone: +49 351 260 3582
Institute of Radiation Physics
http://www.hzdr.de/crp
Helmholtz-Zentrum Dresden - Rossendorf (HZDR)
Bautzner Landstr. 400 | 01328 Dresden | Germany
Board of Directors: Prof. Dr. Dr. h. c. Roland Sauerbrey, Dr. Ulrich Breuer
Company Registration Number VR 1693, Amtsgericht Dresden
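
Note on the fp16 remark in René's reply: the following is a minimal sketch of why half precision can be a drawback when small updates are accumulated onto larger values. It is not PIConGPU code. The helper names (`round_to_fp16`, `round_to_fp32`, `accumulate`) and the chosen numbers are assumptions made purely for illustration. It uses only the Python standard library, whose `struct` module can round-trip values through the IEEE 754 half (`e`) and single (`f`) formats.

```python
import struct


def round_to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def round_to_fp32(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def accumulate(start: float, step: float, n: int, rounder) -> float:
    """Add `step` to `start` n times, rounding to the target format after each add."""
    value = rounder(start)
    for _ in range(n):
        value = rounder(value + step)
    return value


# fp16 has an 11-bit significand, so between 2048 and 4096 only even
# integers are representable; adding 1.0 to 2048.0 rounds back to 2048.0.
fp16_result = accumulate(2048.0, 1.0, 100, round_to_fp16)

# fp32 has a 24-bit significand, so every step of 1.0 is kept exactly here.
fp32_result = accumulate(2048.0, 1.0, 100, round_to_fp32)

print("fp16:", fp16_result)
print("fp32:", fp32_result)
```

In this toy case the fp16 accumulator never moves away from its start value, while the fp32 accumulator reaches the expected sum. Whether a given simulation quantity is affected this way depends on its value range and on how the updates are ordered.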