Mailing List picongpu-users@hzdr.de Message #10
From: Anshuman Goswami <anshumang@gatech.edu>
Sender: <goswami.anshuman@gmail.com>
Subject: Adios performance data
Date: Wed, 6 May 2015 10:35:00 -0400
To: <picongpu-users@hzdr.de>
Hi Folks,

I ran some measurements on the ADIOSWriter plugin and wanted to check whether there are reference numbers to validate against. I could only run on an M2090, so the numbers may not match, but I would still like a ballpark comparison.

Experiment description:
* -g 128 128 128 (grid size in cells)
* -d 1 1 1 (a single GPU)
* Single node

Performance data:
* Average simulation time step: 2.1 s
* ADIOSWriter: 338 s
    * Field: 1.8 s
    * Species1: 165 s
        * kernel 'copySpecies': 163.6 s
    * Species2: 165.1 s
        * kernel 'copySpecies': 163.6 s
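To put the breakdown above in proportion, the arithmetic can be written out as a small C++ helper. The struct and function names (AdiosTimings, copySpeciesShare, dumpCostInSteps, untimedRemainder, printSummary) are made up for this illustration; the only inputs are the timings listed above.

```cpp
#include <cassert>
#include <cstdio>

// Timings from the measurement above, in seconds.
struct AdiosTimings {
    double stepTime;     // average simulation time step
    double writerTotal;  // whole ADIOSWriter call
    double field;        // Field output
    double species1;     // Species1 output (includes its copySpecies kernel)
    double species2;     // Species2 output (includes its copySpecies kernel)
    double copy1;        // 'copySpecies' kernel inside Species1
    double copy2;        // 'copySpecies' kernel inside Species2
};

// Fraction of the ADIOSWriter time spent in the two copySpecies kernels.
double copySpeciesShare(const AdiosTimings& t) {
    assert(t.writerTotal > 0.0);
    return (t.copy1 + t.copy2) / t.writerTotal;
}

// Cost of one ADIOSWriter call, expressed in simulation time steps.
double dumpCostInSteps(const AdiosTimings& t) {
    assert(t.stepTime > 0.0);
    return t.writerTotal / t.stepTime;
}

// Part of the ADIOSWriter time not covered by the Field/Species sub-timers.
double untimedRemainder(const AdiosTimings& t) {
    return t.writerTotal - (t.field + t.species1 + t.species2);
}

void printSummary(const AdiosTimings& t) {
    std::printf("copySpecies share of ADIOSWriter: %.1f %%\n",
                100.0 * copySpeciesShare(t));
    std::printf("one ADIOSWriter call ~ %.0f time steps\n", dumpCostInSteps(t));
    std::printf("not covered by sub-timers: %.1f s\n", untimedRemainder(t));
}
```

With printSummary({2.1, 338.0, 1.8, 165.0, 165.1, 163.6, 163.6}), roughly 96.8 % of the ADIOSWriter time falls in the two copySpecies kernels, one output call costs about 161 simulation time steps, and about 6.1 s is not covered by the Field/Species sub-timers.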

Questions:
* Why is the destination buffer (deviceFrame) of 'copySpecies' allocated in pinned host memory rather than in device memory?
* Is a 163 s execution time for 'copySpecies' reasonable for this simulation size, even on an M2090?
* If the source buffer (speciesTmp->getDeviceParticlesBox()) were copied to host memory and a CPU version of 'copySpecies' were run instead, would that be semantically equivalent?
    To estimate the cost of that copy to the host, I timed the following calls:
           speciesTmp->synchronize();
           cudaDeviceSynchronize();
    Results:
        * Species1: 369 ms
        * Species2: 443 ms
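For the third question, a CPU replacement would only be equivalent if it produces the same contiguous particle buffer the kernel writes. As a rough illustration, assuming 'copySpecies' mainly compacts valid particles from frames into one contiguous buffer, here is a C++ sketch of such a gather step on the host, plus a small wall-clock timer that could wrap calls like the two above. Particle, Frame, kFrameSize, gatherParticles and timeMs are all hypothetical and only loosely modeled on a frame-based particle store; they are not PIConGPU types or its actual copySpecies implementation.

```cpp
#include <array>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical, simplified frame-based particle store (NOT PIConGPU's types):
// each frame has a fixed number of slots and a flag per slot marking whether
// it holds a valid particle.
constexpr std::size_t kFrameSize = 256;

struct Particle {
    float x, y, z;     // position
    float px, py, pz;  // momentum
};

struct Frame {
    std::array<Particle, kFrameSize> slots{};  // zero-initialized
    std::array<bool, kFrameSize> valid{};      // all false initially
};

// CPU gather: copy every valid particle from all frames into one contiguous
// output buffer, preserving frame order and slot order.
std::vector<Particle> gatherParticles(const std::vector<Frame>& frames) {
    std::vector<Particle> out;
    for (const Frame& f : frames)
        for (std::size_t i = 0; i < kFrameSize; ++i)
            if (f.valid[i]) out.push_back(f.slots[i]);
    assert(out.size() <= frames.size() * kFrameSize);
    return out;
}

// Wall-clock time of a callable in milliseconds. For GPU work the callable
// must itself wait for completion (e.g. end with cudaDeviceSynchronize()),
// otherwise only the asynchronous launch is measured.
template <typename F>
double timeMs(F&& work) {
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Whether this matches the GPU kernel semantically depends on details the sketch ignores, such as how attributes are laid out in the destination buffer and whether particle order must be preserved.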

Thanks,
Anshuman