Mailing list picongpu-users@hzdr.de, message #10
From: Anshuman Goswami <anshumang@gatech.edu>
Sender: <goswami.anshuman@gmail.com>
Subject: Adios performance data
Date: Wed, 6 May 2015 10:35:00 -0400
To: <picongpu-users@hzdr.de>

Hi Folks,

I ran some measurements on the ADIOSWriter plugin and wanted to check whether there are reference numbers I can validate them against. I could only run on an M2090, so the numbers may not match, but I would still like a ballpark comparison.

Experiment description:
* -g 128 128 128
* -d 1 1 1
* Single node

Performance data:
* Avg simulation timestep : 2.1 s
* ADIOSWriter : 338 s
    * Field : 1.8 s
    * Species1 : 165 s
        * kernel 'copySpecies' : 163.6 s
    * Species2 : 165.1 s
        * kernel 'copySpecies' : 163.6 s
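
To put these numbers in proportion, here is a small Python sketch of the arithmetic. It uses only the timings listed above; the variable names are mine and purely illustrative. It suggests that the two 'copySpecies' kernel runs make up roughly 97% of the ADIOSWriter time, and that one dump costs about as much as 161 average timesteps.

```python
# Timings copied from the performance data above, in seconds.
avg_timestep = 2.1
adios_total = 338.0
parts = {"Field": 1.8, "Species1": 165.0, "Species2": 165.1}
copy_species_kernel = 163.6  # measured once per species

accounted = sum(parts.values())              # time covered by the three sub-items
unaccounted = adios_total - accounted        # remainder of the ADIOSWriter total
kernel_total = 2 * copy_species_kernel       # both species together
kernel_share = kernel_total / adios_total    # fraction of the dump spent in the kernel
timestep_equiv = adios_total / avg_timestep  # dump cost in units of simulation steps

print(f"accounted for by sub-items: {accounted:.1f} s")
print(f"not broken down:            {unaccounted:.1f} s")
print(f"copySpecies share of dump:  {kernel_share:.1%}")
print(f"dump cost in timesteps:     {timestep_equiv:.0f}")
```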

Questions:
* Why is the destination buffer (deviceFrame) of 'copySpecies' allocated in pinned host memory and not in device memory?
* Does an execution time of about 163 s for 'copySpecies' look reasonable for the chosen simulation size, even on an M2090?
* If the source buffer (speciesTmp->getDeviceParticlesBox()) were copied to host memory and a CPU version of 'copySpecies' were run on it instead, would the result be semantically the same?
  To gauge the cost of that device-to-host copy, I timed the following calls:
      speciesTmp->synchronize();
      cudaDeviceSynchronize();
  The measured times were:
      Species1 : 369 ms
      Species2 : 443 ms
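
For scale, a quick Python sketch comparing the 'copySpecies' kernel time with the synchronize timings just given. It assumes the measured calls cover the whole device-to-host transfer, and it does not include the cost of a CPU version of 'copySpecies', which I have not measured. The names are illustrative only.

```python
# Kernel time per species (s) and measured synchronize times (converted from ms to s).
kernel_s = 163.6
sync_s = {"Species1": 0.369, "Species2": 0.443}

# How many times longer the kernel runs than the transfer takes.
ratios = {name: kernel_s / t for name, t in sync_s.items()}

for name, ratio in ratios.items():
    print(f"{name}: kernel takes about {ratio:.0f}x the synchronize time")
```

If the assumption holds, the transfer itself is a small fraction of the kernel time, so the CPU-side copy would be the part that decides whether this approach is faster.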

Thanks,
Anshuman