Return-Path:
Received: from [149.220.158.137] (account huebl@hzdr.de [149.220.158.137] verified)
  by hzdr.de (CommuniGate Pro SMTP 6.0.10)
  with ESMTPSA id 11318289 for picongpu-users@hzdr.de; Fri, 08 May 2015 09:20:50 +0200
Message-ID: <554C63D2.1020006@hzdr.de>
Date: Fri, 08 May 2015 09:20:50 +0200
From: "Huebl, Axel"
MIME-Version: 1.0
To: picongpu-users@hzdr.de
Subject: Re: Adios performance data
References:
In-Reply-To:
Content-Type: multipart/signed; protocol="application/pkcs7-signature"; micalg=sha1; boundary="------------ms020002000808020109010002"

This is a cryptographically signed message in MIME format.

--------------ms020002000808020109010002
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable

Due to missing headers and reply problems on the list (reported to
listmaster, will be fixed soon), we will continue this thread on GitHub:

  https://github.com/ComputationalRadiationPhysics/picongpu/issues/858


Axel

On 07.05.2015 17:27, Huebl, Axel wrote:
> Hi Anshuman,
>
>
> the "copySpecies" kernel basically *is* a deep-copy.
>
>
> Best,
> Axel
>
> On 07.05.2015 17:20, Anshuman Goswami wrote:
>> Thanks a ton :+1:
>> Getting back with the environment details shortly....
>>
>> Key points, I think are (pls correct me where I am wrong) -
>>
>> * Need to rerun on the K40
>> * Perform a deep copy of the species data otherwise the copy doesn't
>>   make sense
>
> On 07.05.2015 16:46, Huebl, Axel wrote:
>> Hi Anshuman,
>>
>>
>> some general questions we need with each request:
>> - what version of PIConGPU are you running
>>   (beta-rc6, release-0.1.0, latest dev?)
>> - what system/compilers/third-party libraries are you using?
>>
>> Now let's start with your questions as far as we get.
>>
>>> * Why is the destination buffer (deviceFrame) of 'copySpecies' alloc'd
>>> on host pinned memory and not on device memory?
>>
>> In the current development (dev) version of PIConGPU, we do not use
>> double-buffering for particles any more. For that, we nowadays use
>> *mapped* memory for access from the host side, e.g., for dumps.
>>
>> Before that, we used *pinned* memory on the host side for asynchronous
>> copies that were used as double-buffers for the previously mentioned
>> operations.
>>
>>> * If the source buffer (speciesTmp->getDeviceParticlesBox()) is
>>> copied to host memory and a CPU version of 'copySpecies' is run
>>> instead, would it be same semantically?
>>
>> The speciesTmp->getDeviceParticlesBox() is kind-of an iterator pointing
>> to non-contiguous memory on the device (it contains for each super-cell
>> the doubly-linked list of frames of particles).
>>
>> Due to the nested structure of device pointers we use in that object,
>> simple copies to host scope are not possible without leaving the
>> pointers in an undefined state (since pointer values on the device are
>> not valid in the address range of the host RAM).
>>
>>> * Does the 163sec of execution time of 'copySpecies' for the chosen
>>> simulation size look reasonable even for an M2090?
>>
>> That's pretty hard to judge from the provided information.
>>
>> How many time steps did you run?
>> How often did you dump (adios.period)?
>> How many particles did you use in total?
>>
>> Also, the M2090 ("Fermi") generation is "pretty old", meaning it might be
>> possible that the performance of the Fermi implementation for mapped
>> memory is not comparable any more to modern hardware ("Kepler" has been
>> out since Nov/2012). Please use modern hardware for benchmarks, we won't
>> optimize for that generation (even if we still support general operation
>> on it).
>>
>> We did not benchmark `copySpecies` in detail yet, but since we only use
>> it on Kepler the "163.6sec" sounds a bit unrealistic (or way off). Can
>> you compare it to your Kepler cards in your node, pls?
>>
>>> -----> To do the above, I measured the following -
>>> speciesTmp->synchronize();
>>> cudaDeviceSynchronize();
>>> It comes to -
>>> Species1 : 369 ms
>>> Species2 : 443 ms
>>
>> The
>>   speciesTmp->synchronize();
>> is without implementation - it would usually synchronize data from
>> device to host but is not implemented for particles (due to their nested
>> memory structure, see above).
>>
>> So basically what you measured is the time of a
>>   cudaDeviceSynchronize();
>> and *all the kernels that were still running* before that, which can be
>> anything.
>>
>> Also: before you measure kernels, always run a
>>   cudaDeviceSynchronize();
>> else you get the unpredictable load of asynchronous kernels that you
>> did not intend to measure ;)
>>
>> Nevertheless, we measured the overall speedup comparing HDF5 and ADIOS,
>> leading to significant speedups when running on several hundred
>> GPUs/nodes with dumping around 6GB from each (per adios.period).
>>
>>
>> Best regards,
>> Axel & René
>>
>> On 06.05.2015 16:35, Anshuman Goswami wrote:
>>> Hi Folks,
>>>
>>> I ran some measurements on the ADIOSWriter plugin and wanted to check if
>>> there are some reference numbers to validate against. I could only run
>>> it on a M2090 so numbers might not agree but still wanted to get a
>>> ballpark comparison.
>>>
>>> Experiment description:
>>> * -g 128 128 128
>>> * -d 1 1 1
>>> * Single node
>>>
>>> Performance data:
>>> * Avg simulation timestep : 2.1sec
>>> * ADIOSWriter : 338sec
>>>   * Field : 1.8sec
>>>   * Species1 : 165sec
>>>     * kernel 'copySpecies' : 163.6sec
>>>   * Species2 : 165.1sec
>>>     * kernel 'copySpecies' : 163.6sec
>>>
>>> Questions:
>>> * Why is the destination buffer (deviceFrame) of 'copySpecies' alloc'd
>>> on host pinned memory and not on device memory?
>>> * Does the 163sec of execution time of 'copySpecies' for the chosen
>>> simulation size look reasonable even for an M2090?
>>> * If the source buffer (speciesTmp->getDeviceParticlesBox()) is copied
>>> to host memory and a CPU version of 'copySpecies' is run instead, would
>>> it be same semantically?
>>> -----> To do the above, I measured the following -
>>> speciesTmp->synchronize();
>>> cudaDeviceSynchronize();
>>> It comes to -
>>> Species1 : 369 ms
>>> Species2 : 443 ms
>>>
>>> Thanks,
>>> Anshuman
>>
>

--

Axel Huebl
Phone +49 351 260 3582
https://www.hzdr.de/crp
Computational Radiation Physics
Laser Particle Acceleration Division
Helmholtz-Zentrum Dresden - Rossendorf e.V.

Bautzner Landstrasse 400, 01328 Dresden
POB 510119, D-01314 Dresden
Board of Directors: Prof. Dr.Dr.h.c. R. Sauerbrey
                    Prof. Dr.Dr.h.c. P. Joehnk
VR 1693, registered at the Amtsgericht (district court) Dresden

--------------ms020002000808020109010002--
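
The measurement advice from this thread (run a `cudaDeviceSynchronize()` before timing a kernel, so that earlier asynchronous work is not counted) could look roughly like the sketch below. It is not PIConGPU code: `copyKernel` and `timeCopyKernelMs` are hypothetical stand-ins, and bracketing the launch with CUDA events is an assumption about how one might take the timing, not something stated in the thread.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for the kernel under test (NOT PIConGPU's copySpecies).
__global__ void copyKernel(const float* src, float* dst, size_t n)
{
    const size_t i = static_cast<size_t>(blockIdx.x) * blockDim.x + threadIdx.x;
    if (i < n)
        dst[i] = src[i];
}

// Times a single launch of copyKernel over n floats.
// Returns the elapsed time in milliseconds, or -1 if a CUDA call failed.
float timeCopyKernelMs(size_t n)
{
    float* src = nullptr;
    float* dst = nullptr;
    float ms = -1.0f;

    if (cudaMalloc(&src, n * sizeof(float)) != cudaSuccess)
        return -1.0f;
    if (cudaMalloc(&dst, n * sizeof(float)) != cudaSuccess)
    {
        cudaFree(src);
        return -1.0f;
    }
    cudaMemset(src, 0, n * sizeof(float));

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    // Drain everything that was queued earlier, so that unrelated
    // asynchronous work is not attributed to the kernel we measure.
    cudaDeviceSynchronize();

    const unsigned int threads = 256;
    const unsigned int blocks =
        static_cast<unsigned int>((n + threads - 1) / threads);

    cudaEventRecord(start);
    copyKernel<<<blocks, threads>>>(src, dst, n);
    cudaEventRecord(stop);

    // Block the host until the kernel and the stop event have completed.
    cudaEventSynchronize(stop);

    if (cudaGetLastError() == cudaSuccess
        && cudaEventElapsedTime(&ms, start, stop) != cudaSuccess)
        ms = -1.0f;

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(src);
    cudaFree(dst);
    return ms;
}

int main()
{
    const float ms = timeCopyKernelMs(1u << 20);
    if (ms < 0.0f)
    {
        std::fprintf(stderr, "CUDA call failed\n");
        return 1;
    }
    std::printf("copyKernel: %.3f ms\n", ms);
    return 0;
}
```

Without the leading `cudaDeviceSynchronize()`, the start event would be queued behind whatever is still running on the device. The 369 ms and 443 ms figures quoted in the thread were taken without such a barrier, which is why they include all kernels that were still running at that point.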