From 73e9a68312b6654f597d95d465aad9376e0e074b Mon Sep 17 00:00:00 2001
From: Hadrien Grasland <hadrien.grasland@gmx.fr>
Date: Sun, 7 Jul 2024 21:56:00 +0200
Subject: [PATCH] Adapt chapter 27 to actual results

---
 handouts/src/27-copyright.md | 18 ++++++++++--------
 1 file changed, 10 insertions(+), 8 deletions(-)

diff --git a/handouts/src/27-copyright.md b/handouts/src/27-copyright.md
index 9b76ddf..1b202df 100644
--- a/handouts/src/27-copyright.md
+++ b/handouts/src/27-copyright.md
@@ -7,7 +7,7 @@
 Since writing to disk is mostly handled by HDF5, we will first look into the
 code which generates the output images, on which we have more leverage.
 
-## A problematic copy
+## An unnecessary copy
 
 Remember that to simplify the porting of the code from CPU to GPU, we initially
 made the GPU version of `Concentrations` expose a `current_v()` interface with
@@ -32,14 +32,13 @@ function:
 }
 ```
 
-As a trip through a CPU profiler will tell you, that copy is where most of the
-simulation time is spent in our current default configuration, where an output
-image is emitted every 34 simulation steps.
+As a trip through a CPU profiler will tell you, we spend most of our CPU time
+copying data around in our current default configuration, where an output image
+is emitted every 34 simulation steps. So this copy might be performance-critical.
 
-Of course, one way to speed up this copy would be to parallelize it. But it
-would be more satisfying and more efficient to get rid of it entirely, along
-with the associated `v_ndarray` member of the `Concentrations` struct. Let's see
-what it would take.
+We could try to speed it up by parallelizing it. But it would be more satisfying
+and more efficient to get rid of it entirely, along with the associated
+`v_ndarray` member of the `Concentrations` struct. Let's see what it would take.
 
 ## What's in a buffer `read()`?
 
@@ -137,6 +136,9 @@ hdf5.write(concentrations.current_v().view())?;
 
 Integrate these changes, and measure their effect on runtime performance.
 
+You may notice that your microbenchmarks tell a different story than the running
+time of the main simulation binary. Can you guess why?
+
 ---
-- 
GitLab