More final chapter tweaking

Commit 6fe53219 authored 9 months ago by Hadrien Grasland
--- a/handouts/src/29-finish-him.md
+++ b/handouts/src/29-finish-him.md
@@ -30,16 +30,16 @@ are some possible tracks that you may want to explore:
  local memory. This will likely change the optimal work-group size, so we will
  want to make this parameter easily tunable via specialization constants, then
  tune it.
+- Modern GPUs also have explicit SIMD instructions, which in Vulkan are
+  accessible via the optional subgroup extension. It should be possible to use
+  them to exchange neighboring data between threads faster than local memory can
+  allow. But in doing so, we will need to handle the fact that subgroups are not
+  portable across all hardware (the Vulkan extension may or may not be present),
+  which will likely require some code duplication.
 - The advanced SIMD chapter's data layout was designed for maximal efficiency on
  SIMD hardware, and modern GPUs are basically a thin shell around a bunch of
  SIMD ALUs. Would this layout also help GPU performance? There is only one way
  to find out.
- Speaking of SIMD, modern GPUs also have explicit SIMD instructions, which in
-  Vulkan are accessible via the optional subgroup extension. It should be
-  possible to use them to exchange neighboring data between threads faster than
-  local memory can allow. But in doing so, we will need to handle the fact that
-  subgroups are not portable across all hardware (the Vulkan extension may or
-  may not be present), which will likely require some code duplication.
 - Our microbenchmarks tell us that our GPU is not quite operating at peak
  throughput when processing a single 1920x1080 image. It would be nice to try
  processing multiple images in a single compute dispatch, but this will require