Commit b2f3e1ce authored by CHAMONT David

Complete course

parent 9368f3ab
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"# Choose your data structure"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## AoS (Array of Structs)\n",
"\n",
"In the code example below, a \"SAXPY\" computation (`y = a*x + y`) is applied repeatedly to a collection of `XY` elements."
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Overwriting tmp.xy.h\n"
]
}
],
"source": [
"%%file tmp.xy.h\n",
"\n",
"struct XY\n",
" {\n",
" double x, y {0.} ;\n",
" void saxpy( double a )\n",
" { y = a*x + y ; }\n",
" } ;"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Overwriting tmp.aos-functions.h\n"
]
}
],
"source": [
"%%file tmp.aos-functions.h\n",
"\n",
"#include <cstdlib> // for rand\n",
"\n",
"template< typename Itr >\n",
"void randomize_x( Itr begin, Itr end )\n",
" {\n",
" for ( Itr itr = begin ; itr!=end ; ++itr )\n",
" { itr->x = std::rand()/(RAND_MAX+1.)-0.5 ; }\n",
" }\n",
"\n",
"template< typename Itr >\n",
"void saxpy( Itr begin, Itr end, double a )\n",
" {\n",
" for ( Itr itr = begin ; itr!=end ; ++itr )\n",
" { itr->saxpy(a) ; }\n",
" }\n",
"\n",
"template< typename Itr >\n",
"double accumulate_y( Itr begin, Itr end )\n",
" {\n",
" double res {0.} ;\n",
" for ( Itr itr = begin ; itr!=end ; ++itr )\n",
" { res += itr->y ; }\n",
" return res ;\n",
" }"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Overwriting tmp.aos.cpp\n"
]
}
],
"source": [
"%%file tmp.aos.cpp\n",
"\n",
"#include \"tmp.xy.h\"\n",
"#include \"tmp.aos-functions.h\"\n",
"#include <cassert> // for assert\n",
"#include <cstdlib> // for atoi\n",
"#include <iostream>\n",
"\n",
"int main( int argc, char * argv[] )\n",
" {\n",
" assert(argc==3) ;\n",
" int size {std::atoi(argv[1])} ;\n",
" int repeat {std::atoi(argv[2])} ;\n",
" std::cout.precision(18) ;\n",
"\n",
" XY * collection {new XY[size]} ;\n",
" auto begin {collection} ;\n",
" auto end {begin+size} ;\n",
"\n",
" randomize_x(begin,end) ;\n",
" while (repeat--)\n",
" saxpy(begin,end,0.1) ;\n",
" double res {accumulate_y(begin,end)/size} ;\n",
" std::cout<<res<<std::endl ;\n",
"\n",
" delete [] collection ;\n",
" }"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Overwriting tmp.aos.sh\n"
]
}
],
"source": [
"%%file tmp.aos.sh\n",
"echo\n",
"\n",
"rm -f tmp.aos.exe tmp.aos.py\n",
"g++ -std=c++17 tmp.aos.cpp -o tmp.aos.exe\n",
"./tmp.aos.exe $*\n",
"\n",
"echo \"s = 0\" >> tmp.aos.py\n",
"for i in 0 1 2 3 4 5 6 7 8 9\n",
"do \\time -f \"s += %U\" -a -o ./tmp.aos.py ./tmp.aos.exe $* >> /dev/null\n",
"done\n",
"echo \"print('(~ {:.3f} s)'.format(s/10.))\" >> tmp.aos.py\n",
"python3 tmp.aos.py\n",
"\n",
"echo"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"67.5053500207703507\n",
"(~ 1.380 s)\n",
"\n"
]
}
],
"source": [
"!bash -l tmp.aos.sh 1024 100000"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"The `main` function currently uses an old-fashioned C array, and the script does not explicitly set a GCC optimization option, so the default `-O0` (no compiler optimization) is used."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"You are asked to try this code, then to investigate the alternative containers `std::array`, `std::valarray`, `std::vector` and `std::list`, as well as the GCC optimization options `-O2` (usual optimizations) and `-O3` (aggressive optimizations, including automatic vectorization). Fill in the table below with the measured times (in seconds), and try to explain the differences.\n",
"\n",
"| Array \\ Option | -O0 | -O2 | -O3 |\n",
"| :--------------------- | ---: | ---: | ---: |\n",
"| Classic C array | 0. | 0. | 0. |\n",
"| std::array | 0. | 0. | 0. |\n",
"| std::valarray | 0. | 0. | 0. |\n",
"| std::vector | 0. | 0. | 0. |\n",
"| std::list | 0. | 0. | 0. |\n"
]
},
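{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"As an illustration, here is a sketch of one possible `std::vector` variant of `main` (the file name `tmp.aos-vector.cpp` is a hypothetical choice, and this is not meant as the reference solution). Only the allocation changes, because the functions in `tmp.aos-functions.h` accept any pair of iterators. The same pattern should carry over to `std::list`, whereas `std::array` needs a size known at compile time, and a `std::valarray` provides its iterators through `std::begin` and `std::end`. To time this variant, substitute its file name for `tmp.aos.cpp` in `tmp.aos.sh`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [],
"source": [
"%%file tmp.aos-vector.cpp\n",
"\n",
"// Hypothetical std::vector variant of tmp.aos.cpp (a sketch)\n",
"#include \"tmp.xy.h\"\n",
"#include \"tmp.aos-functions.h\"\n",
"#include <cassert> // for assert\n",
"#include <cstdlib> // for std::atoi\n",
"#include <iostream>\n",
"#include <vector>\n",
"\n",
"int main( int argc, char * argv[] )\n",
"  {\n",
"    assert(argc==3) ;\n",
"    int size {std::atoi(argv[1])} ;\n",
"    int repeat {std::atoi(argv[2])} ;\n",
"    std::cout.precision(18) ;\n",
"\n",
"    // The vector value-initializes its elements, and frees them automatically\n",
"    std::vector<XY> collection(size) ;\n",
"    auto begin {collection.begin()} ;\n",
"    auto end {collection.end()} ;\n",
"\n",
"    randomize_x(begin,end) ;\n",
"    while (repeat--)\n",
"      saxpy(begin,end,0.1) ;\n",
"    double res {accumulate_y(begin,end)/size} ;\n",
"    std::cout<<res<<std::endl ;\n",
"  }"
]
},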
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## SoA (Struct of Arrays)\n",
"\n",
"Now let's try another approach: instead of creating a structure that groups together `x` and `y`, and then making an array of such structures (as is naturally done in an object-oriented approach), let's build a single global structure that contains an array of all the `x` values on one hand, and an array of all the `y` values on the other hand.\n",
"\n",
"The code skeleton below implements this layout, again with C arrays and the default `-O0`. As before, try alternative collections and compilation options, fill in the results table, and explain the differences."
]
},
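{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"To make the difference concrete, the small sketch below (a hypothetical standalone file `tmp.layout.cpp`, with a local copy of the `XY` layout named `XYDemo`) prints the distance in bytes between two consecutive `x` values in each layout. On a typical x86-64 platform, where a `double` takes 8 bytes, one can expect 16 bytes for AoS (each `x` is followed by its `y`) and 8 bytes for SoA (the `x` values are contiguous). Packed SIMD loads fetch several contiguous values at once, which suits the SoA layout. It can be compiled like the other examples, e.g. with `g++ -std=c++17 tmp.layout.cpp -o tmp.layout.exe`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [],
"source": [
"%%file tmp.layout.cpp\n",
"\n",
"#include <iostream>\n",
"\n",
"// Local copy of the XY memory layout (hypothetical name, so the file stands alone)\n",
"struct XYDemo { double x, y ; } ;\n",
"\n",
"int main()\n",
"  {\n",
"    XYDemo aos[4] {} ; // AoS memory: x0 y0 x1 y1 x2 y2 x3 y3\n",
"    double xs[4] {} ;  // SoA memory: x0 x1 x2 x3 (the y values live in another array)\n",
"\n",
"    // Distance in bytes between two addresses\n",
"    auto gap = []( void const * a, void const * b )\n",
"      { return static_cast<char const *>(b) - static_cast<char const *>(a) ; } ;\n",
"\n",
"    std::cout<<\"AoS: \"<<gap(&aos[0].x,&aos[1].x)<<\" bytes between x[0] and x[1]\"<<std::endl ;\n",
"    std::cout<<\"SoA: \"<<gap(&xs[0],&xs[1])<<\" bytes between x[0] and x[1]\"<<std::endl ;\n",
"  }"
]
},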
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Overwriting tmp.soa.h\n"
]
}
],
"source": [
"%%file tmp.soa.h\n",
"\n",
"#include \"tmp.xy.h\"\n",
"\n",
"class SoA\n",
" {\n",
" public :\n",
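" SoA( int size ) : m_size(size), m_xs(new double[size]()), m_ys(new double[size]()) {} // () zero-initializes the values, like y {0.} in XY\n",
" SoA( SoA const & ) = delete ; // copying would lead to a double delete\n",
" SoA & operator=( SoA const & ) = delete ;\n",
" ~SoA() { delete [] m_xs ; delete [] m_ys ; }\n",
" int size() const { return m_size ; }\n",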
" XY operator()( int indice ) const\n",
" { return { m_xs[indice], m_ys[indice] } ; }\n",
" auto & xs() { return m_xs ; }\n",
" auto & ys() { return m_ys ; }\n",
" void saxpy( double a )\n",
" {\n",
" for ( int i=0 ; i<m_size ; ++i )\n",
" m_ys[i] = a*m_xs[i] + m_ys[i] ;\n",
" }\n",
" private :\n",
" int m_size ;\n",
" double * m_xs ;\n",
" double * m_ys ;\n",
" } ;"
]
},
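{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"The `SoA` class above manages its buffers by hand with `new[]` and `delete[]`. As a hedged sketch of the `std::vector` alternative, here is a hypothetical `SoAVector` class (the file name `tmp.soa-vector.h` is also an assumption): `std::vector` zero-initializes its elements, releases its memory automatically, and can be copied safely. Its interface mirrors `SoA`, so adapting `tmp.soa-functions.h` should mostly be a matter of changing the parameter type."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [],
"source": [
"%%file tmp.soa-vector.h\n",
"\n",
"#include \"tmp.xy.h\"\n",
"#include <vector>\n",
"\n",
"// Hypothetical std::vector-based variant of the SoA class (a sketch)\n",
"class SoAVector\n",
"  {\n",
"  public :\n",
"    // std::vector value-initializes its elements: all xs and ys start at 0.\n",
"    SoAVector( int size ) : m_xs(size), m_ys(size) {}\n",
"    int size() const { return static_cast<int>(m_xs.size()) ; }\n",
"    XY operator()( int indice ) const\n",
"      { return { m_xs[indice], m_ys[indice] } ; }\n",
"    auto & xs() { return m_xs ; }\n",
"    auto & ys() { return m_ys ; }\n",
"    void saxpy( double a )\n",
"      {\n",
"        int const n {size()} ;\n",
"        for ( int i=0 ; i<n ; ++i )\n",
"          m_ys[i] = a*m_xs[i] + m_ys[i] ;\n",
"      }\n",
"  private :\n",
"    std::vector<double> m_xs ;\n",
"    std::vector<double> m_ys ;\n",
"  } ;"
]
},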
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Writing tmp.soa-functions.h\n"
]
}
],
"source": [
"%%file tmp.soa-functions.h\n",
"\n",
"#include \"tmp.soa.h\"\n",
"#include <cstdlib> // for rand\n",
"\n",
"void randomize_x( SoA & collection )\n",
" {\n",
" for ( int i=0 ; i<collection.size() ; ++i )\n",
" { collection.xs()[i] = std::rand()/(RAND_MAX+1.)-0.5 ; }\n",
" }\n",
"\n",
"double accumulate_y( SoA & collection )\n",
" {\n",
" double res {0.} ;\n",
" for ( int i=0 ; i<collection.size() ; ++i )\n",
" { res += collection.ys()[i] ; }\n",
" return res ;\n",
" }"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Writing tmp.soa.cpp\n"
]
}
],
"source": [
"%%file tmp.soa.cpp\n",
"\n",
"#include \"tmp.soa-functions.h\"\n",
"#include <iostream>\n",
"#include <cassert> // for assert\n",
"#include <cstdlib> // for atoi\n",
"\n",
"int main( int argc, char * argv[] )\n",
" {\n",
" assert(argc==3) ;\n",
" int size {std::atoi(argv[1])} ;\n",
" int repeat {std::atoi(argv[2])} ;\n",
"\n",
" SoA collection(size) ;\n",
" randomize_x(collection) ;\n",
" while (repeat--)\n",
" collection.saxpy(0.1) ;\n",
" double res = accumulate_y(collection)/size ;\n",
"\n",
" std::cout.precision(18) ;\n",
" std::cout<<res<<std::endl ;\n",
" }"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Writing tmp.soa.sh\n"
]
}
],
"source": [
"%%file tmp.soa.sh\n",
"echo\n",
"\n",
"rm -f tmp.soa.exe tmp.soa.py\n",
"g++ -std=c++17 tmp.soa.cpp -o tmp.soa.exe\n",
"./tmp.soa.exe $*\n",
"\n",
"echo \"s = 0\" >> tmp.soa.py\n",
"for i in 0 1 2 3 4 5 6 7 8 9\n",
"do \\time -f \"s += %U\" -a -o ./tmp.soa.py ./tmp.soa.exe $* >> /dev/null\n",
"done\n",
"echo \"print('({:.3f} s)'.format(s/10.))\" >> tmp.soa.py\n",
"python3 tmp.soa.py\n",
"\n",
"echo"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"67.5053500207703507\n",
"(0.920 s)\n",
"\n"
]
}
],
"source": [
"!bash -l tmp.soa.sh 1024 100000"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"To help in the analysis, you can use [GodBolt](https://godbolt.org/), which displays the assembly generated by the compiler: it lets you observe how much \"inlining\" took place, or look for vector instructions such as `addpd` (Add Packed Doubles) or `mulpd` (Multiply Packed Doubles). Fill in the table below with the SoA timings.\n",
"\n",
"| Array \\ Option | -O0 | -O2 | -O3 |\n",
"| :--------------------- | ---: | ---: | ---: |\n",
"| Classic C array | 0. | 0. | 0. |\n",
"| std::array | 0. | 0. | 0. |\n",
"| std::valarray | 0. | 0. | 0. |\n",
"| std::vector | 0. | 0. | 0. |\n",
"| std::list | 0. | 0. | 0. |\n"
]
},
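{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"For example, the SAXPY loop can be isolated in a free function, such as the hypothetical `saxpy_soa` below, pasted into GodBolt with an x86-64 GCC compiler, and compiled successively with `-O0`, `-O2` and `-O3`. With `-O3`, one would typically expect packed instructions such as `mulpd` and `addpd` to appear, while at `-O0` the loop uses scalar instructions such as `mulsd` and `addsd`, which process one double at a time. The exact output may vary with the GCC version and the target flags."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [],
"source": [
"%%file tmp.godbolt.cpp\n",
"\n",
"// Hypothetical free function isolating the SoA SAXPY loop, for inspection on GodBolt\n",
"void saxpy_soa( double a, double const * xs, double * ys, int size )\n",
"  {\n",
"    for ( int i=0 ; i<size ; ++i )\n",
"      ys[i] = a*xs[i] + ys[i] ;\n",
"  }"
]
},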
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"© *CNRS 2021*\n",
"*Assembled and written in French by David Chamont, translated by Karim Hasnaoui, this work is made available under the terms of the [Creative Commons License - Attribution - NonCommercial - ShareAlike 4.0 International](http://creativecommons.org/licenses/by-nc-sa/4.0/)*"
]
}
],
"metadata": {
"celltoolbar": "Diaporama",
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.5"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Performance optimization\n",
"\n",
"1. [Choice of data structure](en.1-arrays.ipynb)\n",
"1. [The cost of the different operations](en.2-operations.ipynb)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"© *CNRS 2021*\n",
"*Assembled and written in French by David Chamont, translated by Karim Hasnaoui, this work is made available under the terms of the [Creative Commons License - Attribution - NonCommercial - ShareAlike 4.0 International](http://creativecommons.org/licenses/by-nc-sa/4.0/)*"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.5"
}
},
"nbformat": 4,
"nbformat_minor": 4
}