diff --git a/Convert_Matlab/Documentation_texfol/.gitignore b/Convert_Matlab/Documentation_texfol/.gitignore new file mode 100644 index 0000000000000000000000000000000000000000..db71965260a82d51a334c0d8f78459e60e91271a --- /dev/null +++ b/Convert_Matlab/Documentation_texfol/.gitignore @@ -0,0 +1,9 @@ +/*.aux +/*.log +/*.out +/*.pdf +/*.synctex* +/*.toc +/*.lof +/*.lot +/auto/ diff --git a/Convert_Matlab/Documentation_texfol/Graphiques/.gitignore b/Convert_Matlab/Documentation_texfol/Graphiques/.gitignore new file mode 100644 index 0000000000000000000000000000000000000000..ec1fa5dd310fb281049c1817f7c1c2480ba01d35 --- /dev/null +++ b/Convert_Matlab/Documentation_texfol/Graphiques/.gitignore @@ -0,0 +1,2 @@ +SHPC.pdf +convert_Matlab.pdf diff --git a/Convert_Matlab/Documentation_texfol/Graphiques/GNUmakefile b/Convert_Matlab/Documentation_texfol/Graphiques/GNUmakefile new file mode 100644 index 0000000000000000000000000000000000000000..2a8f0df586dfa9e3be29bf7a496e57a95e6b2ffc --- /dev/null +++ b/Convert_Matlab/Documentation_texfol/Graphiques/GNUmakefile @@ -0,0 +1,10 @@ +sources = convert_Matlab +objects := $(addsuffix .pdf, ${sources}) + +%.pdf: %.odg + unoconv --doctype=graphics $< + +all: ${objects} + +clean: + rm -f ${objects} diff --git a/Convert_Matlab/Documentation_texfol/Graphiques/SHPC.odg b/Convert_Matlab/Documentation_texfol/Graphiques/SHPC.odg new file mode 100644 index 0000000000000000000000000000000000000000..9cc0fe42e05db614949c69e1439ccb43a29baa9c Binary files /dev/null and b/Convert_Matlab/Documentation_texfol/Graphiques/SHPC.odg differ diff --git a/Convert_Matlab/Documentation_texfol/Graphiques/convert_Matlab.odg b/Convert_Matlab/Documentation_texfol/Graphiques/convert_Matlab.odg new file mode 100644 index 0000000000000000000000000000000000000000..0807504e95870a68e9e3e28cf8ec40555ac0a1d9 Binary files /dev/null and b/Convert_Matlab/Documentation_texfol/Graphiques/convert_Matlab.odg differ diff --git 
a/Convert_Matlab/Documentation_texfol/documentation.tex b/Convert_Matlab/Documentation_texfol/documentation.tex new file mode 100644 index 0000000000000000000000000000000000000000..eb76a3933e916fe8a23043ceb03d04aa7fa19ddb --- /dev/null +++ b/Convert_Matlab/Documentation_texfol/documentation.tex @@ -0,0 +1,155 @@ +\documentclass[a4paper,french]{article} + +\usepackage[utf8]{inputenc} + +\usepackage[T1]{fontenc} +\usepackage{lmodern} + +\usepackage{babel} +\usepackage{graphicx} +\usepackage[super]{nth} + +\usepackage{hyperref} + +\hypersetup{pdftitle={Documentation convert Matlab}, pdfauthor={Lionel + Guez}, hypertexnames=false} + + %--------------- + +\graphicspath{{Graphiques/}} + +\title{Documentation for scripts converting Matlab data} +\author{Lionel GUEZ} + + %--------------- + +\begin{document} + +\maketitle + +To use the script \verb+overlap.m+, place the script itself and +all the necessary input files in the current directory, where you +must have write permission. Create symbolic links if +needed. Then: +\begin{verbatim} +matlab -nojvm -r overlap +\end{verbatim} + +On the whole Eurec4A domain (all the dates, that is 117 dates), in +total for the two orientations: \verb+inst_eddies_v6.py+ takes about +11 min and produces 3 MiB, for 2951 instantaneous +eddies. \verb+overlap.m+ takes less than 1 min. \verb+overlap_v6.py+ takes less +than 1 min and produces 48 KiB, for 2793 edges. \verb+survival.m+ takes less than 1 min. +\verb+survival.py+ takes less than 1 min and produces 64 KiB, for 2863 nodes and +169 trajectories. + +On the \verb+PhD_Lax+ domain, the script \verb+overlap_HDF5.py+ takes +2 h 21 min in total for the two graphs. The script +\verb+overlap_v6.py+ takes 3 min, to which must be added 12 min of +conversion to the v6 format (total for the files needed for the two +graphs). + +On the global domain, for a single date, \verb+inst_eddies_v6.py+ +takes 25 s (of which about 21 s is conversion to +v6). \verb+inst_eddies_HDF5.py+ takes 38 s.
There remains the problem of +intermediate storage for the v6 files. The file produced by +\verb+inst_eddies.m+ takes up about a quarter of the space of the +original file. + +\section{Instantaneous eddies} + +The data for instantaneous eddies is stored in shapefiles, in the +directories \verb+SHPC_(anti|cyclo)+. Cf. figure \ref{fig:convert_Matlab}. +\begin{figure}[htbp] + \centering + \includegraphics[width=\textwidth]{convert_Matlab} + \caption{Correspondence between input Matlab files and files + converted from them. Below the name of each Matlab file are + the names of the variables used inside that file, and + the fields used inside those variables.} + \label{fig:convert_Matlab} +\end{figure} +The directory \verb+SHPC_(anti|cyclo)+ contains a set of four +shapefiles: center, extremum, \verb+max_speed_contour+ and +\verb+outermost_contour+. The four shapefiles correspond to four \og +layers\fg{} of eddies. (\og layers\fg{} is a term that you can often +find in the documentation of software dealing with geographical data.) +Each layer corresponds to a given type of geometry. Here we have only +two types of geometry: points and polygons. The layers center and +extremum contain points, while the layers \verb+max_speed_contour+ and +\verb+outermost_contour+ contain polygons. The center layer is for the +geometric center of the maximum-speed contour, which is called +centroid in the Matlab files. The extremum layer is for the position +of the extremum of SSH, which is called center in the Matlab +files. Each eddy has a record, at the same subscript position, in the +four layers. Cf. figure \ref{fig:SHPC}. +\begin{figure}[htbp] + \centering + \includegraphics[width=\textwidth]{SHPC} + \caption{Contents of a shapefile collection: SHPC\_(anti|cyclo).} + \label{fig:SHPC} +\end{figure} +The shapefile collection in \verb+SHPC_(anti|cyclo)+ contains data for +all the dates. The dates are stored successively in the layers.
From +the computer's point of view, each shapefile is made up of three files, +ending with suffixes \verb+.shp+, \verb+.dbf+ and \verb+.shx+. The +\verb+.shp+ file contains the positions, the \verb+.dbf+ file contains +the metadata, and the \verb+.shx+ file is an index. So the \verb+.shp+ +file is the largest file of the three. But the three files form a +logical unit and you should never separate them. There is also a file +\verb+ishape_last.txt+ in the directory \verb+SHPC_(anti|cyclo)+ which +gives the last subscript in the shapefiles for each date. This file is +used to directly access any instantaneous eddy at any date. Finally, +there is a file, \verb+grid_nml.txt+, which gives (in Fortran namelist +format) the grid of SSH data from which the eddies were detected. + +The shapefiles are in binary format, so you need special software to +read them. There is actually a large number of programs that can read +them. The metadata, in the \verb+.dbf+ file, is a spreadsheet, and you +can open it with LibreOffice. There are also shell commands that can +read shapefiles: dbfinfo, shpinfo, dbfdump and shpdump (analogous to +ncdump for NetCDF files). If you have a Debian-based Linux distribution +(Ubuntu, Linux Mint, etc.), you can install these tools with: +\begin{verbatim} +sudo apt-get install shapelib +\end{verbatim} +The tools are installed on Ciclad in \verb+/data/guez/bin+. You can +also access the shapefiles programmatically from a number of programming +languages. From Python, one of the modules you can use is +\href{https://fiona.readthedocs.io/en/latest/}{Fiona}. Matlab has the +shaperead function. + +\section{Overlapping} + +The data on overlapping instantaneous eddies is in the files +\verb+edgelist_(anti|cyclo)+. The overlapping of eddies is represented +by an abstract graph. In the graph, a node is an instantaneous +eddy. Two nodes of the graph are connected by an edge if the two +corresponding eddies overlap.
The graph is directed, which means that +the edges have a direction, which here is chronological. + +The file \verb+edgelist_(anti|cyclo)+ stores, for a given orientation +of eddies, the whole graph as a list of edges. (This is a common graph +storage format.) Each line stores one edge: the origin node of the +edge followed by the target node of the edge. Each node is identified by a +pair of integers: the date index and the eddy index at that +date. The date index is the number of days since January \nth{1}, +1950. The eddy index is between 1 and the number of eddies at the +date. + +\section{Survival} + +We call survival the data that identifies several instantaneous +eddies as the same evolving physical object. The result of this analysis +is a dictionary of trajectories, stored in a file +\verb+traj_(anti|cyclo).json+. The key for each trajectory is the +identifying number of the trajectory and the value is the +corresponding list of instantaneous eddies. Here, an instantaneous +eddy is not identified by the pair of date index and eddy +index. Rather, it is identified by an equivalent identifying number +that is constructed from the date index and the eddy index. We could +call this number the \og node index\fg{}, since it identifies a node +in the abstract graph of overlapping. The relation between node index, +$n$, date index, $d$ + +\end{document}