Potential deadlock when running many MPI processes on unstructured grid with XIOS server
This bug comes from @sylvain.mailler. If I understood correctly, his run never ends, even though the logs and the output files indicate that it has finished:
- DCMIP41 experiment
- 39 processes for DYNAMICO
- 1 process for XIOS server
- `mesh_par_ba.39.nc` as unstructured mesh
- `output_dcmip2016` has data inside
- no errors or warnings in XIOS logs
- XIOS logs contain the end reports (memory, time)
The log indicates that the run has ended:
```
Time spent (s): 70 -- ms/step : 25.11 -- Throughput : 11949
Whole job (min) : 1 -- Completion in (min) : 0
It No : 2821 t : 846300.0
It No : 2822 t : 846600.0
It No : 2823 t : 846900.0
It No : 2824 t : 847200.0
It No : 2825 t : 847500.0
It No : 2826 t : 847800.0
It No : 2827 t : 848100.0
Time elapsed : 70.9934120000000
```
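The numbers in the log above are internally consistent, which fits the time loop itself having completed. Consecutive iterations imply a timestep of 300 s. Dividing the elapsed time by the last iteration number reproduces the reported 25.11 ms/step. The sketch below rests on two assumptions that the logs do not confirm: the iteration counter starts at 1 with model time starting at 0, and `Time elapsed` covers the time loop.

```python
# Values copied from the DYNAMICO log above.
last_iteration = 2827
last_time_s = 848100.0
elapsed_s = 70.9934120000000

# Timestep implied by two consecutive log lines (It No 2826 -> 2827).
dt_s = 848100.0 - 847800.0

# Assumption: t = It * dt (counter starts at 1, model time starts at 0).
assert last_iteration * dt_s == last_time_s

# Wall-clock cost per step, to compare with "ms/step : 25.11" in the log.
# Assumption: "Time elapsed" covers the whole time loop.
ms_per_step = 1000.0 * elapsed_s / last_iteration
print(f"dt = {dt_s} s, ms/step = {ms_per_step:.2f}")
```

If these assumptions hold, the loop ran all 2827 steps at the reported rate. Any hang would then happen after the loop, for example during shutdown, although the logs alone cannot confirm that.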
Some example logs and configurations:
@thomas.dubos Do you have any hypothesis to test?
Edited by Patryk Kiepas