Add MPI-based distributed data processing
Shard the data further, along the lines of numerical-rust-cpu#6?
This should actually be easier than #2: with #2 there's a temptation to look for fast inter-GPU communication, which is hard-ish in Vulkan, whereas here MPI is by definition the only option.
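
To make the sharding part concrete, here is a minimal sketch of how each process could work out which slice of the data it owns. It assumes a simple 1D split by rows, which may not match what numerical-rust-cpu#6 does, and `shard_rows` is a hypothetical helper name, not existing code. In a real MPI program, `rank` and `num_ranks` would come from the communicator (what `MPI_Comm_rank` and `MPI_Comm_size` return); here they are plain arguments so the logic can be checked without MPI.

```rust
use std::ops::Range;

/// Hypothetical helper: rows owned by `rank` when `num_rows` rows are
/// split as evenly as possible across `num_ranks` processes.
/// Assumes a 1D row-wise decomposition; the first `num_rows % num_ranks`
/// ranks each take one extra row.
fn shard_rows(num_rows: usize, num_ranks: usize, rank: usize) -> Range<usize> {
    assert!(num_ranks > 0 && rank < num_ranks);
    let base = num_rows / num_ranks; // rows every rank gets
    let extra = num_rows % num_ranks; // leftover rows to hand out
    // Ranks before this one own `base` rows each, plus one extra row
    // for each of them that is among the first `extra` ranks.
    let start = rank * base + rank.min(extra);
    let len = base + usize::from(rank < extra);
    start..start + len
}
```

With 10 rows over 4 ranks, this gives `0..3`, `3..6`, `6..8` and `8..10`. Any halo exchange between neighbouring shards would then go over MPI, which is not shown here.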