fastjet/internal/delaunay: implement hierarchical delaunay insertion
Created by: Bastiantheone
This PR implements insertion using the hierarchical Delaunay algorithm.
The benchmark numbers are significantly worse than those of the version in the first PR, which did not use the predicates. The CPU profile shows that swapDelaunay, where the incircle function is called, takes up most of the time. Improving the incircle function, if possible, would therefore be the most effective way to improve performance.
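One possible direction, sketched here and not taken from this PR: evaluate the incircle determinant in plain float64 first and fall back to exact arithmetic only when the result is too close to zero to trust. The `point` type, the `inCircle` and `inCircleExact` names, and the `int` sign return are hypothetical and do not reflect the package's actual API. The error-bound constant follows the first-stage bound from Shewchuk's robust predicates, and the inputs are assumed to be finite with `a`, `b`, `c` in counterclockwise order.

```go
package main

import (
	"math"
	"math/big"
)

// point is a hypothetical 2D point type used only in this sketch.
type point struct{ x, y float64 }

// epsilon is half the float64 unit roundoff (2^-53).
const epsilon = 1.0 / (1 << 53)

// iccErrBound is the first-stage incircle error bound from
// Shewchuk's robust predicates (assumed applicable here).
const iccErrBound = (10 + 96*epsilon) * epsilon

// inCircle reports the sign of the incircle determinant:
// +1 if d lies inside the circle through a, b, c (counterclockwise),
// -1 if it lies outside, and 0 if it lies on the circle.
func inCircle(a, b, c, d point) int {
	adx, ady := a.x-d.x, a.y-d.y
	bdx, bdy := b.x-d.x, b.y-d.y
	cdx, cdy := c.x-d.x, c.y-d.y

	bdxcdy, cdxbdy := bdx*cdy, cdx*bdy
	cdxady, adxcdy := cdx*ady, adx*cdy
	adxbdy, bdxady := adx*bdy, bdx*ady

	alift := adx*adx + ady*ady
	blift := bdx*bdx + bdy*bdy
	clift := cdx*cdx + cdy*cdy

	det := alift*(bdxcdy-cdxbdy) + blift*(cdxady-adxcdy) + clift*(adxbdy-bdxady)

	// Sum of the magnitudes of all terms; scales the error bound.
	permanent := (math.Abs(bdxcdy)+math.Abs(cdxbdy))*alift +
		(math.Abs(cdxady)+math.Abs(adxcdy))*blift +
		(math.Abs(adxbdy)+math.Abs(bdxady))*clift
	errBound := iccErrBound * permanent

	// Fast path: the float64 result is far enough from zero to trust its sign.
	if det > errBound {
		return 1
	}
	if det < -errBound {
		return -1
	}
	// Slow path: recompute exactly with rational arithmetic.
	return inCircleExact(a, b, c, d)
}

// inCircleExact evaluates the same determinant with math/big rationals.
// It assumes all coordinates are finite.
func inCircleExact(a, b, c, d point) int {
	r := func(v float64) *big.Rat { return new(big.Rat).SetFloat64(v) }
	sub := func(x, y *big.Rat) *big.Rat { return new(big.Rat).Sub(x, y) }
	add := func(x, y *big.Rat) *big.Rat { return new(big.Rat).Add(x, y) }
	mul := func(x, y *big.Rat) *big.Rat { return new(big.Rat).Mul(x, y) }

	adx, ady := sub(r(a.x), r(d.x)), sub(r(a.y), r(d.y))
	bdx, bdy := sub(r(b.x), r(d.x)), sub(r(b.y), r(d.y))
	cdx, cdy := sub(r(c.x), r(d.x)), sub(r(c.y), r(d.y))

	alift := add(mul(adx, adx), mul(ady, ady))
	blift := add(mul(bdx, bdx), mul(bdy, bdy))
	clift := add(mul(cdx, cdx), mul(cdy, cdy))

	det := mul(alift, sub(mul(bdx, cdy), mul(cdx, bdy)))
	det = add(det, mul(blift, sub(mul(cdx, ady), mul(adx, cdy))))
	det = add(det, mul(clift, sub(mul(adx, bdy), mul(bdx, ady))))
	return det.Sign()
}
```

With a filter of this kind, most calls should finish on the float64 fast path and only near-degenerate inputs (points that are almost cocircular) pay for the exact computation. Whether this helps in swapDelaunay would need to be confirmed with the existing benchmarks.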