Investigate getOccByNgramsOnlyFast

This is a follow-up to #505.

In https://gitlab.in2p3.fr/iscpif/gargantext/haskell-gargantext/merge_requests/445 I introduced some fixes to the query.

However, the functions getOccByNgramsOnlyFast_withSample and getOccByNgramsOnlyFast deserve a closer look.

The second one, getOccByNgramsOnlyFast, returns a HashMap NgramsTerm [ContextId]. Postgres has to build that context list the hard way, collecting DISTINCT context_id values into an array. The function is used, via setNgramsTableScores, to return ngrams together with a list of their occurrences (all context ids).

However, when I search the PureScript code for occurrences, I mostly see things like sumOccurrences or Set.size occurrences, which suggests that we only need occurrences_count.

If only the count is needed, this query could be simplified further.
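
To make the comparison concrete, here is a minimal sketch of the two result shapes. It is not the actual Gargantext code: the NgramsTerm and ContextId newtypes are simplified stand-ins, Data.Map replaces HashMap so that the example runs with only the libraries bundled with GHC, and OccByContexts, OccByCount and toCounts are hypothetical names introduced for illustration.

```haskell
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

-- Simplified stand-ins for the real types (assumption, not the project's definitions).
newtype NgramsTerm = NgramsTerm String deriving (Eq, Ord, Show)
newtype ContextId  = ContextId Int     deriving (Eq, Ord, Show)

-- Shape returned today: every context id for each term
-- (Data.Map stands in for HashMap here).
type OccByContexts = Map.Map NgramsTerm [ContextId]

-- Shape that would be enough if the frontend only consumes counts.
type OccByCount = Map.Map NgramsTerm Int

-- Count the distinct contexts per term. This is the client-side equivalent
-- of asking the database for COUNT(DISTINCT context_id) instead of an array.
toCounts :: OccByContexts -> OccByCount
toCounts = Map.map (Set.size . Set.fromList)

main :: IO ()
main = do
  let occs = Map.fromList
        [ (NgramsTerm "graph", [ContextId 1, ContextId 2, ContextId 2])
        , (NgramsTerm "node",  [ContextId 3])
        ]
  -- Prints the per-term counts as an association list.
  print (Map.toList (toCounts occs))
```

If the frontend really only calls things like Set.size on the list, the count could be computed in the query itself and the array of context ids would never need to be built or sent.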