Failure to import large corpora

Summary

This might be specific to either the IMT instance or the HAL query, but it seems that GTX fails to import the full corpus of all IMT publications: https://imt.sub.gargantext.org/#/share/NodeCorpus/132585

There is also an error in the doc chart, which suggests that some process was interrupted partway through the import: on this chart, most docs are dated 2014, which does not reflect the actual state of the system.

image

Steps to reproduce

The query is API -> database: HAL -> filter by organization: IMT : all_IMT

What is the current bug behavior?

The import is stuck at 49257 docs. Relaunching the query does not update the corpus. The estimated final corpus size is about 100k docs.
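
To check the expected size independently of GTX, the document count could be requested from HAL directly and compared with the number of imported docs. The sketch below is only an illustration: it assumes the public HAL search endpoint returns a Solr-style JSON body with a `response.numFound` field, and the helper names (`build_count_url`, `num_found`, `import_progress`) are hypothetical. The exact filter expression that selects IMT publications is not given in this report, so it is left as a parameter.

```python
import json
from urllib.parse import urlencode

# Assumption: public HAL search endpoint (Solr-style API).
HAL_SEARCH_URL = "https://api.archives-ouvertes.fr/search/"


def build_count_url(query, filter_query=None):
    """Build a URL that asks only for the hit count (rows=0)."""
    params = {"q": query, "rows": 0, "wt": "json"}
    if filter_query:
        # The organization filter used for IMT is not known here;
        # the caller has to supply it.
        params["fq"] = filter_query
    return HAL_SEARCH_URL + "?" + urlencode(params)


def num_found(response_text):
    """Extract the total hit count from a Solr-style JSON response."""
    return json.loads(response_text)["response"]["numFound"]


def import_progress(imported, expected):
    """Fraction of the expected docs that were actually imported."""
    return imported / expected


if __name__ == "__main__":
    # Figures from this report: 49257 imported, about 100k expected.
    print(f"{import_progress(49257, 100_000):.1%} of the estimated corpus imported")
```

With the figures above, roughly half of the estimated corpus is present, which is consistent with an import interrupted partway through.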

What is the expected correct behavior?

Edited by Fabien MANIERE