The public suffix list (using the old file name "effective_tld_names.dat") is shipped twice in the Nutch job file in the dependency jar files of
The latter one ships with a heavily outdated version of the public suffix list. The crawler-commons EffectiveTldFinder loads "effective_tld_names.dat" from the class path. When running in distributed mode, there is no control over which dependency jar comes first on the class path, so the outdated version may be loaded.
Ideally, the most recent version of the public suffix list should be used. This could be achieved by downloading the list during the build and placing it in the "conf/" folder, which is always first on the class path.
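
To check which copies of the list are visible at runtime, something like the following could help. It is only a diagnostic sketch: the class name `SuffixListLocator` is made up for illustration, and it assumes that the resource sits at the class path root and that EffectiveTldFinder resolves it through the same class loader, which I have not verified.

```java
import java.io.IOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of Nutch or crawler-commons.
public class SuffixListLocator {

    /** Returns every class path location providing the resource, in class loader lookup order. */
    public static List<URL> locate(String resourceName) throws IOException {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            // Fall back to the system class loader if no context loader is set.
            loader = ClassLoader.getSystemClassLoader();
        }
        return Collections.list(loader.getResources(resourceName));
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the list is stored under this name at the class path root.
        List<URL> copies = locate("effective_tld_names.dat");
        if (copies.isEmpty()) {
            System.out.println("No copy found on the class path");
        }
        for (int i = 0; i < copies.size(); i++) {
            // The first entry is normally the one a plain getResource() call would return.
            System.out.println((i == 0 ? "first:  " : "shadow: ") + copies.get(i));
        }
    }
}
```

Run inside a task in distributed mode, more than one printed URL would confirm the duplication, and the first entry would show which jar wins on that node. With a copy in "conf/", that directory should appear first.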