-
Notifications
You must be signed in to change notification settings - Fork 74
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Regex fix, checkip argument & updated tld list #283
Conversation
This patch include 3 fixes: 1) The plugin could match invalid URIs, eg. "www..example.com", which resulted in fatal plugin error, sending the incoming email directly into queue without any further scan. The regex has been fixed. 2) Some URIBLs trigger positive on any ip-address. Added new plugin argument: checkip [yes|no]. 3) Updating the TLD list.
Duplicating that now-enormous TLD list feels wrong. Part of me would prefer to see it in a flat text file, or even a |
That solution was to save overhead, but I toally agree that it's much cleaner and easier to maintain when putting the list in a txt-file. |
Either solution works for me. If I were trying to decide, I'd likely write a test file that loads the URIBL plugin and then runs a few test messages through it and benchmark them. My guess is that either way will work about the same (since the OS kernel will end up caching that file). If you really wanted to save some time, I'd look at compiling that regex ahead of time. |
1 similar comment
Much better, thanks. All done? Tested? |
Thank you. Tested and working on my production server. I think I've found another bug in this plugin though that I'm currently investigating, so please hold the merge for a little while. |
Moving the TLD list to its own file, config/uribl_tlds as the __DATA__ section wasn't quite working.
On continuation lines, half lines where parsed and searching for matches. This could cause false hits. Now only run checks on the complete joined lines. Also take into account if the last line end with "=" char.
plugins/uribl
Outdated
(my $tlds4regex = join('|', grep(/\S/, <DATA>))) =~ s/\R//g; | ||
# Fetch the TLD list and prepare it for use in RegEx | ||
my @tlds = $self->qp->config('uribl_tlds'); | ||
(my $tlds4regex = join('|', grep(/\S/, <@tlds>))) =~ s/\R//g; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we don't reuse that array, and it's more than trivial in size, might it be better (re: more memory efficient) to inline it and store only $tlds4regex
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, how about a sub in the end of the file that returns the loooong string? Defining that function in the bottom will keep the code easy to read in the lookup_start sub.
@msimerson Good to merge? (Though aren't there CPAN modules maintaining that list?) |
No objections from me. Although the TLD list is almost certainly out of date. I'd approve merging it simply based on: it's better than what's there currently. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's better than current.
One thing I've noticed is a risk of false-positive on html-emails when stylesheet classes are defined. <style> div.class1.class2 { } </style>If the domain "class1.class2" are blacklisted, the above will trigger false-positive. This have always been the case, but now with all the new TLD's, there is a greater chance. We should exclude searching texts within <style></style> tags. |
This patch include 3 fixes:
The plugin could match invalid URIs, eg. "www..example.com", which resulted in fatal plugin error, sending the incoming email directly into queue without any further scan. The regex has been fixed.
Some URIBLs trigger positive on any ip-address. Added new plugin argument: checkip [yes|no].
Updating the TLD list.