Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Regex fix, checkip argument & updated tld list #283

Merged
merged 5 commits into from
Jun 8, 2020

Conversation

mufus
Copy link
Contributor

@mufus mufus commented Mar 2, 2017

This patch include 3 fixes:

  1. The plugin could match invalid URIs, eg. "www..example.com", which resulted in fatal plugin error, sending the incoming email directly into queue without any further scan. The regex has been fixed.

  2. Some URIBLs trigger positive on any ip-address. Added new plugin argument: checkip [yes|no].

  3. Updating the TLD list.

This patch include 3 fixes:

1) The plugin could match invalid URIs, eg. "www..example.com", which resulted in fatal plugin error, sending the incoming email directly into queue without any further scan. The regex has been fixed.

2) Some URIBLs trigger positive on any ip-address. Added new plugin argument: checkip [yes|no].

3) Updating the TLD list.
@coveralls
Copy link

Coverage Status

Coverage remained the same at 48.821% when pulling cf603a1 on mufus:mufus-patch-1 into 65fd1b2 on smtpd:master.

@msimerson
Copy link
Member

Duplicating that now-enormous TLD list feels wrong. Part of me would prefer to see it in a flat text file, or even a __DATA__ block, so that you can plop it into that enormous RE alternation with join('|'). Either way, the list should appear just once.

@mufus
Copy link
Contributor Author

mufus commented Mar 2, 2017

That solution was to save overhead, but I toally agree that it's much cleaner and easier to maintain when putting the list in a txt-file.
I got a working patch for that. What do you say, should we put the file "uribl_tlds" in config directory rather than in the plugins dir? If going by the plugins dir, we should probably do the DATA solution to save extra file access.

@msimerson
Copy link
Member

Either solution works for me. If I were trying to decide, I'd likely write a test file that loads the URIBL plugin and then runs a few test messages through it and benchmark them. My guess is that either way will work about the same (since the OS kernel will end up caching that file). If you really wanted to save some time, I'd look at compiling that regex ahead of time.

@coveralls
Copy link

Coverage Status

Coverage remained the same at 48.821% when pulling 4cd0283 on mufus:mufus-patch-1 into 65fd1b2 on smtpd:master.

1 similar comment
@coveralls
Copy link

Coverage Status

Coverage remained the same at 48.821% when pulling 4cd0283 on mufus:mufus-patch-1 into 65fd1b2 on smtpd:master.

@msimerson
Copy link
Member

Much better, thanks.

All done? Tested?

@mufus
Copy link
Contributor Author

mufus commented Mar 4, 2017

Thank you. Tested and working on my production server. I think I've found another bug in this plugin though that I'm currently investigating, so please hold the merge for a little while.

mufus added 2 commits March 4, 2017 19:44
Moving the TLD list to its own file, config/uribl_tlds as the __DATA__
section wasn't quite working.
On continuation lines, half lines where parsed and searching for
matches. This could cause false hits.
Now only run checks on the complete joined lines. Also take into account
if the last line end with "=" char.
@coveralls
Copy link

coveralls commented Mar 4, 2017

Coverage Status

Coverage remained the same at 48.821% when pulling b9084ea on mufus:mufus-patch-1 into 65fd1b2 on smtpd:master.

plugins/uribl Outdated
(my $tlds4regex = join('|', grep(/\S/, <DATA>))) =~ s/\R//g;
# Fetch the TLD list and prepare it for use in RegEx
my @tlds = $self->qp->config('uribl_tlds');
(my $tlds4regex = join('|', grep(/\S/, <@tlds>))) =~ s/\R//g;

Copy link
Member

@msimerson msimerson Mar 4, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we don't reuse that array, and it's more than trivial in size, might it be better (re: more memory efficient) to inline it and store only $tlds4regex?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, how about a sub in the end of the file that returns the loooong string? Defining that function in the bottom will keep the code easy to read in the lookup_start sub.

@coveralls
Copy link

coveralls commented Mar 4, 2017

Coverage Status

Coverage remained the same at 48.821% when pulling 33239e5 on mufus:mufus-patch-1 into 65fd1b2 on smtpd:master.

@abh
Copy link
Member

abh commented Apr 16, 2018

@msimerson Good to merge?

(Though aren't there CPAN modules maintaining that list?)

@msimerson
Copy link
Member

@msimerson Good to merge?

No objections from me. Although the TLD list is almost certainly out of date. I'd approve merging it simply based on: it's better than what's there currently.

Copy link
Member

@msimerson msimerson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's better than current.

@mufus
Copy link
Contributor Author

mufus commented May 23, 2018

One thing I've noticed is a risk of false-positive on html-emails when stylesheet classes are defined.

<style> div.class1.class2 { } </style>

If the domain "class1.class2" are blacklisted, the above will trigger false-positive. This have always been the case, but now with all the new TLD's, there is a greater chance.

We should exclude searching texts within <style></style> tags.

@msimerson msimerson merged commit e21860f into smtpd:master Jun 8, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants