From: UnixOS2 Archive
To: "UnixOS2 Archive"
Date: Sat, 27 Sep 2003 14:11:55 EST-10EDT,10,-1,0,7200,3,-1,0,7200,3600
Subject: [UnixOS2_Archive] No. 205

**************************************************
Friday 26 September 2003 Number 205
**************************************************

Subjects for today

1 Re: Checking for word in a string : Jack Troughton
2 Re: Checking for word in a string : Adrian Gschwend
3 Re: Checking for word in a string : John Poltorak
4 Re: Checking for word in a string : Stefan Neis
5 Re: Checking for word in a string : Stefan Neis
6 Re: Checking for word in a string : John Poltorak
7 Re: Checking for word in a string : John Poltorak
8 Re: Checking for word in a string : Adrian Gschwend
9 Re: Checking for word in a string : Dave Saville
10 Re: Checking for word in a string : Dave Saville
11 Re: Checking for word in a string : Dave Saville
12 Re: Checking for word in a string : John Poltorak

**= Email 1 ==========================**

Date: Sat, 27 Sep 2003 12:01:03 -0400
From: Jack Troughton
Subject: Re: Checking for word in a string

John Poltorak wrote:
> On Sat, Sep 27, 2003 at 01:50:25PM +0200, Adrian Gschwend wrote:
>
>>On Fri, 26 Sep 2003 20:43:43 +0100, John Poltorak wrote:
>>
>>
>>>I've tried looking at bogofilter but my eyes quickly glaze over after a
>>>few pages of the docs. It doesn't look very easy to set up at all.
>>>
>>>Is there any 'quick start' option for simply dealing with something like
>>>Swen so that I can at least get something working and progress from there?
>>
>>Just to clarify, on netlabs.org weasel is passing the mails to
>>bogofilter before they are put in the inbox and then bogofilter
>>directly dumps them. So it *is* what you mean :-)
>
> It's not exactly what I wanted to do, but I guess I'll just have to accept
> the limitations of having to process mail after it has been accepted by
> the SMTP server.

Not necessarily.
I'm using weasel, and I've set it up to run bogofilter on the mail while the connection to the sending server is still open. My wrapper script checks the bogofilter return code to decide whether the mail is spam; if it is, the script returns "3" to weasel, at which point weasel sends a "554 rejected by server" reply to the connecting mail program. I'm sure you can do something similar in your mail server.

>>Bogofilter is quite straight forward to set up, you just need it to
>>train with spam and nospam, Yuri provides two REXX scripts to do that.
>
> I don't understand how I'm supposed to use them.

If you have email in mbox format (Mozilla stores mail that way, for example, as do pine, elm, etc.), you can just do this: separate out all your spam and put it in one mbox file (e.g. junk.mbx), then make a nice large collection of your normal mail and put it in another mbox file (e.g. normal.mbx). Then issue the following two commands (-s registers the messages as spam, -n as non-spam):

  bogofilter -s < junk.mbx
  bogofilter -n < normal.mbx

When it's done, there will be a .bogofilter folder in %home% containing a database (wordlist.db). That's it, you're done training.

It works a lot better if you feed it a LOT of email of both kinds. I had about 30K normal mails but only about 600 spams... and I've only had two false negatives since then.
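To make "training" a little less magic: bogofilter keeps, for each token, a count of how often it has been seen in spam and in non-spam mail, and those counts are what end up in wordlist.db. The Python sketch below is only a toy version of that idea, assuming a crude tokenizer and a naive per-word ratio. The `WordList` class, `tokens()` and `word_spamminess()` are hypothetical names for illustration; bogofilter's real tokenizer, database format and Robinson/Fisher scoring are considerably more involved.

```python
import re
from collections import Counter
from typing import Optional, Set

# Hypothetical tokenizer: lower-cased runs of letters, digits, '$' and apostrophes.
TOKEN_RE = re.compile(r"[a-z0-9$']+")


def tokens(message: str) -> Set[str]:
    """Distinct tokens of one message; a word counts once per message."""
    return set(TOKEN_RE.findall(message.lower()))


class WordList:
    """Toy word list: for each token, how many spam and ham messages contained it."""

    def __init__(self) -> None:
        self.spam: Counter = Counter()
        self.ham: Counter = Counter()
        self.n_spam = 0
        self.n_ham = 0

    def train(self, message: str, is_spam: bool) -> None:
        """Register one message, in the spirit of 'bogofilter -s' / 'bogofilter -n'."""
        if is_spam:
            self.spam.update(tokens(message))
            self.n_spam += 1
        else:
            self.ham.update(tokens(message))
            self.n_ham += 1

    def word_spamminess(self, word: str) -> Optional[float]:
        """Naive ratio s / (s + h), where s and h are the fractions of spam and
        ham messages containing the word. None if the word was never seen."""
        word = word.lower()
        s = self.spam[word] / self.n_spam if self.n_spam else 0.0
        h = self.ham[word] / self.n_ham if self.n_ham else 0.0
        if s + h == 0:
            return None
        return s / (s + h)
```

With only a handful of training messages, one occurrence pushes a word's ratio straight to 0 or 1, which is one reason a large training set of both kinds, as recommended above, works better.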
Here's my script:

--------8<--------
/* bogofilter.cmd - for calling bogofilter from weasel */

/* Put bogofilter's directory at the front of PATH for this process */
path = value('PATH',,'OS2ENVIRONMENT')
path = "d:\bogofilter\bin;" || path
call value 'PATH', path, 'OS2ENVIRONMENT'

/* message: name of the message file to check */
parse arg message names

/* -3: classify as spam, ham or unsure; -p: pass the message through with an
   X-Bogosity header added; -I: input file; -O: output file; -l: log; -v: verbose */
address cmd 'bogofilter.exe -v -3 -l -p -I' message '-O out.txt >> bogofilter.log 2>&1'
bogoret = rc

/* bogofilter exit codes: 0 = spam, 1 = ham, 2 = unsure, anything else = error */
select
  when bogoret = 0 then do
    address cmd 'copy out.txt .\spambox\' || date('S') || time('M') || '.msg'
    weaselret = 3   /* tell weasel to reject the message */
  end
  when bogoret = 1 then do
    address cmd 'copy out.txt .\hambox\' || date('S') || time('M') || '.msg'
    weaselret = 0
  end
  when bogoret = 2 then do
    address cmd 'copy out.txt .\jambox\' || date('S') || time('M') || '.msg'
    weaselret = 0
  end
  otherwise
    weaselret = 0   /* on an error, let the mail through */
end

/* Replace the original message with bogofilter's tagged copy */
address cmd 'copy out.txt' message
address cmd 'del out.txt'
return weaselret
-------->8--------

>>Then all the work which is left is to tell your mailserver to pass the
>>mails to bogofilter first. That's more or less easy but it depends on
>>your mailserver. I wouldn't want to fight with sendmail to be honest
>>:-)
>
> Sendmail isn't too bad at all.
>
> I just got the third edition of the book a while ago, and wished I had all
> the options available to me on OS/2.

Well, I'm sure there's a way you can add in mail processing after the mail has arrived but before the mail has been accepted.

When weaselret = 3, my mail server rejects the message. This is good because eventually my server will start to fall off the spammers' lists as a place to send mail, since none of it will get anywhere.
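The select block in the script above boils down to a small table. Here is a minimal Python sketch of the same decision, handy for checking the mapping away from OS/2. The exit-code meanings are bogofilter's documented ones (0 spam, 1 ham, 2 unsure, 3 error); the weasel codes (3 = reject with 554, 0 = accept) are as described in this message; `decide()` and `archive_name()` are hypothetical names.

```python
from datetime import datetime
from typing import Optional, Tuple

# Weasel return codes as described in the thread: 3 = reject (554), 0 = accept.
WEASEL_REJECT = 3
WEASEL_ACCEPT = 0

# bogofilter exit code -> archive folder used by the REXX script.
FOLDERS = {0: "spambox", 1: "hambox", 2: "jambox"}


def archive_name(now: datetime) -> str:
    """Mimic REXX date('S') || time('M'): yyyymmdd then minutes since midnight."""
    minutes = now.hour * 60 + now.minute
    return f"{now:%Y%m%d}{minutes}.msg"


def decide(bogo_rc: int, now: datetime) -> Tuple[int, Optional[str]]:
    """Return (weasel return code, archive path or None) for a bogofilter exit code."""
    folder = FOLDERS.get(bogo_rc)
    if folder is None:
        # Error or unexpected code: accept the mail and keep no copy ('otherwise').
        return WEASEL_ACCEPT, None
    weasel_rc = WEASEL_REJECT if bogo_rc == 0 else WEASEL_ACCEPT
    return weasel_rc, f"{folder}\\{archive_name(now)}"
```

Because date('S') || time('M') only has minute resolution, two messages of the same kind arriving in the same minute get the same archive name.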
--
-------------------------------------------------------------------
* Jack Troughton jake at consultron.ca *
* http://consultron.ca irc.ecomstation.ca *
* Kingston Ontario Canada news://news.consultron.ca *
-------------------------------------------------------------------

**= Email 2 ==========================**

Date: Sat, 27 Sep 2003 13:50:25 +0200 (CDT)
From: "Adrian Gschwend"
Subject: Re: Checking for word in a string

On Fri, 26 Sep 2003 20:43:43 +0100, John Poltorak wrote:

>I've tried looking at bogofilter but my eyes quickly glaze over after a
>few pages of the docs. It doesn't look very easy to set up at all.
>
>Is there any 'quick start' option for simply dealing with something like
>Swen so that I can at least get something working and progress from there?

Just to clarify, on netlabs.org weasel is passing the mails to
bogofilter before they are put in the inbox and then bogofilter
directly dumps them. So it *is* what you mean :-)

Bogofilter is quite straight forward to set up, you just need it to
train with spam and nospam, Yuri provides two REXX scripts to do that.
Then all the work which is left is to tell your mailserver to pass the
mails to bogofilter first. That's more or less easy but it depends on
your mailserver. I wouldn't want to fight with sendmail to be honest
:-)

cu

Adrian

--
Adrian Gschwend
at netlabs.org

ktk [a t] netlabs.org
-------
Free Software for OS/2 and eCS
http://www.netlabs.org

**= Email 3 ==========================**

Date: Sat, 27 Sep 2003 14:26:57 +0100
From: John Poltorak
Subject: Re: Checking for word in a string

On Sat, Sep 27, 2003 at 02:59:09PM +0200, Stefan Neis wrote:
> On Sat, 27 Sep 2003, Adrian Gschwend wrote:
>
> > Just to clarify, on netlabs.org weasel is passing the mails to
> > bogofilter before they are put in the inbox and then bogofilter
> > directly dumps them. So it *is* what you mean :-)
>
> As far as I followed the discussion, this still means that "sendmail"
> (or an equivalent) is accepting the mail (but throwing it away
> immediately), while John looks for something which would ideally make
> "sendmail" interrupt the connection (or some similar nasty thing) based
> on the subject line (so it's the spammer - or his provider - who has to
> deal with some problems instead of him). Looks attractive though I have
> some doubts about conformance of such behaviour with relevant RFC's.

This is exactly how I understand that Milter works. It can be invoked directly as part of the mail transaction, so that spam can be recognised and blocked at source instead of taking delivery of it and dumping it subsequently.

If incoming mail can be recognised as spam after 10-20 lines, there doesn't seem to be much point in accepting 150k of it only to have to process and delete it later.

> Regards,
> Stefan
>
> Micro$oft is not an answer. It is a question. The answer is 'no'.
>

--
John

**= Email 4 ==========================**

Date: Sat, 27 Sep 2003 14:52:44 +0200 (CEST)
From: Stefan Neis
Subject: Re: Checking for word in a string

On Sat, 27 Sep 2003, Adrian Gschwend wrote:

> Bogofilter is quite straight forward to set up, you just need it to
> train with spam and nospam, Yuri provides two REXX scripts to do that.
> Then all the work which is left is to tell your mailserver to pass the
> mails to bogofilter first. That's more or less easy but it depends on
> your mailserver. I wouldn't want to fight with sendmail to be honest
> :-)
>
> cu
>
> Adrian
>
>
> --
> Adrian Gschwend
> at netlabs.org
>
> ktk [a t] netlabs.org
> -------
> Free Software for OS/2 and eCS
> http://www.netlabs.org
>
>
> Micro$oft is not an answer. It is a question. The answer is 'no'.
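Stefan and John are after a verdict based on the Subject line within the first few lines of a message, rather than after the whole thing has been accepted. The Python sketch below shows only that decision step: read the incoming lines lazily, look at no more than a fixed number of header lines, and stop as soon as the Subject contains a banned word. The function names, the 20-line budget and the word list are illustrative assumptions; this is not the Milter API or a weasel/sendmail hook, and it ignores folded (multi-line) and MIME-encoded Subject headers. Note also that in standard SMTP the server only sends its reply after the client has finished the DATA phase, so acting on an early verdict means either reading the rest and then answering 554, or dropping the connection, which is the "nasty thing" whose RFC conformance Stefan doubts.

```python
import re
from typing import Iterable, Optional


def subject_has_word(subject: str, word: str) -> bool:
    """True if `word` appears in `subject` as a whole word, ignoring case."""
    pattern = r"\b" + re.escape(word) + r"\b"
    return re.search(pattern, subject, re.IGNORECASE) is not None


def early_verdict(lines: Iterable[str], banned_words: Iterable[str],
                  max_lines: int = 20) -> Optional[str]:
    """Scan at most `max_lines` header lines of an incoming message.

    Returns the banned word found in the Subject header, or None once the
    header block ends, the line budget is used up, or the Subject is clean.
    Lines are consumed lazily, so the message body is never read.
    """
    words = list(banned_words)
    for count, line in enumerate(lines):
        if count >= max_lines or not line.strip():
            return None  # line budget exhausted, or blank line ending the header
        if line.lower().startswith("subject:"):
            subject = line[len("subject:"):].strip()
            for word in words:
                if subject_has_word(subject, word):
                    return word
            return None
    return None
```

Because `lines` can be a generator fed from the connection, a match on line 2 means the remaining lines are simply never pulled in.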
**= Email 5 ==========================**

Date: Sat, 27 Sep 2003 14:59:09 +0200 (CEST)
From: Stefan Neis
Subject: Re: Checking for word in a string

On Sat, 27 Sep 2003, Adrian Gschwend wrote:

> Just to clarify, on netlabs.org weasel is passing the mails to
> bogofilter before they are put in the inbox and then bogofilter
> directly dumps them. So it *is* what you mean :-)

As far as I followed the discussion, this still means that "sendmail"
(or an equivalent) is accepting the mail (but throwing it away
immediately), while John looks for something which would ideally make
"sendmail" interrupt the connection (or some similar nasty thing) based
on the subject line (so it's the spammer - or his provider - who has to
deal with some problems instead of him). Looks attractive though I have
some doubts about conformance of such behaviour with relevant RFC's.

Regards,
Stefan

Micro$oft is not an answer. It is a question. The answer is 'no'.

**= Email 6 ==========================**

Date: Sat, 27 Sep 2003 15:41:35 +0100
From: John Poltorak
Subject: Re: Checking for word in a string

On Sat, Sep 27, 2003 at 01:50:25PM +0200, Adrian Gschwend wrote:
> On Fri, 26 Sep 2003 20:43:43 +0100, John Poltorak wrote:
>
> >I've tried looking at bogofilter but my eyes quickly glaze over after a
> >few pages of the docs. It doesn't look very easy to set up at all.
> >
> >Is there any 'quick start' option for simply dealing with something like
> >Swen so that I can at least get something working and progress from there?
>
> Just to clarify, on netlabs.org weasel is passing the mails to
> bogofilter before they are put in the inbox and then bogofilter
> directly dumps them. So it *is* what you mean :-)

It's not exactly what I wanted to do, but I guess I'll just have to accept
the limitations of having to process mail after it has been accepted by
the SMTP server.
> Bogofilter is quite straight forward to set up, you just need it to
> train with spam and nospam, Yuri provides two REXX scripts to do that.

I don't understand how I'm supposed to use them.

> Then all the work which is left is to tell your mailserver to pass the
> mails to bogofilter first. That's more or less easy but it depends on
> your mailserver. I wouldn't want to fight with sendmail to be honest
> :-)

Sendmail isn't too bad at all.

I just got the third edition of the book a while ago, and wished I had all
the options available to me on OS/2.

> cu
>
> Adrian
>
>
> --
> Adrian Gschwend
> at netlabs.org
>
> ktk [a t] netlabs.org
> -------
> Free Software for OS/2 and eCS
> http://www.netlabs.org
>

--
John

**= Email 7 ==========================**

Date: Sat, 27 Sep 2003 15:48:32 +0100
From: John Poltorak
Subject: Re: Checking for word in a string

On Sat, Sep 27, 2003 at 04:37:08PM +0200, Adrian Gschwend wrote:
> On Sat, 27 Sep 2003 14:59:09 +0200 (CEST), Stefan Neis wrote:
>
> ah ok, then I have a very nice reading:
>
> http://www.benzedrine.cx/relaydb.html

Have a look at Milter:-

http://www.milter.org/

> Unfortunately not yet possible on OS/2.

ditto :-(...

> cu
>
> Adrian
>
>
> --
> Adrian Gschwend
> at netlabs.org
>
> ktk [a t] netlabs.org
> -------
> Free Software for OS/2 and eCS
> http://www.netlabs.org
>

--
John

**= Email 8 ==========================**

Date: Sat, 27 Sep 2003 16:37:08 +0200 (CDT)
From: "Adrian Gschwend"
Subject: Re: Checking for word in a string

On Sat, 27 Sep 2003 14:59:09 +0200 (CEST), Stefan Neis wrote:

>As far as I followed the discussion, this still means that "sendmail"
>(or an equivalent) is accepting the mail (but throwing it away
>immediately), while John looks for something which would ideally make
>"sendmail" interrupt the connection (or some similar nasty thing) based
>on the subject line (so it's the spammer - or his provider - who has to
>deal with some problems instead of him). Looks attractive though I have
>some doubts about conformance of such behaviour with relevant RFC's.

ah ok, then I have a very nice reading:

http://www.benzedrine.cx/relaydb.html

It's from the guy who wrote the OpenBSD firewall, IMHO a very nice way to deal with spam.

Unfortunately not yet possible on OS/2.

cu

Adrian

--
Adrian Gschwend
at netlabs.org

ktk [a t] netlabs.org
-------
Free Software for OS/2 and eCS
http://www.netlabs.org

**= Email 9 ==========================**

Date: Sat, 27 Sep 2003 18:52:05 +0100 (BST)
From: "Dave Saville"
Subject: Re: Checking for word in a string

On Sat, 27 Sep 2003 16:37:08 +0200 (CDT), Adrian Gschwend wrote:

>http://www.benzedrine.cx/relaydb.html

Interesting read :-)

Just for fun I knocked up a perl script that will extract ip addresses inside [ ] on Received lines and ran it down my spam folder which has 1100 spams in it. It only came up with 74 different ones, one of which was 127.0.0.1 - I thought he claimed that bit could be relied on? The highest count was 5 and the total found 97.

--
Regards

Dave Saville

**= Email 10 ==========================**

Date: Sat, 27 Sep 2003 18:59:20 +0100 (BST)
From: "Dave Saville"
Subject: Re: Checking for word in a string

Sorry - ignore previous message - I can't count :-)

Questions still stand though.

--
Regards

Dave Saville

**= Email 11 ==========================**

Date: Sat, 27 Sep 2003 18:59:20 +0100 (BST)
From: "Dave Saville"
Subject: Re: Checking for word in a string

Sorry - ignore previous message - I can't count :-)

Questions still stand though.

--
Regards

Dave Saville

**= Email 12 ==========================**

Date: Sat, 27 Sep 2003 22:10:56 +0100
From: John Poltorak
Subject: Re: Checking for word in a string

On Sat, Sep 27, 2003 at 12:01:03PM -0400, Jack Troughton wrote:

> That's it, you're done training.

I must have done something wrong since I get different spamicity rating for the same msg when I run the same command and sometimes it is even 0.000...
> --
> -------------------------------------------------------------------
> * Jack Troughton jake at consultron.ca *
> * http://consultron.ca irc.ecomstation.ca *
> * Kingston Ontario Canada news://news.consultron.ca *
> -------------------------------------------------------------------
>
>

--
John
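Dave's quick Perl script from Email 9 isn't shown, but the idea is easy to reproduce. Below is a rough Python equivalent, assuming the spam folder is an mbox file and that only dotted-quad IPv4 addresses inside square brackets on Received: lines matter. The function names are hypothetical, and this is not the code described on the benzedrine.cx relaydb page.

```python
import mailbox
import re
from collections import Counter
from email import message_from_string
from email.message import Message
from typing import Iterable

# Dotted-quad inside square brackets, e.g. "from host (helo) [192.0.2.7]".
BRACKETED_IP = re.compile(r"\[(\d{1,3}(?:\.\d{1,3}){3})\]")


def received_ips(msg: Message) -> list:
    """All bracketed IPv4 addresses on the message's Received: lines, in header order."""
    ips = []
    for header in msg.get_all("Received", []):
        ips.extend(BRACKETED_IP.findall(str(header)))
    return ips


def tally(messages: Iterable[Message]) -> Counter:
    """Count how often each bracketed Received: address occurs across all messages."""
    counts: Counter = Counter()
    for msg in messages:
        counts.update(received_ips(msg))
    return counts


def tally_raw(raw_messages: Iterable[str]) -> Counter:
    """Same tally over raw message texts."""
    return tally(message_from_string(raw) for raw in raw_messages)


def tally_mbox(path: str) -> Counter:
    """Same tally over an mbox file such as a spam folder."""
    return tally(mailbox.mbox(path))
```

`len(counts)`, `counts.most_common(1)` and `sum(counts.values())` give the number of distinct addresses, the highest count and the total found, the same three figures Dave reports. Because every Received: line is taken, hops added by your own machines are counted along with the ones added by remote servers.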