URN:NBN:fi-fe20051953
Copyright © 1999- by Prof.
Timo Salmi
Last modified Sat 13-May-2023 17:34:17
Timo's procmail tips and recipes
Although there already is an abundance of procmail material on the
net, here are some of my own tips and observations. This tips page
is a companion of my Foiling Spam with an Email Password System
page. The items on this page are in no particular order. There is
some overlap in the items. Many of the links in the material have
expired along with the former server. This information is
practically just archived. No support, no feedback. I have moved on
to other things since my retirement in 2011.
- I want to filter my email automatically. How do
I get started with procmail?
- Building a testbench. How can I test
individual procmail recipes?
- I know how to make "and" rules in procmail
recipes, but how do I make "or" rules?
- How can one perform multiple shell commands
on the action line?
- How can I find out what the subject of a
posting is?
- How do I get a copy of the headers of all the
incoming email into a separate file?
- Would you give some further hints for spam
foiling recipes?
- I have limited disk space. How can I
truncate long messages?
- How can I quickly test if my rules with
regular expressions match?
- How can I detect if the email comes, say, from
the .com domain?
- What alternatives do I have to detect a sender
all through the various header-fields?
- How can I extract a valid address from the
Reply-To field?
- How can I extract the address of the
sender's postmaster?
- How can I weed out an inordinately long
recipient list?
- What is this procmail scoring? How can I
utilize it?
- How can I test if the subject is empty or if
the subject field is missing altogether?
- How can I modify the "To:" field of the email
I received?
- I have a long list of spammers in a separate
file. How can I utilize it?
- How do I forward certain messages that I get,
and preserve myself a copy?
- How do I forward certain messages to two
different addresses?
- How do I automatically return certain email
messages?
- My address has changed. How do I forward a
copy to myself and tell the sender?
- How can I set variable values based on the
text in the body of the email message?
- How can I insert some token text in front of
the body of incoming email?
- Do you have any useful tips for regular
expression matching?
- How can I test if two procmail variables have
the same contents?
- I am having difficulties with "<". How
does one match it?
- How can I insert identification text to the
beginning of the subject line?
- I tried out your tips, but some of them failed
on my system. What next?
- Is there a cure for the echo and grep blues?
- How do I know which of my many procmail recipes
has been enacted?
- How can I detect Korean, Cyrillic, or Chinese
to avoid such frequent spam?
- How can I change the subject line and include
part of the message body into it?
- How can I remove the signature from the
incoming email?
- What unix manuals relating to procmail should
I get?
- Is it possible to use procmail to call the
vacation program?
- How can I avoid duplicate messages sent
in rapid succession?
- How can I skip logging a certain, matched
recipe?
- Could you please solve for me this procmail
problem of mine?
- I liked this material. Do you have anything else
on programming?
- Exercises
- Acknowledgements for useful advice and/or
feedback
Web pages or any other reproduction:
This page is copyrighted ©. No part of this material, nor its
index, nor the entire contents may be reproduced (in any language)
on any other World Wide Web pages or in any other electronic,
physical or similar manner.
Quoting: However, you are free to quote
brief passages from this page or to post links to
the items in your messages and Usenet news postings
provided you clearly indicate the source.
Asking for programming
advice: Please see the item
Could you please
solve for me this procmail problem of mine?
Submitting: Contributions on the net are
acknowledged
further down on this page.
Note, however, that submissions of outside items to this collection
are not invited. This basically is my own collection, not an open
repository.
Disclaimer: The author shall not be
liable to the user for any direct, indirect or consequential loss
arising from the use of, or inability to use, any information, rule,
script, program or file, howsoever caused. No warranty is given that
the information, rules, scripts, programs or the advice given will
work under all circumstances or that they are current. You use
everything at your own risk.
I want to filter my email automatically.
How do I get started with procmail?
Unix email can conveniently be preprocessed with automatic filters
such as procmail, the "Autonomous mail processor". This item repeats
what already is presented about getting started in many of the other
FAQs, including mine on
spamfoiling. Nevertheless, this is so crucial that I'll try to
give the essential outline also here.
Find out what your email directory is. Go ("cd") to the directory where your email
folders are located and type "pwd". Assume in this item that you get
"/home/myid/Mail". Further
assume in the example that "/home/myid" is your home directory so
that you can use the variable "${HOME}" to denote it.
Find out where your system's Bourne shell is located by typing
"which sh". Assume that you
get "/usr/bin/sh".
Prepare a "~/.procmailrc" file
with a suitable editor. For example you might use "emacs ~/.procmailrc". To start
with, put something like this into the ~/.procmailrc file:
#Preliminaries
SHELL=/usr/bin/sh #Use the Bourne shell (check your path!)
MAILDIR=${HOME}/Mail #First check what your mail directory is!
LOGFILE=${MAILDIR}/procmail.log
LOG="--- Logging ${LOGFILE} for ${LOGNAME}, "
#Whatever recipes you'll use
#The order of the recipes is significant
:0
* ^From: scam@cyberspam\.com
/dev/null
# Accept all the rest to your default mailbox
:0:
${DEFAULT}
For the "~/.procmailrc" file a read permission for the user
him/herself will be sufficient. To ensure this, give the command "chmod u+r ~/.procmailrc".
Find out where the "procmail"
program is located on your system by typing "which procmail". Assume below
that you get "/usr/local/bin/procmail". Also check
what your id is: "whoami".
Assume that you get "myid".
Next comes the crucial step. Put the following line in your "~/.forward" file. Include the quotes
(") in the ~/.forward file contents.
"|IFS=' ' && exec /usr/local/bin/procmail || exit 75 #myid"
Set adequate permissions for accessing the "~/.forward" file: "chmod 644 ~/.forward".
Lastly, check ("ls -lFd ~/") that your main
directory permissions are at least (the equivalent of) "drwx--s--x". If not, "chmod u+rwx ~/" and "chmod og+x ~/".
You should now be set to go. To check, send an email to yourself to
see if it gets through. If there is a problem see the advice on troubleshooting.
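The ~/.forward line is easy to mistype because of its nested quotes. The following is only a sketch of one way to generate it; the function name make_forward_line is hypothetical, and the procmail path and user id are the assumed values used in this item.

```shell
#!/bin/sh
# Hypothetical helper: print the ~/.forward line for a given procmail
# path and user id. Both arguments are assumptions taken from this item;
# substitute what "which procmail" and "whoami" report on your system.
make_forward_line() {
    printf '"|IFS='\'' '\'' && exec %s || exit 75 #%s"\n' "$1" "$2"
}

# Show the line; redirect it to ~/.forward only after checking it by eye.
make_forward_line /usr/local/bin/procmail myid
```

The surrounding double quotes are part of the printed line, as required for the file contents.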
How can I test individual procmail recipes?
I do not wish to disturb my regular ~/.procmailrc recipes file in
the process.
There are several options. One method is building a simple test
environment as follows. It is a very convenient method. If you apply
it right, it allows the testing without affecting your normal flow
of email in any way. Create the following "proctest" file, preferably on your
path. Make it executable using "chmod u+x proctest". Thus
you'll have a new command "proctest" available.
#!/bin/sh
#The executable file named "proctest"
#
# You need a test directory.
TESTDIR=/home/myid/test
if [ ! -d "${TESTDIR}" ] ; then
echo "Directory ${TESTDIR} does not exist; First create it"
#Signal the failure with a non-zero exit code
exit 1
fi
#
#Feed an email message to procmail. Apply proctest.rc recipes file.
#First prepare a mail.msg email file which you wish to use for the
#testing.
procmail ${TESTDIR}/proctest.rc < mail.msg
#
#Show the results.
less ${TESTDIR}/Proctest.log
clear
less ${TESTDIR}/Proctest.mail
#
#Clean up.
rm -i ${TESTDIR}/Proctest.log
rm -i ${TESTDIR}/Proctest.mail
The beauty of this method is that besides "proctest.rc" you can easily edit also
"mail.msg" for testing different
kinds of incoming mail and the behavior of your recipes in various
situations. Note, however, that it is best to test only for one
email message at a time. In other words, do not put more than
one email message into the mail.msg test file.
A question remains. Where does one get the structure of a posting
for the "mail.msg" test posting? Easy. Invoke elm, select a suitable, existing
posting, and make a copy of it to "mail.msg" by pressing C (capital
C) and reply mail.msg to "Copy message to:". Other mail programs
probably have similar options.
Below is the proctest.rc recipe file which I used in preparing for
this item:
SHELL=/bin/sh
TESTDIR=/home/myid/test
MAILDIR=${TESTDIR}
LOGFILE=${TESTDIR}/Proctest.log
LOG="--- Logging for ${LOGNAME}, "
#Troubleshooting:
VERBOSE=yes
LOGABSTRACT=all
#Let's test stripping lines from the email message's header
:0 fwh
| egrep -vi "(^Content-|^MIME-Version:.)"
#If it is from myself, store the email message
:0:
* $ ^From:.*${LOGNAME}
${TESTDIR}/Proctest.mail
#Otherwise, discard the email message
:0
/dev/null
Feedback:
The header stripping does not work if any of those header lines
is continued. It is almost always an error to use grep/egrep/fgrep
when filtering a message header. A better recipe would be the
following, utilizing formail:
#Let's test stripping lines from the email message's header,
#but only when they're there
:0 fwh
* ^(Mime-Version:|Content-)
| formail -IMime-Version: -IContent-
To continue myself. The flags are as follows: "f" use the pipe as a
filter, "w" wait for the filter to finish before proceeding, "h" it
is about the header of the email message.
The formail -I switch means that any existing field of that name is
simply removed; since only the field name is given here, nothing is
added in its place. (It is the lower case -i switch that renames an
existing field with an "Old-" prefix.)
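The feedback about continued header lines can be seen without procmail at all. The sketch below feeds a made-up header, in which the Content-Type field continues on a second line, through the same line-oriented filter as in the first recipe (written here as "grep -E" instead of egrep). The sample addresses and the function names are illustrative assumptions.

```shell
#!/bin/sh
# A sample header in which the Content-Type field is continued on a
# second line that starts with a tab (the sample text is made up).
sample_header() {
    printf 'From: someone@example.com\n'
    printf 'Content-Type: multipart/mixed;\n'
    printf '\tboundary="xyz"\n'
    printf 'Subject: test\n'
}

# The line-oriented filter from the first recipe above.
strip_with_grep() {
    grep -E -vi '(^Content-|^MIME-Version:.)'
}

# The continuation line with boundary="xyz" survives the filtering and
# is left dangling under the From: field.
sample_header | strip_with_grep
```

Because the filter only sees individual lines, it cannot know that the tab-indented line belonged to the removed field. This is the reason for preferring formail, which works on whole header fields.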
I know how to make "and" rules in procmail
recipes, but how do I make "or" rules?
Just in case, let's first revisit an "and" rule by a common example:
#Trivial catching of potential spam towards the end of a ~/.procmailrc
#Place only after accepting all the mailing lists you want to receive
:0:
* ! ^TO_ts@([-a-z0-9_]+\.)*uvasa\.fi
* ! ^TO_timo\.salmi
${HOME}/.mail/PotentialSpam.mail
For entering an "or" rule, consider the following example:
#Accept email from Era Eriksson, the author of the major procmail FAQ
:0:
* ^From:.*reriksso@([-a-z0-9_]+\.)*helsinki\.fi|\
^From:.*era@iki\.fi
${DEFAULT}
Let's look at a few details:
- The "^TO_" in the first recipe is a procmail reserved predefined
special expression "which should catch all destination
specifications containing a specific address." It must be written in
upper case.
- The "!" in the first recipe is the familiar operator indicating
a negation.
- If "${HOME}/.mail" is your mail directory you don't need to
spell out the entire path "${HOME}/.mail/PotentialSpam.mail". Just
"PotentialSpam.mail" will be sufficient.
- The first detail of the "or" example is complicated and is per
se unrelated to the "or" issue at hand. The "([-a-z0-9_]+\.)*"
expression in "reriksso@([-a-z0-9_]+\.)*helsinki\.fi" sees to it
that if Era has several machines in his domain (as I do under mine),
all will be matched by the recipe. The "[-a-z0-9_]" matches any of
the characters within the brackets "[]", the trailing "+" tells that
there must be at least one repeat of those characters, the "\."
matches a dot, and the "*" tells that there has to be zero or more
repeats of the preceding expression within the parentheses "()".
[This item owes heavily to Era's friendly guidance.]
- The backslash "\" in "helsinki\.fi" sees to it that the actual
dot (.) is matched. This is because if the "quote next character"
"\" is omitted, the "." is taken as a regular expression matching
any (exactly one) character.
- The "|" in the "|\" indicates an "or" condition, and the "\"
quotes the embedded end of line, i.e. tells that the rule is
continued on the next line.
- The "|" or condition sees to it that the recipe matches email
coming from Era either from the "helsinki.fi" or the "iki.fi"
domain.
- The "${DEFAULT}" puts the email in the regular mailbox.
- The trailing ":" in the recipe start line ":0:" tells procmail
to use temporary file locking to avoid writing simultaneously
arriving potential email on top of each other at your "${DEFAULT}"
mailbox. Since no lock file name is given after the ":0:", procmail
will provide the lockfile name. Always use this format when
delivering to a mail folder, unless the target folder is /dev/null.
That is, unless you want the email to be discarded.
There are alternatives. Scoring could be used
for the same purpose
:0:
* 1^0 ^From:.*reriksso@([-a-z0-9_]+\.)*helsinki\.fi
* 1^0 ^From:.*era@iki\.fi
${DEFAULT}
Likewise, you could alternatively use ( ) grouping
:0:
* ^From:.*(\
reriksso@([-a-z0-9_]+\.)*helsinki\.fi|\
era@iki\.fi)
${DEFAULT}
Feedback:
That condition looks a bit ugly to me. Let me rephrase it to show
you what I mean:
* ^From:.*(reriksso@([-a-z0-9]+\.)*helsinki|era@iki)\.fi
(an underscore can not be part of a hostname, as far as I
know.)
Yes, many of the rules presented in this FAQ can be written more
concisely and/or effectively. The rules, as presented in the FAQ,
are often formulated for easier understanding than efficiency. But
it is useful to improve on the efficiency after one first has got
the basic logic of a rule outlined.
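Before relying on a rewritten condition it is worth checking that it still accepts the same senders. The following is a minimal sketch that compares the two-line "or" condition with the factored form from the feedback, using "grep -E -i" as a stand-in for procmail's matcher (the two are close but not identical). The sample From: lines and the helper name matches are assumptions made for the illustration.

```shell
#!/bin/sh
# The two-line "or" condition and the factored rewrite from the
# feedback, expressed as grep -E patterns. The -i switch mimics
# procmail's default case-insensitive matching.
OR_RULE='^From:.*reriksso@([-a-z0-9_]+\.)*helsinki\.fi|^From:.*era@iki\.fi'
FACTORED='^From:.*(reriksso@([-a-z0-9]+\.)*helsinki|era@iki)\.fi'

matches() {    # usage: matches PATTERN LINE
    printf '%s\n' "$2" | grep -E -i -q -e "$1"
}

for line in \
    'From: reriksso@cc.helsinki.fi' \
    'From: Era <era@iki.fi>' \
    'From: someone@example.com'
do
    if matches "$OR_RULE" "$line"; then a=match; else a=miss; fi
    if matches "$FACTORED" "$line"; then b=match; else b=miss; fi
    printf '%-32s or-rule: %-5s factored: %s\n' "$line" "$a" "$b"
done
```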
How can one perform multiple shell commands
on the action line?
See the action line below (i.e. the one starting with the "|" pipe).
Separate the commands with "&&". If you wish to continue on
a second line for readability, apply "\". Alternatively, just one
long line could have been used. The recipe below is from a test with
the testbench. Its purpose is just to show
this method of giving multiple commands.
#Test if the message has a "Subject:" header and has a subject in it
#(The brackets [] contain a space and a tab)
:0:${TESTDIR}/Proctest.mail.lock
* ^Subject:
* ^Subject:[ ]*\/[^ ].*
| echo "A ^Subject: header found with" >> ${TESTDIR}/Proctest.mail &&\
echo "${MATCH}" >> ${TESTDIR}/Proctest.mail
Likewise, a single command can be subdivided for easier
documentation:
| echo "A ^Subject: header found but there is no subject" \
>> ${TESTDIR}/Proctest.mail
Below is another example with a slightly different
syntax using the semicolon ";" as the separator. The example also
demonstrates how to save diskspace by zipping email from a
particular source. You'll need Info-ZIP's zip and unzip in order to
be able to apply it. (They are available from the proper
Unix section of Garbo
program archives at the University of Vaasa,
Finland.)
:0w:Test.mail.lock
* ^From:.*test
| unzip ${HOME}/mail/Test.zip; \
cat >> Test.mail; \
zip -oj9 ${HOME}/mail/Test.zip Test.mail; \
rm -f Test.mail
What happens on the action line is this:
- The potentially existing "Test.zip" zip-file is unzipped to
obtain the earlier email messages that already might be within
Test.zip.
- The incoming email is appended to the extracted Test.mail
file.
- The updated Test.mail file is compressed back into the
Test.zip zip-file.
- The uncompressed Test.mail is deleted.
To be on the safe side procmail is told to wait (the "w" flag in
":0w:Test.mail.lock") until the pipe ("|") has been performed.
How can I find out what the subject of a
posting is?
Now is a good time to utilize my testbench
in order to find out if a logic works. Build a
/home/myid/test/proctest.rc file.
SHELL=/bin/sh
TESTDIR=/home/myid/test
MAILDIR=${TESTDIR}
LOGFILE=${TESTDIR}/Proctest.log
LOG="--- Logging for ${LOGNAME}, "
First, a few environment variables are included.
#Troubleshooting:
VERBOSE=yes
LOGABSTRACT=all
The above means: Use full reporting for the debugging.
#An auxiliary regular expression to detect text,
#The brackets [] contain a space and a tab
GETTEXT="[ ]*\/[^ ].*"
If the same expression is used several times in a recipe file, it is
convenient to put the expression into an environment variable
instead of writing it out repeatedly.
- The first part "[ ]*" of the regular expression matches any
number of spaces and tabs (even the case of none) which can lead the
subject.
- The "\/" is a special procmail-only operand which puts a
(possible) match found by the rest of the expression into a variable
named MATCH.
- "[^ ]" means any one character other than the ones within the
brackets, i.e. the first non-tab, non-space character. The ".*" then
matches the rest of the line.
#Test if the message has a "Subject:" header and has a subject in it
:0c:${TESTDIR}/Proctest.mail.lock
* ^Subject:
* $ ^Subject:${GETTEXT}
| echo "A ^Subject: header found with" >> ${TESTDIR}/Proctest.mail &&\
echo "${MATCH}" >> ${TESTDIR}/Proctest.mail
- The "c" flag in ":0c:" tells that the processing should
continue also after this particular recipe has been acted upon.
(When the "c" flag is not present, all the rest of the recipes
in proctest.rc are skipped.) A "w" flag, as in ":0wc:", would
additionally tell procmail to wait until the "|" pipe has finished.
- The ":${TESTDIR}/Proctest.mail.lock" tells which lockfile to
use in order to avoid the confusion from the possibility of
simultaneous arrival of several email messages. Note that since we
use a pipe "|" in the actions part, it is prudent to explicitly give
the name of the lock.
- Note the first "$" on the "$ ^Subject:${GETTEXT}"
condition line. It tells that the environment variables (in this
case "GETTEXT") on the line are to be expanded, not to be taken as
literal text.
#Test if the message has a "Subject:" header but has no subject in it
:0c:${TESTDIR}/Proctest.mail.lock
* ^Subject:
* $ !^Subject:${GETTEXT}
| echo "A ^Subject: header found but there is no subject" \
>> ${TESTDIR}/Proctest.mail
#Test if the message has a "Subject:" at all
:0c:${TESTDIR}/Proctest.mail.lock
* !^Subject:
| echo "No ^Subject: header was found" >> ${TESTDIR}/Proctest.mail
#Otherwise, discard the message
:0
/dev/null
After the recipes above have been testbenched and cleared, you know
that the methods used in them will work for you in your own
environment.
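To get a feel for what ends up in MATCH, the extraction can be imitated roughly in plain shell. This is only a sketch with sed, not procmail's own matcher: unlike procmail it is case sensitive and it scans whatever text it is given, not just the header. The function name extract_subject and the sample lines are assumptions.

```shell
#!/bin/sh
# Rough shell imitation of the condition  ^Subject:[ <tab>]*\/[^ <tab>].*
# It prints what procmail would place in MATCH: the subject text from
# its first non-blank character to the end of the line.
TAB=$(printf '\t')

extract_subject() {
    sed -n "s/^Subject:[ ${TAB}]*\([^ ${TAB}].*\)\$/\1/p" | head -n 1
}

# A subject with leading blanks: only the text itself is printed.
printf 'From: a@example.com\nSubject:   Hello there\n' | extract_subject

# A "Subject:" header with nothing in it: nothing is printed.
printf 'From: a@example.com\nSubject:\n' | extract_subject
```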
Of course, there are other options for extracting the subject into
an environment variable. One is to utilize "formail" which is a
companion to the procmail program. If you include the following
expression at the beginning of your ~/.procmailrc recipes file, you
will have the variable ${SUBJECT} available for the rest of the
recipes file.
#Environment variables for procmail
#
#Get the subject
#Discard some dangerous special chars + any leading and trailing blanks
SUBJECT=`formail -xSubject: \
| tr '\;\`\\' ' ' \
| expand | sed -e 's/^[ ]*//g' -e 's/[ ]*$//g'`
For an example of usage see the
Foiling Spam with an Email Password System page.
Feedback:
Extracting the header from inside procmail using the \/ token is
_much_ faster than the formail solution.
Feedback:
If the SUBJECT variable is left empty, apply quotes on the first
line, i.e.
SUBJECT=`formail -x"Subject: "\
How do I get a copy of the headers of all
the incoming email into a separate file?
You can use
#Header logging
:0hc:${HOME}/.mail/Procmail.head.lock
| cat >> ${HOME}/.mail/Procmail.head
- The "h" flag in ":0hc" tells that the header should be
accessed.
- The "c" flag in ":0hc" orders the processing to continue also
after this recipe. In other words, you put your other recipes, after
the header-catching, in the ordinary fashion. The email will reach
them.
- The ":${HOME}/.mail/Procmail.head.lock" tells which particular
lockfile to use.
- Since there are no condition lines (lines starting with *)
this item will always be acted upon when it is reached. You wanted
to log the headers from all the incoming email, right?
- The "| cat >> ${HOME}/.mail/Procmail.head" appends
the headers to the ${HOME}/.mail/Procmail.head file.
Feedback:
Since appending to a file is the result of a normal mailbox
delivery, that can be more efficiently written as simply:
:0 hc:
$HOME/headers.cut
That eliminates a cat and a shell process, plus the pipe and
extra reads and writes.
Now, if you want to overwrite the file with each new message [or do
some further shell operations within the pipe], then the cat command
is a reasonable choice.
[A further point] That would have been an odd name for the lockfile.
Why not $HOME/headers.cut.$LOCKEXT?
Would you give some further hints for spam
foiling recipes?
Besides what is on my page Foiling Spam with
an Email Password System and a separate item on detecting the sender, below are some instructive
little tricks.
Perhaps the strongest generic trick against spam is to shirk any
email that is not addressed to you directly, since most spam is
addressed to some kind of mailing lists. Of course, you first will
have to accept email from any legitimate mailing list which you have
subscribed to. If you put a suitable recipe after your recipes that
accept the legitimate email lists much of the incoming spam will be
caught. Below is a simplified (and a bit munged) version of what I do
in my own ~/.procmailrc:
#Catch potential spam
:0
* !^TO_(ts|timo\.salmi)@([-a-z0-9_]+\.)*uwasa\.fi
{
:0 fwh
* ^Content-Length:
| formail -IContent-Length:
:0:
Spam.mail
}
If you look carefully through this page, you'll find explanations
for all the details in the above recipe. It will be a good exercise
to do so. :-)
Since so much, if not practically all spam comes from forged sender
addresses it is much more effective to block certain suspect email
routes than to try to match the elusive spammers. The scoring recipe example below treats as spam all
email that is routed via dialsprint.net and that is not addressed to
"me" personally.
#Spam avoidance of certain routes and if not for me personally
:0:
* -1^0
* 1^0 ? formail -x"Received:" | egrep -is "dialsprint\.net"
* 1^0 ! ^TO_(myid|myFirstName[ _\.]myLastName)@([-a-z0-9_]+\.)*myhost\.mydom
Spam.mail
- The "?" at the start of the condition executes and evaluates
what is on the condition line instead of searching for a literal
match.
- Procmail's companion program formail is used to extract all
the "Received:" routing information from the posting's header. Then
"dialsprint.net" is sought for using Unix egrep via the "|" pipe.
- This is a sideline, but the simpler, less general form of the
last condition line would, of course, be just "* 1^0 ! ^TO_myid@myhost\.mydom"
- The scoring system is explained
elsewhere on this page, but in brief the score is initialized at -1.
Each explicit condition is given a weight of 1. If the total score
is at least 1 (i.e. positive) then the action (storing to the
Spam.mail file) is initiated.
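The arithmetic of the scoring recipe can be written out as a small table. The sketch below simply adds up the weights for the four possible combinations; the function names spam_score and is_spam are hypothetical, and the arguments are 1 for a condition that matched and 0 for one that did not.

```shell
#!/bin/sh
# Hypothetical helper: add up the score of the recipe above.
#   $1  1 if "dialsprint.net" was found among the Received: lines
#   $2  1 if the message was NOT addressed to me personally
spam_score() {
    echo $(( -1 + $1 + $2 ))
}

# The action is taken only when the total is positive.
is_spam() {
    [ "$(spam_score "$1" "$2")" -gt 0 ]
}

for combo in "1 1" "1 0" "0 1" "0 0"; do
    set -- $combo
    if is_spam "$1" "$2"; then verdict="Spam.mail"; else verdict="passed on"; fi
    echo "route=$1 not-for-me=$2 score=$(spam_score "$1" "$2") -> $verdict"
done
```

Starting from -1 means that a single matching condition only brings the total to 0, so both conditions have to match before the message is stored as spam.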
Fairly often there is a tell-tale exhortation to email to a remove@
or a removeme@ address within the actual message. As you may know,
these are just common ploys of the spammers to get your address
confirmed to make matters even worse for you.
:0B:
* (remove@|removeme@)
PotentialSpam.mail
- The "B" flag tells the recipe to search through the body of
the email message.
- Note the "or" testing on the conditions line.
- Note again the file locking (the trailing : in ":0B:"). Since
the email message is directed to a folder, we do not need explicitly
to name the lockfile. We can let procmail do it. As a default it
will use the name PotentialSpam.mail.lock
- The "B" means the body and only the body of the message. The
header is not included. However, I have as hearsay that some
procmail versions have a bug in this respect, but I have not been
able to test that situation myself.
The subject line of the allegedly more respectable [sic] unsolicited
advertising has an "ADV" marker in upper case on the subject line.
(For an imaginary legitimacy such spammers occasionally attach some
xenophobic quibble about U.S. legislation, not very relevant on the
international Internet.)
:0D:
* ^Subject:.*ADV
PotentialSpam.mail
- The "D" flag tells to distinguish between the lower and the
upper case in testing for a match.
There are some obvious code words that tend to appear on the subject
line, such as "make money fast" and "$$$".
:0:
* (^Subject:.*make.*money.*fast|^Subject:.*\$\$\$)
PotentialSpam.mail
- Note, not "^Subject:.*$$$", but "^Subject:.*\$\$\$" because,
if not quoted with "\", a "$" is taken as a regular expression
indicating the end of line.
- Other typical subjects which you might wish to
catch include such as
- cable descrambler
- FOR SALE
- laser printer toner
- million email addresses
- ONLY $
- Quit Your Job
- Other typical contents include such as
- absolutely no obligation
- call now 24 h
- to be taken off our list
Don't overdo it, though, lest you end up weeding also some
legitimate email.
Feedback:
The regexp:
(remove@|removeme@)
is much slower than
remove(me)?@
Having the 'top-level' of the regexp be an alternation (via '|')
slows down matching by quite a bit. The more that can be factored
out at the beginning of the regexp, the better. The same goes for
the recipe that matches against the Subject: header-field:
^Subject:.*(make.*money.*fast|\$\$\$)
is faster than:
(^Subject:.*make.*money.*fast|^Subject:.*\$\$\$)
My comment: Of course it is commendable to be efficient, especially
where easy understanding is not compromised. However, if the two
clash, I often prefer clarity of expression and convenience over a
strict maximization of code efficiency. Don't we have our powerful
modern computers to perform our tasks for us, not vice versa :-).
(This is not about the particular feedback above. The improvements
are useful. They are both legible and instructive.)
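When a regular expression is factored for speed, it is prudent to check that it still selects the same lines. The sketch below does that for the two forms from the feedback with "grep -E -i"; it says nothing about the speed itself, which concerns procmail's own matcher. The sample body text and the helper names are made up for the illustration.

```shell
#!/bin/sh
SLOW='(remove@|removeme@)'
FAST='remove(me)?@'

count_matches() {   # usage: count_matches PATTERN  (text on stdin)
    grep -E -i -c -e "$1"
}

# Made-up body lines: two with a removal address, one without.
sample_body() {
    printf 'Reply to remove@example.com to unsubscribe\n'
    printf 'Or write to REMOVEME@example.com\n'
    printf 'Please remove my address\n'
}

echo "slow form: $(sample_body | count_matches "$SLOW") matching lines"
echo "fast form: $(sample_body | count_matches "$FAST") matching lines"
```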
More feedback:
The "* ^Subject:.*ADV" rule is overly simplistic and catches
many non-spam subjects. Maybe rather something like
"* ^Subject:\<*ADV\>"
My comment: Ok. Let's try
:0D:
#(The brackets [] start with a space and a tab)
* ^Subject:.*([ \{<]+)ADV([ :\}>]+|$) |\
^Subject:.*(\[+)ADV(:)?(\]+|$)
PotentialSpam.mail
It is far from perfect, but it should work reasonably well for regular
purposes. Spam detection requires experimenting anyway. Regular
expressions are not easy. They are quite a large subject area of their
own.
The above assumes that there is (at least) one space after the
"Subject:" header before the subject begins. This can be ensured by
first applying "formail -z" which you can have high up your
~/.procmailrc. For example I have the upper two lines in mine.
:0 fwh
| formail -z -iContent-Length:
:0D:
* ^Subject:.*([ \{<]+)ADV([ :\}>]+|$) |\
^Subject:.*(\[+)ADV(:)?(\]+|$)
PotentialSpam.mail
See the other items in this tips file for an explanation of the
"fwh" flags. The formail program with the "-z" switch will insert
the desired blanks into the header. The "-iContent-Length:" switch
(which is outside the theme of the current item) will replace the
Content-Length: headers with Old-Content-Length: headers.
I use a slightly different recipe in my own ~/.procmailrc recipes
file:
:0D
* ^Subject:.*([ ]|<|\[)ADV([ ]|>|:|\]|$)
{
:0
{ RULE="Catch potential spam by detecting an ADV keyword" }
:0
/dev/null
}
If you wonder about the "RULE" variable, see the item about logging which rules have been used.
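Since regular expressions like the ADV rule are easy to get wrong, a quick trial against a few subject lines is useful. The following is only a sketch with "grep -E" standing in for procmail; it is case sensitive like the recipe with the "D" flag. The helper name is_adv and the sample subjects are assumptions.

```shell
#!/bin/sh
# The ADV condition of the last recipe above as a grep -E pattern.
# The brackets contain a space and a tab. Outside brackets a "]"
# needs no backslash in grep -E.
TAB=$(printf '\t')
ADV_RE="^Subject:.*([ ${TAB}]|<|\\[)ADV([ ${TAB}]|>|:|]|\$)"

is_adv() {   # usage: is_adv 'Subject: ...'
    printf '%s\n' "$1" | grep -E -q -e "$ADV_RE"
}

for s in \
    'Subject: ADV: Buy now' \
    'Subject: [ADV] Offer' \
    'Subject: ADVANCED topics' \
    'Subject: adv: lower case'
do
    if is_adv "$s"; then echo "caught: $s"; else echo "passed: $s"; fi
done
```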
On to a different facet. Some ISPs (Internet Service Providers) do
not allow numbers in the email addresses. Thus, you may identify
some of the forged spam by catching a violation in this respect. The
following recipe catches email with numbers in the user id before
the @ mark from all the various nodes on "respectable.net".
:0:
* ^From:.*[0-9][^@ ]*@([-a-z0-9_]+\.)*respectable\.net
PotentialSpam.mail
Date: Thu, 19 Dec 2002 10:44:44 +1000
From: Philip Gunter
To: Timo Salmi
Subject: A procmail tidbit
Hi Timo, thanks for your excellent procmail reference.
Here is a small recipe you might like to add to your site.
It limits the number of emails being forwarded from an account,
useful to stop sms storms.
Cheers,
Philip.
:0
{
:0
{
# remove any sms-alert files older than 5 minutes
GLOP_=`find /var/tmp/sms -name sms-alert\* -cmin +5 -exec rm -f {} \;`
# Create an sms-alert file for this message.
GLOP_=`touch /var/tmp/sms/sms-alert$$`
# Count the number of sms-alert files
COUNT=`ls /var/tmp/sms | grep sms-alert | wc -l`
COUNT1=`expr ${COUNT}`
# Check if number of alerts in the last 5 minutes is less than 2
ISLT=`expr ${COUNT1} \< 2`
}
:0:
# if the expression is true then forward the email
* ISLT ?? ^^1^^
! 0123456789@pager.net
}
I have limited disk space. How can I
truncate long messages?
Before we proceed any further, there is a very
important email feature to observe. If you alter the
content-length of a message it is highly advisable first to discard
any "Content-Length:" lines from the email's header. If you fail to
do that, there is the danger that next time you read the relevant
email folder your email program will break your folder because of
erroneous length information. Many email programs are brain-dead
that way.
#Truncate messages longer than 4000 bytes to 100 lines
:0
* > 4000
{
:0 fwh
* ^Content-Length:
| formail -IContent-Length:
:0:Truncated.mail.lock
| head -100 >> Truncated.mail
}
Some details:
- The "* > 4000" matches email messages longer than
4000 bytes.
- The already familiar set of flags "fwh" tells to treat the
email's header.
- Use formail to ensure removing even complicated
"Content-Length:" lines.
- The above also serves as an example of "block nesting", i.e.
the rules and actions between the braces "{ }".
Let's expand the recipe a bit.
#Truncate messages longer than 4000 bytes to 100 + 10 lines
:0
* > 4000
{
:0 fwh
* ^Content-Length:
| formail -IContent-Length:
:0c:Truncated.mail.lock
| head -100 >> Truncated.mail &&\
echo "-:-:-:- (snip) -:-:-:-" >> Truncated.mail
:0:Truncated.mail.lock
| tail -n +101 | tail -n 10 >> Truncated.mail
}
A few observations:
- The first 100 lines are included. So are the last 10.
- The above also exemplifies giving multiple commands. Recall that
a standard recipe only allows one action line.
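The effect of the head and tail combination can be tried out on any text file before putting it into a recipe. The sketch below is an illustration outside procmail: the function truncate_message is hypothetical, it reads the whole message from standard input into a temporary file, and it uses the "-n" forms of head and tail.

```shell
#!/bin/sh
# Hypothetical helper: keep the first 100 lines, a snip marker, and
# the last 10 lines of whatever follows the first 100.
truncate_message() {
    tmp=$(mktemp) || return 1
    cat > "$tmp"
    head -n 100 "$tmp"
    echo "-:-:-:- (snip) -:-:-:-"
    tail -n +101 "$tmp" | tail -n 10
    rm -f "$tmp"
}

# A stand-in for a long message: 200 numbered lines.
numbered_lines() {
    awk 'BEGIN { for (i = 1; i <= 200; i++) print i }'
}

# Show the neighbourhood of the snip marker.
numbered_lines | truncate_message | sed -n '99,103p'
```

The temporary file is needed here because a pipe can be read only once. In the recipe the "c" flag serves the same purpose by giving each action its own copy of the message.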
Another option is to compress the incoming email
instead of truncating it.
How can I quickly test if my rules with
regular expressions match?
The fuller procmail testbench is a bit heavy machinery for quick
testing.
Let's see. A lite version of the testbench could be the following.
Put the rules you wish to try out into a "greptest" file as egrep
commands, since procmail's matching closely (but not quite!) follows
egrep's. Make the file executable with "chmod u+x greptest". Then
make a "mail.msg" file with the
texts you wish to try to match (or not to match). Thus you might
have:
#!/bin/sh
#The executable file named "greptest"
egrep -i '(ts|timo\.salmi)@([-a-z0-9_]+\.)*uvasa\.fi' mail.msg
#
#Allow a quick visual comparison on the screen
echo ""
cat mail.msg
#The mail.msg target file with the trial text for the matching
ts@uvasa.Fi
ts@loisto.uvasa.fi
Timo.Salmi@uvasa.Fi
Timo.Salmi
null@uvasa.fi
Then, just give the command "greptest" and visually compare the
outputs.
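For a quick check without any separate files, the same trial can be put into a single script. This is just a self-contained sketch of the greptest idea, with the sample lines embedded and "grep -E" used in place of egrep; the function name sample_lines is an assumption.

```shell
#!/bin/sh
PATTERN='(ts|timo\.salmi)@([-a-z0-9_]+\.)*uvasa\.fi'

# The same trial text as in the mail.msg file above.
sample_lines() {
    printf '%s\n' \
        'ts@uvasa.Fi' \
        'ts@loisto.uvasa.fi' \
        'Timo.Salmi@uvasa.Fi' \
        'Timo.Salmi' \
        'null@uvasa.fi'
}

echo "Matching lines:"
sample_lines | grep -E -i -e "$PATTERN"
echo ""
echo "All lines:"
sample_lines
```

The lines "Timo.Salmi" (no host part) and "null@uvasa.fi" (a different user id) should be missing from the first listing.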
Miscellaneous notes:
- There are some special differences between procmail extended
matching rules and the egrep expressions. Thus under special
circumstances they do not match the regular expressions quite the
same way. This might raise occasional confusion. See "man procmailrc" for the
details.
- You can also test egrep regular expressions on your PC since
egrep clones are available from the Garbo
program archives. For example you might try gnuegrep.zip,
egrep.zip
and dgrep.zip.
Even better, get the entire, very useful Windows 32-bit UNIX-clone
sets UnxUtils.zip and UnxUpdates.zip
How can I detect if the email comes, say,
from the .com domain?
I have been puzzling over this item myself, because it is not as
trivial as it first appears. The catch is that the ".com" is exactly
at the end of the address. The problem naturally is that in the
email headers there can be text after the email address, such as the
sender's name. E.g.
From: scam@cyberspam.com (The Big Bad Spammer)
The first solution that comes to mind is the following, but it is
not entirely accurate.
:0:
* ^From:.*\.com
* !^From:.*\.com\.
* !^TO_(ts|timo\.salmi)@([-a-z0-9_]+\.)*uwasa\.fi
ProbableComSpam.mail
- The first condition line matches a ".com" anywhere on the
"From:" address line. It would match, for example, email from
"someone@my.company.net".
- The second condition line tries to narrow the condition down by
excluding a ".com" that is followed by a dot, but the recipe would
still match e.g. "someone@my.company.net". Thus the recipe is not
quite accurate.
- The third condition line is just standard spam avoidance, not
necessarily related to the task at hand. It is just that much, if
not the majority, of spam appears to involve .com addresses.
Quite possibly there are better solutions, but below is what I came
up with for hopefully an accurate match:
# Get the sender's address
# Discard any leading and trailing whitespaces
FROMADDR_=`formail -rt -xTo: \
| expand | sed -e 's/^[ ]*//g' -e 's/[ ]*$//g'`
# Test if the email came from the .com domain
:0:
* $ ? echo ${FROMADDR_} | egrep -is '\.com$'
ComDomain.mail
- Let formail take care of finding out from the headers what the
sender's address is. Get rid of any leading and/or trailing white
spaces using "expand" for tabs and "sed" for the remaining spaces.
You should have this definition high up in your ~/.procmailrc
- The "$" on the condition line tells procmail to expand any variables
on the line, in this case the "${FROMADDR_}", instead of taking the
text literally.
- As far as I understand, the "?" executes a line (and tells to
transmit an exit code, but that is beside the current point). BTW,
if you have the procmail extended diagnostics on ("VERBOSE=yes") you
can get in your procmail logfile a sinister looking "Program failure (1)". Don't panic. It
just is egrep's exit code telling that no match was found for that
particular email message, i.e. that it was not from the ".com"
domain.
- The condition line echoes the stripped email address to
"egrep" in order to test if there is a match. The "-i" switch is
used since email addresses are case insensitive. The essence of the
"egrep" is the trailing "$" matching the end of the extracted
address. The "-s" switch tells egrep to work silently, i.e. only to
give the return code.
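The end-of-address test can be tried outside procmail before it goes into a recipe. This is a minimal sketch, assuming a POSIX sh, sed and grep. The helper names trim and is_com are invented for the illustration, and "grep -Eq" is used in place of "egrep -s" because on current systems "-q" is the switch that suppresses the output.

```shell
#!/bin/sh
# Sketch only: trim and is_com are hypothetical helper names.

trim() {
    # Strip leading and trailing spaces and tabs from standard input
    sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//'
}

is_com() {
    # Succeed only if the trimmed address ends in ".com" (any letter case)
    printf '%s\n' "$1" | trim | grep -Eiq '\.com$'
}

for addr in 'scam@cyberspam.com' ' Scam@CyberSpam.COM ' 'someone@my.company.net'
do
    if is_com "$addr"; then
        echo "com:     $addr"
    else
        echo "not com: $addr"
    fi
done
```

Because the pattern is anchored with the trailing "$", an address such as "someone@my.company.net" is not mistaken for a .com address.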
There is one small convenience in the first, inaccurate recipe
version. It is easy to include several domains into the same recipe.
For example:
:0:
* ^From:.*\.hk|\
^From:.*\.kr|\
^From:.*\.tr
* !^From:.*\.hk\.|\
^From:.*\.kr\.|\
^From:.*\.tr\.
* !^TO_myid@([-a-z0-9_]+\.)*myhost\.mydom
ProbableSpam.mail
An aside: You could also utilize a more condensed format:
* ^From:.*\.(hk|kr|tr)
(Condensing the rest of the above recipe is left as an exercise.)
Using scoring is one option. The recipe could
also be rewritten as
#Define getting the sender's address
#Discard any leading and trailing whitespaces
FROM_=`formail -rt -xTo: \
| expand | sed -e 's/^[ ]*//g' -e 's/[ ]*$//g'`
#Whatever other recipes in between.
#Spam screening of certain susceptible domains
:0:
* -1^0
* 1^0 $ ? echo ${FROM_} | egrep -is '\.hk$'
* 1^0 $ ? echo ${FROM_} | egrep -is '\.kr$'
* 1^0 $ ? echo ${FROM_} | egrep -is '\.tr$'
* 1^0 !^TO_myid@([-a-z0-9_]+\.)*myhost\.mydom
ProbableSpam.mail
There also is the option
:0:
* ^From:.*\.hk([ >]|$)|\
^From:.*\.kr([ >]|$)|\
^From:.*\.tr([ >]|$)
* ! ^TO_myid@([-a-z0-9_]+\.)*myhost\.mydom
ProbableSpam.mail
What alternatives do I have to detect a
sender all through the various header-fields?
If we only look at the "From:" field in the header we have the
familiar:
#Accept all email from myself, weed out autoreplies
:0:
* ^From:.*myid@([-a-z0-9_]+\.)*myhost\.mydom
* ! ^X-Loop: myid@myhost\.mydom
${DEFAULT}
Next, let's extend the matching to more fields in the header:
:0
* ? formail -x"From" -x"From:" -x"Reply-To:" -x"Errors-To:"\
| egrep -i "scam@cyberspam\.com"
/dev/null
- The "?" at the start of the condition executes and evaluates
what is on the condition line instead of searching for a literal
match.
- Use formail to extract from the headers.
- The "-x" switch means extract the contents of a header-field from
the header. Formail is convenient (also) because it can concatenate
the potential continuation lines in a header-field.
- Pipe the results to "egrep" regular expression search. The
"-i" switch tells egrep to ignore the lower/uppercase status of the
target string.
- Incidentally: Since we discard the email message to
"/dev/null", file locking ":0:"
must not be used.
We can utilize a predefined expression to match the header-fields.
The clever "FROM" expression below comes from Jari Aalto's
procmail material.
FROM="^(From[ ]|(Old-|X-)?(Resent-)?(From|Reply-To|Sender):)(.*\<)?"
#(whatever else in between)
:0
* $ ${FROM}scam@cyberspam\.com
/dev/null
- The first "$" on the condition line tells that the environment
variable(s) on the line are to be expanded, instead of taking all
the text on the condition line literally.
You may go even further in your detective work and include the
information from the header's "Received:" lines. That is, you also
can detect if something that you wish to avoid is along the route
where the email came from.
:0
* ? formail -x"Received:"\
| egrep -i "cyberspam\.com"
/dev/null
Spam email is sometimes indicated by a missing or an empty "From:"
line in the header. Furthermore, the "From:" line might contain an
empty <> instead of having a proper address within the
<>. Using scoring we might have
something like
:0:
* 1^0 ^From:([ ]$|$)
* 1^0 ! ^From:
#A catch: Don't use here the word-boundary operators \< \>
#Use just the plain <>
* 1^0 ^From:.*<>
NoFrom.mail
Under a worst-case scenario, the various sender headers might all be
empty. To test for this unlikely eventuality we can utilize the fact
that formail would put a "foo@bar" into the "FROM_" under such
circumstances.
# Define getting the sender's address
# Discard any leading and trailing whitespaces
FROM_=`formail -rt -xTo: \
| expand | sed -e 's/^[ ]*//g' -e 's/[ ]*$//g'`
# Test if the sender could not be identified at all
:0:
* FROM_ ?? foo@bar
NoSender.mail
As always, there are several alternatives to solving a problem.
Consider a potential case where a spammer poses as the mailer-daemon
but the "From:" header is either missing or total gibberish. How to
detect this situation? The second condition in the recipe below
matches when there is no "From:" line in the header, or when the line
lacks even elementary validity.
:0:
* ^From[ ]*MAILER-DAEMON
* ! ? formail -x"From:" | egrep -is "[a-z]"
ProbableSpam.mail
- The first condition is to check the first From line in the
header.
- The [] contains a space and a tab.
- In the second condition the "!" is the familiar operator
indicating a negation.
- The "?" tells to execute and evaluate what is on the condition
line instead of searching for a literal match.
- formail's -x"From:"
extracts the From: header
contents (without the field name).
- Unix egrep is used to test whether the "From:" field exists
and contains at least one ordinary letter, upper or lower case
("i"), working silently ("s").
How can I extract a valid address from the
Reply-To field, and that field only?
One trick is to utilize the following variable definition letting
formail do the worrying about the proper address format.
REPLYTO_=`egrep "^Reply-To:" | head -1 \
| formail -c -rt -xTo: \
| expand | sed -e 's/^[ ]*//g' -e 's/[ ]*$//g'`
- Assume that indeed you strictly want the address from a
"Reply-To:" header. No address in any other header will do. Use
egrep to extract the "Reply-To:"
header-field from the incoming email.
- head -1 ensures that only
the first occurrence of a "Reply-To:" in the message counts.
- formail -c -rt -xTo: is a
standard, special trick to form a return address. The key is the
-r switch which "generates an
autoreply header". The "-c" switch concatenates any continued fields
in the header.
- If no "Reply-To:" header is found in the email message, foo@bar will be returned as the
address.
- The last line removes any leading and trailing tabs and
blanks from the address.
If you put the REPLYTO_ definition high up in your ~/.procmailrc you
will have the variable available to the rest of your recipes.
Feedback:
Let me suggest this:
- REPLYTO_=`formail -cXReply-To: | head -1 | formail -rtzxTo:`
- "formail -cX" rather than "egrep" in case the header has a
different capitalization -- or if the real address is on a
continuation line.
- formail "-z" flag to avoid "expand" and "sed".
Timo's further comments:
- The "-c" switch concatenates continuation lines.
- The "-X" switch extracts the header-field, preserving the
field name.
- The "-rt", "-x" and "To:" trick prepare a return address.
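- The "-z" switch ensures that a whitespace exists between field
name and content. It also strips the leading and trailing whitespace
from fields extracted with "-x", which is why "expand" and "sed" are
no longer needed.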
- If the Reply-To: header-field is empty or missing, the value
of the REPLYTO_ variable will be foo@bar
How can I extract the address of the
sender's postmaster?
Put these definitions high up in your ~/.procmailrc :
#Get the sender's address, the generic version
FROM_=`formail -rt -xTo: \
| expand | sed -e 's/^[ ]*//g' -e 's/[ ]*$//g'`
#Get the sender's host
FHOST_=`echo "${FROM_}" | awk -F@ '{ print $2 }'`
#Build the postmaster's address
FMAST_="postmaster@${FHOST_}"
Thus, you have the postmaster's alleged address available as
${FMAST_} from this point on in your recipes file. Note, however,
that all validity testing of the address is missing.
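The missing validity testing could be approached along the following lines. This is a sketch, not a complete address validator: the helper name postmaster_of and the rule "the host must contain at least one dot" are assumptions made for the illustration.

```shell
#!/bin/sh
# Sketch only: postmaster_of is a hypothetical helper name.

postmaster_of() {
    addr=$1
    # Elementary check: exactly one "@", a non-empty local part, and a
    # host of letters, digits and "-" with at least one dot in it
    if printf '%s\n' "$addr" |
        grep -Eiq '^[^@[:space:]]+@[a-z0-9-]+(\.[a-z0-9-]+)+$'
    then
        host=$(printf '%s\n' "$addr" | awk -F@ '{ print $2 }')
        echo "postmaster@${host}"
    else
        return 1
    fi
}

postmaster_of 'scam@cyberspam.com'     # prints postmaster@cyberspam.com
postmaster_of 'foo@bar' || echo 'rejected: foo@bar'
```

With such a check, formail's "foo@bar" placeholder is rejected instead of producing a "postmaster@bar" address.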
What happens in the FROM_ formula:
- At a quick glance it may appear that the "From:" header and
the "To:" header have been confused in the formula, but this is not
the case. The formail program is asked ("-r") to prepare a reply
header to send email back to the sender. Then that return address is
extracted. That is why we have a "-xTo:" since we want to extract
where the reply would be sent. That is where we assume that the
email came from.
- In the pipe "expand" is used to replace potential tabs with
spaces, and "sed" is used to omit any leading and trailing white
spaces.
Formail uses a certain priority order in preparing the reply header.
If there is a "Reply-To:" field in the header, the "FROM_" variable
will contain that address. In some cases one may wish to ignore that
field, for example to prevent malicious relaying. Here is how:
#Get the sender's address, ignore Reply-To:
FROM_=`formail -I"Reply-To:" -rt -xTo: \
| expand | sed -e 's/^[ ]*//g' -e 's/[ ]*$//g'`
How can I weed out an inordinately long
recipient list? I am one of the recipients of a very useful
professional mailing list, but it lists in its "To: " header-field
all the recipients to the list. Furthermore, it repeats the messages
in HTML format. The text format is sufficient for
me.
The (only slightly modified) example below is based on a true
situation from my own ~/.procmailrc.
#Ensure a whitespace exists between field name and content
#Comment "Old-" the Content-Length field from all the headers
:0 fwh
| formail -z -i"Content-Length:"
#(whatever else in between)
:0
* ^From:.*the-mailing-list-maintainer
* ^TO_the@first\.recipient\.edu
{
:0 fw
| formail -I"To:" -I"X-" -I"Content-Type:" -I"MIME-Version:"\
-A "To: Maintainer's long recipient list suppressed" \
| sed -e '/^This is a multi-part /,/^Content-Transfer-Encoding: /d' \
-e '/------=_NextPart_/,$d'
:0:
${DEFAULT}
}
- There are two condition lines.
- Match if it is from the mailing list maintainer.
- Match if it is for the full mailing list and not only to me
personally from the maintainer.
- Feed ("f") the email message to a pipe of several lines. Tell
procmail to wait ("w") for the pipe to finish.
- Let formail weed out superfluous fields.
- Append a very brief "To:"-field for your information.
- Let sed take out any special format information.
- Let sed weed out everything from the start of the HTML part to
the end of the message.
- This example shows the principles, but it is based on the
established format of the postings on the particular mailing list.
Therefore it is not applicable as such, but you'll have to customize
and test it for your own situation. (See the items on test methods
on this page.)
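Before trusting the sed expressions with real mail, they can be tried on a sample message. The message below is entirely made up, and strip_html_part is a hypothetical name used only in this sketch. The boundary strings of your own mailing list will differ.

```shell
#!/bin/sh
# Sketch only: a made-up two-part message is fed through the same sed
# expressions as in the recipe. strip_html_part is a hypothetical name.

strip_html_part() {
    sed -e '/^This is a multi-part /,/^Content-Transfer-Encoding: /d' \
        -e '/------=_NextPart_/,$d'
}

strip_html_part <<'EOF'
Subject: Sample posting

This is a multi-part message in MIME format.

------=_NextPart_000_0001
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: 7bit

The announcement in plain text.

------=_NextPart_000_0001
Content-Type: text/html; charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable

<html><body>The announcement in HTML.</body></html>

------=_NextPart_000_0001--
EOF
```

The first expression removes the MIME preamble together with the headers of the text part. The second removes everything from the next boundary line to the end, which leaves just the plain text.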
What is this procmail scoring? How can I
utilize it?
This is a somewhat complicated subject with material dispersed
throughout the various procmail FAQs. Basically scoring is a method
to count how many of the conditions are fulfilled in a recipe and if
the "score" is positive, that is the score is 1 or more, the action
line in the recipe will be performed. There is much, much more to
scoring, but this is a good starting point.
Consider the following simple spam foiling recipe. It will put the
email into the ProbableSpam.mail file if the score adds up to at
least one. If the first condition is met, 1 is added to the
score. Ditto for the second condition. Thus if either of the
tell-tale spam signals occur, the score will be positive (that is
greater than zero) and the action (storing the email message into
the ProbableSpam.mail file) will be enacted.
:0:
* 1^0 ^Subject:.*make money fast
* 1^0 ^Subject:.*\$\$\$
ProbableSpam.mail
The example above uses equally-weighted scoring. One can also have
unequal scores. Below, a hit of the second condition gives two
points while a hit of the first only gives one.
* 1^0 ^Subject:.*make money fast
* 2^0 ^Subject:.*\$\$\$
Scoring can be used to build some extremely trivial artificial
intelligence into the recipes. Consider the following
:0:
* -1^0
* 1^0 ^Subject:.*money
* 1^0 ^Subject:.*fast
* 1^0 ^Subject:.*\$\$\$
ProbableSpam.mail
- The initial score is set at -1. Thus at least two of
the subsequent conditions have to be met in order for the entire
recipe to match. If none or only one of the conditions is met, the
score will not rise above zero.
An alternative formulation of scoring to foil spam is given below.
This time it is required that at least three of the score-condition
lines match. (The [] contain a space and a tab, as usual.)
:0:
* ^Subject:[ ]*\/[^ ].*
* -2^0
* 1^0 MATCH ?? ()\<easy\>
* 1^0 MATCH ?? ()\<fast\>
* 1^0 MATCH ?? ()\<(cash|money)\>
* 1^0 MATCH ?? \$\$\$
ProbableSpam.mail
- procmail \/ operand is used to extract the subject of the
email into the reserved MATCH variable.
- Variables testing "??" is used.
- Word matching is used applying the word boundaries "\<" and "\>".
Thus "fast" would be matched, but not "faster".
- If both the words "cash" and "money" appear on the subject
line no more than one score point will be awarded.
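The arithmetic of the last recipe can be imitated in a plain shell script to see which subjects would end up with a positive score. This is only an illustration of the counting, not of how procmail evaluates scores internally. The names has_word and subject_score are invented, and the word boundaries are spelled out with POSIX character classes instead of "\<" and "\>".

```shell
#!/bin/sh
# Sketch only: imitate "-2^0 plus 1^0 per hit" outside procmail.
# has_word and subject_score are hypothetical helper names.

has_word() {
    # $1 = text, $2 = ERE alternatives; match whole words only
    printf '%s\n' "$1" |
        grep -Eiq -e "(^|[^[:alnum:]_])($2)([^[:alnum:]_]|\$)"
}

subject_score() {
    score=-2                          # corresponds to "* -2^0"
    for words in 'easy' 'fast' 'cash|money'; do
        if has_word "$1" "$words"; then
            score=$((score + 1))      # corresponds to "* 1^0 ..."
        fi
    done
    if printf '%s\n' "$1" | grep -Fq '$$$'; then
        score=$((score + 1))
    fi
    echo "$score"
}

subject_score 'Make easy money fast'      # three hits
subject_score 'Cash and money, faster'    # one hit only
```

Only a result above zero corresponds to the recipe delivering the message to ProbableSpam.mail.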
Further, simple examples
#Catch potential spam by examining the email route
:0:
* 1^0 ? formail -x"Received:" | egrep -i "157\.161\.140\.2"
* 1^0 ? formail -x"Received:" | egrep -i "199\.217\.231\.46"
* 1^0 ? formail -x"Received:" | egrep -i "212\.106\.213\.36"
* 1^0 ? formail -x"Received:" | egrep -i "216\.154\.1\.82"
ProbableSpam.mail
- As usual, the "?" executes and evaluates what is on the rest of
the condition line instead of searching for a literal match. Note the
syntax order.
- Incidentally, there is a subtle catch in using the IP numbers.
Assume that you wish to detect the nodes from 216.154.1.74 through
to 216.154.1.86. This rule won't work quite right:
"216\.154\.1\.[74-86]". Why? The "[74-86]" will match 4-8. (The 7
and 6 would be superfluous since they already are within the 4-8
range.) The rule would find matches outside the intended range. E.g.
"216\.154\.1\.72" would be matched. Instead, applying both
"216\.154\.1\.7[4-9]" and "216\.154\.1\.8[0-6]" would match
correctly.
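The catch with the character class is easy to verify from the command line. In the sketch below the helper name matches is made up, and the trailing "([^0-9]|$)" is an addition of this example so that a longer number is not accepted by accident.

```shell
#!/bin/sh
# Sketch only: compare the faulty character class with the corrected rule.
# matches is a hypothetical helper name.

matches() {
    # $1 = pattern, $2 = text
    printf '%s\n' "$2" | grep -Eq -e "$1"
}

faulty='216\.154\.1\.[74-86]'
fixed='216\.154\.1\.(7[4-9]|8[0-6])([^0-9]|$)'

for ip in 216.154.1.72 216.154.1.74 216.154.1.86 216.154.1.87; do
    matches "$faulty" "$ip" && f=yes || f=no
    matches "$fixed" "$ip" && g=yes || g=no
    echo "$ip faulty=$f fixed=$g"
done
```

The faulty rule accepts all four addresses, while the corrected one accepts only those within 74 through 86.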
This 'precision' recipe checks in the message header both the
"From:" field and the "Received:" path of a forgery spam.
#Avoid a specific forgery spam
:0:
* -1^0
* 1^0 ^From:.*mikerobbins2000@hotmail\.com
* 1^0 ? formail -x"Received:" | egrep -is "psi\.net"
Spam.mail
Scoring and ordinary conditions can be mixed in the rules. For
example the two recipes below achieve roughly the same thing, but
the latter option produces fewer steps if the email is for you.
:0:
* -1^0
* 1^0 ? formail -c -x"Received:" | fgrep -is 'alladvantage.com'
* 1^0 ? formail -c -x"Received:" | fgrep -is 'ameritech.net'
* 1^0 ? formail -c -x"Received:" | fgrep -is 'bellatlantic.net'
* 1^0 ! ^TO_(myid)@([-a-z0-9_]+\.)*myhost\.mydom
ProbableSpam.mail
:0:
* ! ^TO_(myid)@([-a-z0-9_]+\.)*myhost\.mydom
* 1^0 ? formail -c -x"Received:" | fgrep -is 'alladvantage.com'
* 1^0 ? formail -c -x"Received:" | fgrep -is 'ameritech.net'
* 1^0 ? formail -c -x"Received:" | fgrep -is 'bellatlantic.net'
ProbableSpam.mail
The formail switches in the above are
- -c Concatenate continued fields in the header.
- -x Get the contents of the said header-field. Do not include
the field name.
The fgrep (search a file for a fixed-character string) switches in
the above are
- -i Ignore upper/lower case distinction during comparisons.
- -s Silent (only produce error messages) in order to check
the return status without any output.
The above example could also be written more efficiently without
scoring as
:0:
* ! ^TO_(myid)@([-a-z0-9_]+\.)*myhost\.mydom
* ^Received:.*(\
alladvantage\.com|\
ameritech\.net|\
bellatlantic\.net)
ProbableSpam.mail
How can I test if the subject is empty or
if the subject field is missing altogether?
Scoring seems to be the answer:
:0:
* 1^0 ^Subject:([ ]$|$)
* 1^0 !^Subject:
NoSubject.mail
As usual, the brackets [] contain a space and a tab.
There are other options to test for an empty "Subject:" or an
entirely missing "Subject:" field. The one below puts the subject
contents in a variable. The actual recipe then tests if the value of
the "SUBJ_" variable is empty. (Also see the feedback about the syntax.)
#Get the subject discarding any leading and trailing blanks
SUBJ_=`formail -xSubject: \
| expand | sed -e 's/^[ ]*//g' -e 's/[ ]*$//g'`
#Test for an empty or missing subject
:0:
* SUBJ_ ?? ^^^^
NoSubject.mail
- ^^^^ denotes empty contents. The trick is adopted from procmail material of some
other authors where the ^^ anchor is better explained than what I
can do. Also see procmarc.man for it.
- Likewise, see procmarc.man for the ??
definition.
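The three possible situations (subject present, subject empty, subject field missing) can also be told apart with ordinary shell tools when testing outside procmail. A minimal sketch, assuming that only the header of the message is fed to the hypothetical subject_state function:

```shell
#!/bin/sh
# Sketch only: subject_state is a hypothetical helper name.
# It expects the message header on standard input.

subject_state() {
    line=$(grep -i '^Subject:' | head -n 1)
    if [ -z "$line" ]; then
        echo missing
    elif printf '%s\n' "$line" | grep -Eiq '^Subject:[[:space:]]*$'; then
        echo empty
    else
        echo present
    fi
}

printf '%s\n' 'From: someone@example.com' 'Subject:   ' | subject_state
```

The sample header has a "Subject:" field with only blanks in it, so it is reported as empty.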
How can I modify the "To:" field of the
email I received?
I am not exactly sure why you wish to do this, but here is how to
replace the "To:" header-field of a message using formail. Choose
the formail "-i" option to rename the old "To:" field to be
"Old-To:" and to insert the new "To:" header-field. The flags in the
recipe are as follows: "f" use the pipe as a filter, "h" it is
about the header of the email message, "w" wait for the filter to
finish before proceeding down the rest of the "~/.procmailrc".
:0 fhw
* ^To:.*myoldid@myoldhost\.myolddom
| formail -i "To: mynewid@mynewhost.mynewdom"
I have a long list of spammers and other
Internet lowlife in a separate file. How can I utilize
it?
The technique is fairly simple. Put this in your "~/.procmailrc" file:
MAILDIR=/home/myid/Mail #The location of your own mail directory
# Whatever other preliminaries
# Whatever other recipes
# Test if the email's sender is in the blacklist
:0
* ? formail -x"From" -x"From:" -x"Sender:" \
-x"Reply-To:" -x"Return-Path:" -x"To:" \
| egrep -is -f black.lst
/dev/null
- All the common email sender headers are covered.
- Also the "To:" field is covered in the recipe, since spammers
often name their mailing lists as phony addresses.
- Continuation lines ("\") are utilized. Incidentally, ensure
that there are no trailing whitespaces after the "\" on a line.
- The "-i" option in egrep tells to ignore upper/lower case
distinction. The "-s" is for silence. The "-f file" option
tells to take the list of the regular expressions from file.
Prepare a "/home/myid/Mail/black.lst" file
with contents something like:
abc23@airnewz.ccn
abdu@advis.com.tr
adexec@mail.com
dinner@dine.com
friend@public.com
helpingyou@mail.com
mk1977@ms1.kingnet.com.tw
nb8MAMxhq@mail.com
no@body.com
owieuj@peterlink.ru
patkline00@usa.net
promotions@web-vertise.com
unknown@unknown.com
- The black.lst file should reside in your "${MAILDIR}" mail directory (unless you
explicitly include the path in your "~/.procmailrc").
- The problem with such lists is that most of the spam related
addresses are very transient by nature. I do not think such lists
alone are a very effective method, as I have explained in my Foiling Spam with an Email Password
System measures medley.
- For an exact matching you might wish to use e.g.
"no@body\.com" instead of "no@body.com". Alternatively, one could
use fgrep (fixed grep) or grep -F
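The difference between regular expression matching and fixed string matching of the list can be seen with a small experiment. In this sketch the list is written to a temporary directory, the function names are made up, and "grep -E" and "grep -F" with "-q" are used as present-day equivalents of "egrep -s" and "fgrep -s".

```shell
#!/bin/sh
# Sketch only: is_listed_regex and is_listed_fixed are hypothetical names.
# The list lives in a temporary directory just for the demonstration.
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
list="$tmpdir/black.lst"
printf '%s\n' 'no@body.com' 'unknown@unknown.com' > "$list"

is_listed_regex() { printf '%s\n' "$1" | grep -Eiq -f "$list"; }
is_listed_fixed() { printf '%s\n' "$1" | grep -Fiq -f "$list"; }

for addr in 'no@body.com' 'No@Body.COM' 'no@bodyXcom'; do
    is_listed_regex "$addr" && r=yes || r=no
    is_listed_fixed "$addr" && f=yes || f=no
    echo "$addr regex=$r fixed=$f"
done
```

The unescaped dot in "no@body.com" lets the regular expression version accept "no@bodyXcom", whereas the fixed string version does not.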
How do I forward certain messages that I
get, and preserve myself a copy?
Below is an example:
#Get the sender's bare email address from the first "From" line
FROM_=`formail -c -x"From " \
| expand | sed -e 's/^[ ]*//g' -e 's/[ ]*$//g' \
| awk '{ print $1 }'`
#Get the original subject of the email
#Discard superfluous tabs and spaces
#On some systems -xSubject: has to be -x"Subject: "
SUBJ_=`formail -c -xSubject: \
| expand \
| sed -e 's/  */ /g' \
| sed -e 's/^[ ]*//g' -e 's/[ ]*$//g'`
#Whatever other recipes you'll use
:0
* ^From:.*infolist@([-a-z0-9_]+\.)*infohost\.infodom
# Avoid email loops
* ! ^X-Loop: myid@myhost\.mydom
{
:0c: #Preserve a copy of the email
Infolist.mail
:0fwh #Adjust some headers before forwarding
| formail -A"X-Loop: myid@myhost.mydom" \
-A"X-From-Origin: ${FROM_}" \
-i"Subject: $SUBJ_ (fwd)"
# Forward the email
:0
!mydept@myhost.mydom
}
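The whitespace clean-up applied to the subject can be checked on its own. A minimal sketch with a hypothetical clean_subject function. Note that the squeezing expression needs two spaces before the asterisk, so that it matches one or more spaces rather than the empty string.

```shell
#!/bin/sh
# Sketch only: clean_subject is a hypothetical helper name.

clean_subject() {
    # Tabs to spaces, runs of spaces to one space, then trim both ends
    expand |
        sed -e 's/  */ /g' |
        sed -e 's/^ *//' -e 's/ *$//'
}

printf '  Re:\tbig   news  \n' | clean_subject
```

The sample line comes out as "Re: big news".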
Another example, another method for forwarding:
SENDMAIL=/usr/sbin/sendmail
FROM_=`formail -c -I"Reply-To:" -rt -xTo: \
| expand | sed -e 's/^[ ]*//g' -e 's/[ ]*$//g'`
SUBJ_=`formail -xSubject: \
| expand | sed -e 's/^[ ]*//g' -e 's/[ ]*$//g'`
# Make a copy of all email to my second address
:0
* ! ^X-Loop: myid@myhost\.mydom
{
:0c:${HOME}/procmail.lock
| formail -A"X-Loop: myid@myhost.mydom" \
-I"Subject: ${SUBJ_} [autofwd]" \
| ${SENDMAIL} -f"${FROM_}" my2ndId@my2ndHost.mydom
}
How do I forward certain messages to two
different addresses?
I have the following recipe in my
~/.procmailrc file, but the email does not get forwarded to the
myid2@myhost.mydom address.
:0 c
*^From.*info.gov
! friend@somehost.domain myid2@myhost.mydom
I am not sure what is wrong with that, but at least the solution
below should work:
:0
* ^From.*info.gov
* ! ^X-Loop: myid@myhost\.mydom
{
:0fwh
| formail -A"X-Loop: myid@myhost.mydom"
:0c
! friend@somehost.domain
:0
! myid2@myhost.mydom
}
The X-Loop is not relevant from the point of view of the stated
problem, but using it as a safeguard is always advisable.
Feedback:
The reason that the first one does not work is that the
recipients' addresses are separated by space while they should be
separated by a comma [as in]
:0
! friend@somehost.domain,myid2@myhost.mydom
(I have not tested this one.)
How do I automatically return certain email
messages?
Ah! Another potential case of spam avoidance? (This is a companion
page to Foiling Spam with an Email Password
System, remember.) Below is an example. But be sensible in using
the method, since most spam has forged senders.
#Define getting the sender's address
#Discard any leading and trailing whitespaces
FROM_=`formail -rt -xTo: \
| expand | sed -e 's/^[ ]*//g' -e 's/[ ]*$//g'`
#Whatever other recipes in between.
#Return certain email
:0
#
# Is the email from a frequent spam domain?
# (Note: fgrep takes no regular expressions)
* ? formail -c -x"Received:" | fgrep -is 'cyberspam.com'
#
# Is it for a mailing list rather than to me?
* ! ^TO_(myid|myFirst\.mySecond)@([-a-z0-9_]+\.)*myhost\.mydom
#
# Avoid forgeries that pretend to be from my own site
* ! $ ? echo ${FROM_} | fgrep -is 'myhost.mydom'
* $ ? echo ${FROM_} | fgrep -is '.'
* $ ? echo ${FROM_} | fgrep -is '@'
#
# Avoid email loops
* ! ^X-Loop: myid@myhost\.mydom
{
# Make a temporary file of the message to be returned
:0c:formail.lock
# Discard whitespaces, insert a leading blank
| expand | sed -e 's/[ ]*$//g' | sed -e 's/^/ /' > return.tmp
# Prepare and send the rejection
# Be sure to customize your sendmail path
:0:formail.lock
| (formail -r -I"Subject: Rejected mail: Recipient refusal" \
-A"X-Loop: myid@myhost.mydom" ; \
echo "--- begin rejected mail ---" ; \
cat return.tmp ; \
echo "--- end rejected mail ---" ; \
rm -f return.tmp) \
| /usr/lib/sendmail -t
}
- The spamfoiling page has a
further example.
- The "-r" option tells formail to generate an auto-reply
header.
There can be many variants of detecting and returning email which
one does not wish to get. Below is a fictitious example utilizing
variables to enhance the flexibility of the return address handling.
(If you are baffled by the "RULE" variable, which is just a sideline
here, see the item on identifying executed
recipes.)
:0
* ^From:.*(charpie|charpie5266)@mydeja\.com
{ REJECT="charpie5266@mydeja.com" }
:0
* ^From:.*umidextr@([-a-z0-9_]+\.)*mindfall\.com
{ REJECT="umidextr@mindfall.com" }
:0
* ^From:.*(rasch|Greg.*\.Rasch)@([-a-z0-9_]+\.)*millkirn\.com
{ REJECT="rasch@millkirn.com" }
:0
* ^From:.*(daren|Daren[_\.]Risenthal)@([-a-z0-9_]+\.)*slunet\.org
{ REJECT="daren@slunet.org" }
:0
* ! REJECT ?? ^^^^
{
:0
{ RULE="These users I do not want to talk with" }
:0cw
| expand | sed -e 's/[ ]*$//g' | sed -e 's/^/ /' > return.tmp
:0:procmail.lock
| (formail -r -I"To: ${REJECT}" \
-I"Subject: Rejected mail: Recipient refusal" \
-A"X-Loop: myid@myhost.mydom" ; \
echo "--- begin rejected mail ---" ; \
cat return.tmp ; \
echo "--- end rejected mail ---" ; \
rm -f return.tmp) \
| /usr/lib/sendmail -t
}
Note how the above set of rules has two parts, the actual detection
plus the return address definition, and the return action. The
latter could be written in many alternative ways, including
:0
* ! REJECT ?? ^^^^
{
:0cw
| expand | sed -e 's/[ ]*$//g' | sed -e 's/^/ /' > return.tmp
:0 fwh
| formail -r \
-A"Subject: Rejected mail: Recipient refusal" \
-A"From: myid@myhost.mydom" \
-A"X-Loop: myid@myhost.mydom" ; \
echo "--- begin rejected mail ---" ; \
cat return.tmp ; \
echo "--- end rejected mail ---" ; \
rm -f return.tmp
:0
! ${REJECT}
}
My address has changed. How do I forward a
copy to myself and tell the sender?
This is a theme whose constituents already are covered throughout
this material. But also take a look at "man procmailex" for the "vacation
database" idea even if a better name here would be something like
"dejatold database".
#Note: On some systems -xSubject: has to be -x"Subject: "
SUBJ_=`formail -c -xSubject: \
| sed -e 's/^[ ]*//g' -e 's/[ ]*$//g'`
:0
# Was it to me
* ^TO_myoldid@myoldhost\.myolddom
# Ignore messages for daemons
* ! ^FROM_DAEMON
# Avoid email loops
* ! ^X-Loop: myid@myhost\.mydom
{
:0 c
! myid@myhost.mydom
:0:dejatold.lock
| formail -rD 8192 dejatold.cache
:0 eh
| (formail -r \
-A"X-Loop: myid@myhost.mydom" \
-I"Subject: Changed email address" ; \
echo "Dear Sender," ; \
echo "" ; \
echo "Thank you for your email about" ; \
echo "\"${SUBJ_}\"" ; \
echo "" ; \
echo "My email address has changed." ; \
echo "Old: myoldid@myoldhost.myolddom" ; \
echo "New: myid@myhost.mydom" ; \
echo "Your email has been forwarded to my new address." ) \
| /usr/lib/sendmail -oi -t
}
Some explanations:
- The "-r" switch prepares a reply header for sending email back
to the sender.
- The "-D maxlen idcache" switch in "-rD" controls the message
identification cache. For more see "man formail"
- The "c" flag in ":0 c" tells that the processing should
continue also after this particular recipe has been acted upon.
- The "e" flag in ":0 eh" decrees that the recipe only executes if
the immediately preceding recipe failed.
- The "h" flag in ":0 eh" tells procmail to feed the header to the
pipe. Given alone, without "b", it means that only the header is fed.
Naturally, the recipe does not stand alone in the ~/.procmailrc but
is a part of it. Thus you would e.g. have previous recipes that take
care of the email that is not to you, and email that was for mailer
daemons.
How can I set variable values based on the
text in the body of the email message?
Let's start with another, much simpler question:
From: ts(ät)uwasa(dot)fi (Timo Salmi)
Newsgroups: comp.mail.misc
Subject: Re: Procmail: How do I filter by the body
Date: Sun Apr 23 09:34:38 EET DST 2000
X-Comment: Slightly modified
I am trying to save all the messages that
come to me with "mypassword" in the body to a folder called
password. How do I do that?
As the manuals state:
- Flags can be any of the following:
- B Egrep the body.
Hence, all there is to it is
:0 B:
* mypassword
password
If you want your password case sensitive then use
":0 BD:".
- All the best, Timo
From: ts(ät)uwasa(dot)fi (Timo Salmi)
Newsgroups: comp.mail.misc
Subject: Re: Question of procmail newbie
Date: Tue Nov 23 23:09:41 EET 1999
X-Comment: Slightly modified
How could I solve the following problem with procmail: I receive
e-mails with a body like this:
- Category: aaa
- Subcategory: bbb
- File: ccc
I need to store this mail to the folder aaa/bbb/ccc, so procmail
should create directories aaa/bbb . What kind of .procmailrc should
I write?
The trick is to extract the appropriate text from the body of the
email message and to set procmail variable values on the basis of
the results. This is how it can be done.
#Preliminaries
SHELL=/usr/bin/sh #Use the Bourne shell (check your path!)
CATE=`cat | egrep "^Category:" | awk '{ print $2 }'`
SCAT=`cat | egrep "^Subcategory:" | awk '{ print $2 }'`
FILE=`cat | egrep "^File:" | awk '{ print $2 }'`
#Whatever other recipes
:0B:Procmail.lock
* ^Category:[ ].+[a-z0-9]
* ^Subcategory:[ ].+[a-z0-9]
* ^File:[ ].+[a-z0-9]
| mkdir ${CATE} ; mkdir ${CATE}/${SCAT} ;\
cat >> ${CATE}/${SCAT}/${FILE}
#Whatever other recipes
As a validity check the condition lines require that all the
key-lines are present in the email message body and that the lines
contain names.
- All the best, Timo
Feedback:
It would be much more efficient rewriting these definitions using
awk's pattern matching, such as:
CATE=`cat | awk '/^Category:/ { print $2 }'`
etc
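Since the folder names come straight from the body of an incoming message, it may be prudent to check them before creating any directories. The following sketch works on a message saved in a file. The function name store_by_category and the decision to refuse names that are empty, contain a "/" or start with a "." are assumptions of this example, not part of the original recipe.

```shell
#!/bin/sh
# Sketch only: store_by_category is a hypothetical helper name.
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

store_by_category() {
    # $1 = base directory, $2 = file holding the message
    base=$1
    msg=$2
    cate=$(awk '/^Category:/    { print $2; exit }' "$msg")
    scat=$(awk '/^Subcategory:/ { print $2; exit }' "$msg")
    file=$(awk '/^File:/        { print $2; exit }' "$msg")
    # Refuse empty names and names that could escape the base directory
    for name in "$cate" "$scat" "$file"; do
        case $name in
            ''|*/*|.*) return 1 ;;
        esac
    done
    mkdir -p "$base/$cate/$scat" &&
        cat "$msg" >> "$base/$cate/$scat/$file"
}

printf '%s\n' 'Subject: test' '' 'Category: aaa' \
    'Subcategory: bbb' 'File: ccc' > "$work/msg"
store_by_category "$work" "$work/msg" && ls "$work/aaa/bbb"
```

Using "mkdir -p" creates both directory levels in one go and does not complain if they already exist.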
Apropos awk. On the Usenet there are dedicated awk newsgroups comp.lang.awk and alt.lang.awk. Furthermore, although
used in quite another connection than procmail, there are several
awk (actually GnuAWK) usage examples in my Assorted
NT/2000/XP/.. CMD.EXE Script Tricks collection.
Next, let's consider a trickier task. Find from the body of the text
the last line that potentially contains the string "mailto:".
Insert the contents of that line into a MAILTO_ variable.
:0
* ^Subject:.*Whatever
{
:0
{
MAILTO_=`sed -e '1,/^$/ d' \
| egrep "mailto:" \
| tail -1 \
| expand \
| sed -e 's/^[ ]*//g' -e 's/[ ]*$//g' \
| sed -e 's/[^o]://g' -e 's/^://g' \
| awk -F: '{ print $2 }' | awk '{ print $1 }'`
}
:0:
WhichEverFolderYouWant
}
Consider the MAILTO_ construct. (The rest of the recipe should be
self-explanatory.)
- The sed -e '1,/^$/ d'
extracts the body of the email message (i.e. the headers are
ignored).
- The egrep "mailto:" finds
all the lines containing mailto:.
- If there are several mailto: lines the tail -1 gets the last of
them.
- The expand expands any TAB
characters to SPACE characters.
- The sed -e 's/^[ ]*//g' -e 's/[ ]*$//g'
omits any leading and trailing blanks.
- The sed -e 's/[^o]://g' -e 's/^://g'
weeds out from the same line the possible preceding colons (:) which
might cause confusion. It is not perfect, though.
- The awk -F: '{ print $2 }'
gets the rest (until the end of line or the next colon) after the
colon (:), i.e. the email
address from the mailto: line
and what may come after it. The awk '{ print $1 }'
discards the potential rest of the line starting with the first
blank after the address. What should thus be left is the email
address in the mailto: field.
Should you wish to get the entire line with the "mailto:" into the MAILTO_ variable
instead of just the email address there, simply leave out the last
two lines from the MAILTO_ definition.
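The pipeline can be tried on a sample message from the command line before it is put into the recipe. In the sketch below the message and the addresses are made up, last_mailto is a hypothetical name, and plain "grep" and "tail -n 1" are used as present-day spellings of "egrep" and "tail -1".

```shell
#!/bin/sh
# Sketch only: last_mailto is a hypothetical helper name wrapping the
# same steps as the MAILTO_ definition.

last_mailto() {
    sed -e '1,/^$/ d' |
        grep 'mailto:' |
        tail -n 1 |
        expand |
        sed -e 's/^ *//' -e 's/ *$//' |
        sed -e 's/[^o]://g' -e 's/^://g' |
        awk -F: '{ print $2 }' | awk '{ print $1 }'
}

last_mailto <<'EOF'
From: someone@example.org
Subject: Whatever

Write to mailto:first@example.org
Contact: mailto:second@example.org for details
EOF
```

The colon after "Contact" is weeded out before awk splits the line, so the result is the address from the last mailto: line.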
How can I insert some token text in front
of the body of incoming email?
I have a really simple procmail question.
All I want to do is add a line
"======= Forwarded Mail
=========="
to the top of the body of all incoming messages, and
forward them to another account.
Let's start by considering the first part of the question only. This
is how it is done. The solution owes heavily to Philip Guenther.
:0
{
:0 fhw
| cat - ; \
echo "===== Filtered email ====="
:0:
${DEFAULT}
}
So far so good. Next let's add the forwarding so that the token will
only appear in the forwarded message. (If you wish to change that,
adjust the order of the rules.)
:0
{
:0c:
${DEFAULT}
:0 fhw
| cat - ; \
echo "======= Forwarded Mail =========="
:0
!forward@myhost.mydom
}
Finally, let's add avoiding email loops.
# Discard loops
:0
* ^X-Loop: myid@myhost\.mydom
/dev/null
:0
{
:0c:
${DEFAULT}
:0 fhw
| cat - ; \
echo "======= Forwarded Mail =========="
:0 fhw
| formail -A"X-Loop: myid@myhost.mydom"
:0
!forward@myhost.mydom
}
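Why does appending a line to the header put it at the top of the body? The sketch below imitates the "cat - ; echo" filter in plain shell. It assumes that the header part handed to the filter ends with the blank separator line; the function name add_token and the sample message are made up for illustration.

```shell
#!/bin/sh
# add_token: imitates, outside procmail, the effect of the "fhw" filter
#   | cat - ; echo "..."
# Assumption: the header as seen by the filter includes the blank line
# that separates it from the body.
add_token() {
  msg=$(cat)                                  # whole message from stdin
  printf '%s\n' "$msg" | sed -e '/^$/q'       # header incl. the blank line
  echo "======= Forwarded Mail =========="    # token follows the blank line
  printf '%s\n' "$msg" | sed -e '1,/^$/ d'    # the untouched body
}

printf 'From: a@example.com\nSubject: hi\n\nline one\nline two\n' | add_token
```

The token ends up as the first line after the blank separator, which is to say the first line of the body.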
Do you have any useful tips for regular
expression matching?
This is a terribly complicated subject involving many features which
I do not know. Let's nevertheless look at some further example
recipes.
# Matching a few undelivery and such reports
:0:
* ^Subject:.*Undeliver(ed|able) (e)?mail|\
^Subject:.*Returned (spam )?(e)?mail
* ^TO_(myid|firstname\.lastname)@([-a-z0-9_]+\.)*myhost\.mydom
Returned.mail
Consider the first rule of the recipe above. It will match all email
with the following on the "Subject:" line in the header:
- Undelivered mail
- Undeliverable mail
- Undelivered email
- Undeliverable email
- Re: Undelivered mail
- etc...
The continuation line will match
- Returned mail
- Returned email
- Returned spam mail
- Returned spam email
- Re: Returned mail
- etc...
What if you don't want to match "Re: Undelivered mail"? The
following condition gives a more exact match
* ^Subject:[ ]+Undeliver(ed|able) (e)?mail
In other words only spaces and/or tabs are allowed between
"Subject:" and the start of the actual subject.
Let's consider another example. Say that we have two hosts
How to catch email from the former, but not the latter:
:0:
* ^From:.*cyber\.com([^\.]|$)
ProbableSpam.mail
That is, do not allow a dot after the .com or alternatively require
that the line ends there. However, cyber.comet would be matched!
Thus, depending on what you want to achieve, you might have e.g.
:0:
* ^From:.*cyber\.com( |"|>|$)
ProbableSpam.mail
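The difference between the two conditions can be checked from the command line. This is only a sketch: grep -Ei approximates procmail's egrep-like, case-insensitive matching, the helper name try is made up, cyber.com.example is a hypothetical stand-in for a host that continues after .com, and the dot in cyber\.com is escaped here.

```shell
#!/bin/sh
# try PATTERN ADDRESS: prints "yes" if a From: line carrying ADDRESS
# matches PATTERN, otherwise "no" (hypothetical helper).
try() {
  if printf 'From: <%s>\n' "$2" | grep -Eiq "$1"; then echo yes; else echo no; fi
}

loose='^From:.*cyber\.com([^.]|$)'      # anything but a dot may follow
strict='^From:.*cyber\.com( |"|>|$)'    # only a delimiter may follow

for host in cyber.com cyber.com.example cyber.comet; do
  echo "$host loose=$(try "$loose" "user@$host") strict=$(try "$strict" "user@$host")"
done
```

Only the stricter condition rejects cyber.comet; both reject the host that continues with a dot.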
What is the difference between the rules below?
* ^From:.*myid@([-a-z0-9_]+\.)*myhost\.mydom
* ^From:.*myid@([-a-z0-9_]+\.)?myhost\.mydom
* ^From:.*myid@([-a-z0-9_]+\.)+myhost\.mydom
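The first one matches any of
- myid@myhost.mydom
- myid@subhost1.myhost.mydom
- myid@mypc.subhost1.myhost.mydom
- The first one does not match myid@.myhost.mydom
(and neither should it!).
- The second one matches myid@myhost.mydom and
myid@subhost1.myhost.mydom, but not myid@mypc.subhost1.myhost.mydom.
- The third one matches myid@subhost1.myhost.mydom and
myid@mypc.subhost1.myhost.mydom, but not myid@myhost.mydom.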
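The behaviour of the three quantifiers can be verified with a small loop. A minimal sketch, assuming that grep -Ei is close enough to procmail's matching for this purpose; the helper name matches is made up, and the dots of the host name are escaped.

```shell
#!/bin/sh
# matches QUANTIFIER ADDRESS: succeeds if a From: line with ADDRESS matches
# the condition built with the given quantifier (*, ? or +).
matches() {
  printf 'From: %s\n' "$2" \
  | grep -Eiq "^From:.*myid@([-a-z0-9_]+\.)${1}myhost\.mydom"
}

for q in '*' '?' '+'; do
  for a in myid@myhost.mydom myid@subhost1.myhost.mydom \
           myid@mypc.subhost1.myhost.mydom myid@.myhost.mydom; do
    if matches "$q" "$a"; then r="match"; else r="no match"; fi
    echo "($q) $a: $r"
  done
done
```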
To recount the purpose of the main special regexp symbols:
Symbol | Interpretation
*      | Match zero or more times
?      | Match zero or one times
+      | Match one or more times
.      | Any character
[ ]    | Match from the list within the brackets
^      | The start of the line (within [] however, a negation)
$      | The end of the line
\      | Quote the next character to take it literally
( )    | Grouping
How can I test if two procmail variables
have the same contents?
Basically the syntax for variable value tests is
VAR1_=Whichever expression you devise
:0:
* VAR1_ ?? regexp
wherever
But you can build rules like
VAR1_=Whichever expression you devise
VAR2_=whatever
:0:
* $ VAR1_ ?? ${VAR2_}
wherever
Note, however, that the above still is regular expression matching,
not an equality.
The blank after the first $ is significant. It tells that the
variable references on the line (${VAR2_}) are to be expanded, not
to be taken as a literal text.
Feedback:
That's easily resolved using $\var expansion and anchoring both ends
of the regexp:
* VAR1_ ?? $ ^^$\VAR2_^^
That condition will succeed if and only if VAR1_ and VAR2_ have
the same contents, with the possible exception of VAR1_ having one
more trailing newline than VAR2_.
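The distinction between a regexp match and true equality is easy to demonstrate in plain shell. The sketch below is only an analogy, not procmail itself: grep -E plays the role of the unanchored regexp test, while grep -F with -x roughly corresponds to quoting the value and anchoring both ends. The function names and sample values are made up.

```shell
#!/bin/sh
# regex_match VALUE PATTERN: unanchored, egrep-style regexp test
regex_match()   { printf '%s\n' "$1" | grep -Eq -- "$2"; }
# literal_match VALUE TEXT: TEXT taken literally and required to cover the
# whole value (fixed string, whole line)
literal_match() { printf '%s\n' "$1" | grep -Fxq -- "$2"; }

VAR1_="myhostXmydom and more"
VAR2_="myhost.mydom"

if regex_match "$VAR1_" "$VAR2_"; then echo "regexp: match"; else echo "regexp: no match"; fi
if literal_match "$VAR1_" "$VAR2_"; then echo "literal: same"; else echo "literal: different"; fi
```

Here the regexp test succeeds, because the dot matches any character and nothing anchors the ends, although the two values clearly differ.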
I am having difficulties with "<". How
does one match it?
Date: 09 Dec 1999 23:06:41 -0600
From: Philip Guenther
Newsgroups: comp.mail.misc
Subject: Re: procmail, trivial html detection, and a quirk
ts(ät)uwasa(dot)fi (Timo Salmi) wrote:
> I just noted that, at least in procmail v3.13.1 1999/04/05
>
> :0B:
> * </body>
> * </html>
>
> does not work. Instead one has to apply
>
> :0B:
> * [<]/body>
> * [<]/html>
Yep. A leading '<' or '>' on a condition causes procmail to
interpret the condition as a size test. If you want a normal regexp
condition that starts by matching a literal '<' or '>'
character you have to protect the leading character from such
interpretation. There are several ways of doing so. The most
efficient are to use parens or a backslash:
- * ()</body>
- or
- * (<)/body>
- or
- * (</body>)
- or
- * \</body>
That last one is generally avoided because it looks like you're
using the \< regexp special when you really aren't. Putting the
'<' or '>' in brackets also works, as you did above, but it
slows down the matching ever so slightly as a character class is
slower to match than a single normal character. Thus, one of the
above four methods is usually preferred.
Philip Guenther
(Timo's addendum: As far as I understand \< is a word-boundary in
procmail. Hence \< is best avoided, when not used as an actual
boundary.)
How can I insert identification text to the
beginning of the subject line?
I know how to sort my incoming email with
procmail into different folders, but how do I use formail to
automatically add some suitable identification text to the subject
line of the email that I receive?
The general idea is this
#Get the subject discarding any leading and trailing blanks
#Note: On some systems -xSubject: has to be -x"Subject: "
SUBJ_=`formail -xSubject: \
| expand | sed -e 's/^[ ]*//g' -e 's/[ ]*$//g'`
:0
* YourFirstSelectionCriterion
{
:0 fwh
| formail -I"Subject: WhateverYouAdd_1 ${SUBJ_}"
:0:
YourFirstFolder
}
:0
* YourSecondSelectionCriterion
{
:0 fwh
| formail -I"Subject: WhateverYouAdd_2 ${SUBJ_}"
:0:
YourSecondFolder
}
The flags are as follows: "f" use the pipe as a filter, "w" execute
before proceeding, "h" it is about the header of the email
message.
The -I option in formail removes and replaces the old header. Should
you wish to retain the old subject header with an "Old-" prefix
added, use -i instead.
I tried out your tips, but some of them
failed on my system. What next?
Here are a few ideas:
- Have you copied right? For example:
- If you cut and paste, the brackets [] containing tabs will
not be copied correctly, since on this page the assumed tabs
aren't true tabs.
- Make sure that you have not misinterpreted the meaning of
the quotation (") marks anywhere in the advice.
- If you have a backslash \ at the end of the line to continue
the line, it is very important to ensure that you do not have
white spaces after the \ backslash.
- Have you customized all your file-paths right? Some of
the recipes may require a slightly different setup in your
environment than assumed in this FAQ.
- Check that procmail is getting your proper path. Try "echo ${PATH}" and then include
"PATH=WhatYouGot" high up in your
~/.procmailrc recipe file.
- Include "VERBOSE=yes" high
up in your ~/.procmailrc recipe file. Then see what is in the
logfile procmail produced for debugging. The testbench is a useful aid in the
debugging.
- The shell you use may affect some actions. Check where your
Bourne shell sh is with "which sh". If it is e.g. /bin/sh
then include "SHELL=/bin/sh" at
the beginning of your ~/.procmailrc recipe file and see if anything changes.
Bourne shell is the shell I have used in preparing this tips
page.
- Work systematically. Try to pinpoint which particular line is
causing the offense and how. If the problem is with the condition
part, make a version general enough to get it to match. Then narrow it
down towards what you wanted until the recipe fails. If the problem
is with an action, try to separate whether the problem is with the
actual action or your procmail syntax. For example, if you pipe the
email to a program, try to determine whether it is the call syntax that is
in error (e.g. do you manage to convey the parameters right) or whether
it is the actual program you called that fails.
- If you have a procmail problem which you can't solve after
trying properly, post your problem to the comp.mail.misc Usenet newsgroup
and/or your corresponding local newsgroup. If you have genuine
feedback about my procmail tips, your email is most welcome, but please refrain from using email
for private consultation requests.
Echo and grep blues. I am having
difficulties with echo and grep usages in
procmail.
The combination of quoting and regular expressions can cause some
subtle problems when the Unix echo and one of the greps (grep, fgrep, egrep) is used in
the procmail recipes.
Consider
#Note: On some systems -xSubject: has to be -x"Subject: "
SUBJ_=`formail -c -xSubject:`
# Responses to filter reports
:0:
* -1^0
* 1^0 $ ? echo \"${SUBJ_}\" | fgrep -is 'Re: Filter report'
* 1^0 ^TO_myid@([-a-z0-9_]+\.)*myhost\.mydom
Response.mail
- In the example the email's subject header is put into a
"SUBJ_" variable utilizing formail "-x" option.
- The "-c" option is used to concatenate the potential
continuation lines, since occasionally the headers are divided onto
several lines. This is more common on the "Received:" line, but can
also occur on the "Subject:" line.
- If the quoted quotes (\") are not used in the echo, the
special characters on the email's Subject line in the header will be
processed as shell related operators. This must not be allowed,
since it will result in errors that may be hard to trace. For
example operators such as "(",
")", "`", "'", "<", ">" and "|" all have a special meaning to the
shell.
- It is safer to use fgrep (the fixed-character expression
search) because fgrep interprets also the regular expression special
characters literally. For example, for fgrep you could use fgrep 'myhost.mydom' instead of egrep "myhost\.mydom". BTW, as you gather from the
example above, procmail uses egrep-like syntax.
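Two of the points above can be seen directly in the shell. A small sketch, assuming grep -F and grep -E as the equivalents of fgrep and egrep (some recent grep versions warn about the old names); the function names and sample strings are made up.

```shell
#!/bin/sh
# has_fixed TEXT STRING: fixed-string search, special characters are literal
has_fixed() { printf '%s\n' "$1" | grep -Fiq -- "$2"; }
# has_regex TEXT PATTERN: extended regexp search
has_regex() { printf '%s\n' "$1" | grep -Eiq -- "$2"; }

line='Re: report from myhostXmydom'
if has_regex "$line" 'myhost.mydom'; then echo "regexp: match"; else echo "regexp: no match"; fi
if has_fixed "$line" 'myhost.mydom'; then echo "fixed: match"; else echo "fixed: no match"; fi

# A subject full of shell specials is harmless as long as it stays quoted.
SUBJ_='Offer (50% off) | today > tomorrow'
if has_fixed "${SUBJ_}" '(50% off) |'; then echo "quoted subject searched safely"; fi
```

The unescaped dot lets the regexp search match myhostXmydom, while the fixed-string search does not.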
Consider a more complicated expression to extract the subject:
#Note: On some systems -xSubject: has to be -x"Subject: "
SUBJ_=`formail -c -xSubject: \
| expand | tr '\;\|\$\`\\]/' ' ' \
| sed -e 's/  */ /g' \
| sed -e 's/(/\\\(/g' -e 's/)/\\\)/g' \
| sed -e 's/^[ ]*//g' -e 's/[ ]*$//g'`
- The potential tabs are expanded.
- Some of the problem special characters are substituted with
spaces.
- Multiple spaces are substituted with a single space.
- Parentheses are covered with backslashes "\". Here things can
get really complicated, since the number of backslashes must be
compatible with the number of interpretation rounds through procmail
and the shell.
- The last sed gets rid of any leading and trailing
whitespaces.
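Most of these cleaning steps can be tried on a sample subject in an ordinary shell. This is a sketch only: the function name clean_subject is made up, the tr stage is left out, and because the pipeline here is not wrapped in procmail's backquotes, a single round of backslash interpretation applies, so fewer backslashes are needed than inside ~/.procmailrc.

```shell
#!/bin/sh
# clean_subject: expands tabs, squeezes runs of spaces, puts a backslash in
# front of parentheses, and trims leading and trailing blanks.
clean_subject() {
  expand \
  | sed -e 's/  */ /g' \
  | sed -e 's/(/\\(/g' -e 's/)/\\)/g' \
  | sed -e 's/^[ ]*//g' -e 's/[ ]*$//g'
}

printf '  Re:   Filter   report (fwd)  \n' | clean_subject
# prints: Re: Filter report \(fwd\)
```

Note that the squeezing pattern needs two spaces before the asterisk; with a single space it would also match the empty string between any two characters.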
There is much more to the echo and grep interactions with the shell
and the regular expressions. That is why sufficient trials using the
testbench are advisable before including
the more complicated recipes into one's "~/.procmailrc" file.
How do I know which of my many procmail
recipes has been enacted?
To get a log of what happens you set at the beginning of your ~/.procmailrc recipes file
SHELL=/usr/bin/sh # Use Bourne shell
MAILDIR=${HOME}/Mail # Customize as appropriate
LOGFILE=${MAILDIR}/procmail.log # Your procmail log
VERBOSE=yes # Produce full information
LOGABSTRACT=all # - " -
However, this produces so much information that it is not convenient
for routine checking by visual inspection. But you can include
a suitable (dummy) variable definition in each one of your recipes
and then search the log file for occurrences of that variable. Here
is an example demonstrating how it goes. Consider a recipe that
originally is
# Discard probable spam mail, set 1
:0:
* ! ^TO_(myid)@([-a-z0-9_]+\.)*myhost\.mydom
* 1^0 ^From:.*alladvantage.com
* 1^0 ^From:.*ameritech.net
* 1^0 ^From:.*bellatlantic.net
ProbableSpam.mail
Change this to be
:0
* ! ^TO_(myid)@([-a-z0-9_]+\.)*myhost\.mydom
* 1^0 ^From:.*alladvantage.com
* 1^0 ^From:.*ameritech.net
* 1^0 ^From:.*bellatlantic.net
{
:0
{ RULE="Discard probable spam mail, set 1" }
:0:
ProbableSpam.mail
}
Apply the same principle for all your recipes in your ~/.procmailrc file. Then, as email has
arrived, you can check which rules have been used by searching the
log file with the command grep "RULE=" ${HOME}/Mail/procmail.log.
If you need this regularly, make the grep search one of your Unix
scripts:
#!/usr/bin/sh
grep "Assigning \"RULE=" ${HOME}/Mail/procmail.log
In the altered procmail recipe, further up, carefully note some of
the syntax
- The location of the lockfile invocation ":".
- Above the RULE="..." line there is no cloning "c" flag in
":0" since setting a variable is a non-delivering action. The next
line will be reached anyway. In fact, it would be a mistake to use a
"c" there. It would lead to complications.
- In setting the RULE variable ensure that there is a space after
the "{" and before the "}". Otherwise the email will go to a folder
with rather a long and complicated name.
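Once the RULE assignments are in the log, a short script can summarize how often each recipe fired. A sketch under assumptions: the function name count_rules is made up, and the sample log lines are invented, modelled on the "Assigning" text that the grep command above searches for.

```shell
#!/bin/sh
# count_rules LOGFILE: prints each RULE value found in the log, preceded by
# the number of times it was assigned, most frequent first.
count_rules() {
  grep 'Assigning "RULE=' "$1" \
  | sed -e 's/^.*Assigning "RULE=//' -e 's/"[ ]*$//' \
  | sort | uniq -c | sort -rn
}

# Invented sample log, written to a temporary file for the demonstration.
LOG_=${TMPDIR:-/tmp}/procmail.sample.$$
cat > "$LOG_" <<'EOF'
procmail: Assigning "RULE=Discard probable spam mail, set 1"
procmail: Assigning "RULE=To my-mailing-list, probably legitimate"
procmail: Assigning "RULE=Discard probable spam mail, set 1"
EOF
count_rules "$LOG_"
rm -f "$LOG_"
```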
Nesting of procmail recipes can get fairly complicated. Consider the
following example involving setting the RULE variable and procmail
else if conditions ":0E".
:0
* ^TO_my-mailing-list
{
:0
* ^From:.*@([-a-z0-9_]+\.)*myhost\.mydom
{
:0
{ RULE="To my-mailing-list, probably legitimate" }
:0:
${DEFAULT}
}
:0E
{
:0
{ RULE="To my-mailing-list, probably spam" }
:0:
Spam.mail
}
}
Feedback:
There is a method for logging which action took place without using
VERBOSE=yes, which creates large log files. This method uses the LOG
variable:
LOGFILE=$HOME/.MailFilter_log
SHELL=/bin/sh
:0 B
* .*spam
{
LOG="TRAPPED SPAM - "
:0
/dev/null
}
#- Accept All other mail -#
:0
{
LOG="ACCEPTED MAIL - "
:0
$ORGMAIL
}
The output looks something like this:
TRAPPED SPAM - From spammer@spam.com Thu May 16 03:52:42 2002
Subject: Make Money Fast
Folder: /dev/null 43140
ACCEPTED MAIL - From goodguy@save.com Thu May 16 03:54:08 2002
Subject: Legitimate email message
Folder: /var/spool/mail/username 4683
My comment: If you look at the example for testing for individual procmail recipes you'll see that
for logging one sets (usually for troubleshooting)
#Troubleshooting:
VERBOSE=yes
LOGABSTRACT=all
For the method in the feedback above, leave those variables out or
set
VERBOSE=no
However, do not set
LOGABSTRACT=no
because then you'll miss all but the actual log variable
identification. Instead, just leave the line out.
How can I detect Korean, Cyrillic, or
Chinese to avoid such frequent spam?
There is a very good page by Walter Dnes
explaining the method. The method relies on ad-hoc approximation. In
brief, scoring is used to detect if more than 5 per cent of the
characters in the body of the message are high-bit characters
typical of the said language codes. If you have gone through the
items in my procmail FAQ, it should be easy to understand the
inventive method given on Walter's page. See the
exercise at the end of the current FAQ involving detecting
Korean.
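The 5 per cent idea can be approximated outside procmail by simply counting bytes. The sketch below is not the scoring recipe itself but a plain shell illustration of the same ratio; the function name highbit_percent and the sample message are made up, and mktemp is assumed to be available.

```shell
#!/bin/sh
# highbit_percent: reads a message on stdin and prints the integer
# percentage of body bytes that have the high bit set (values 128-255).
highbit_percent() {
  tmp=$(mktemp) || return 1
  LC_ALL=C sed -e '1,/^$/ d' > "$tmp"                 # keep the body only
  total=$(wc -c < "$tmp" | tr -d ' ')
  high=$(LC_ALL=C tr -cd '\200-\377' < "$tmp" | wc -c | tr -d ' ')
  rm -f "$tmp"
  if [ "$total" -eq 0 ]; then echo 0; else echo $(( high * 100 / total )); fi
}

# Made-up sample: a 5-byte body containing one high-bit byte (octal 351).
pct=$(printf 'Subject: test\n\nabc\351\n' | highbit_percent)
echo "high-bit share: ${pct}%"
if [ "$pct" -gt 5 ]; then echo "above the 5 per cent threshold"; fi
```

A plain ratio like this does not make allowances for Finnish, Swedish or French letters, so it is cruder than a scoring recipe that subtracts for those.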
If you wish to be even more restrictive about what you receive,
you could even filter all messages that have any of the
following combinations appearing anywhere in the body of the
message
àé
àì
áà
áò
áô
áù
áú
éè
éê
éî
éù
éú
and so on. Put those, and others you may wish to skip, in a
bl_body.lst file
# Probable spam mail, by message body
:0B
* $ ? fgrep -is -f bl_body.lst
/dev/null
How can I change the subject line and
include part of the message body into it?
I have a cellular phone. I want to save the
incoming email normally and also to send a modified copy to my
second account (a Short Message Service). The forwarded copy should
include the original subject AND five lines of the original message
text. The original body should not be included. Is this possible
with procmail?
Well yes, it is. It takes some figuring out and draws on many of the
principles presented in the other items in my proctips collection.
It also needs a few tricks with Bourne shell programming. Perhaps
most importantly, this item demonstrates how to put the body of the
message into a variable.
# Customize these paths if they do not match yours
SHELL=/usr/bin/sh
SENDMAIL=/usr/lib/sendmail
:0
* ^Subject:.*Timo testing
{
# Put the email intact in the default folder
:0c:
${DEFAULT}
# The "c" flag above tells the recipe to continue
# Now we prepare a different version of the message
:0
{
# Get the subject into a variable
# Expand the possible tabs into blanks
# Discard any leading and trailing blanks
# On some systems -xSubject: has to be -x"Subject: "
SUBJ_=`formail -xSubject: \
| expand | sed -e 's/^[ ]*//g' -e 's/[ ]*$//g'`
# Get the body of the message into a variable
# Accept only the first five lines
# Discard newlines, i.e. put everything on one line
BODY_=`sed -e '1,/^$/ d' | head -5 | tr -d '\n'`
}
# Prepare and send a message with no body
# -X "" extracts just the header (discards the body)
# Plug in the new subject
# Content fields might cause problems if not discarded
# Change the To: address
:0:proc.lock
| formail -X "" \
-I"Subject: ${SUBJ_} ${BODY_}" \
-i"Content-Type:" \
-i"Content-Length:" \
-I"To: your@second.address" \
| ${SENDMAIL} -t
}
The line
BODY_=`sed -e '1,/^$/ d' | head -5 | tr -d '\n'`
retrieves the first five lines from the body of the text. It would
be more useful to retrieve a specified number of characters from it.
Say we wish to retrieve 160 characters. This is how to do that.
BODY_=`sed -e '1,/^$/ d' | tr -d '\n' | dd bs=1 count=160`
Solving the alternative of having a maximum of 160 characters in the
concatenated SUBJ_ and BODY_ is left as an exercise to the reader.
There also is another, more important improvement that can be made
in the action above. Replace tr -d '\n' with tr '\n' ' ' so that
when the lines are concatenated a space is put in between them.
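The character-limited variant, with the newlines turned into spaces, can be tested on its own. A minimal sketch: the function name body_snippet and the sample message are made up, and the statistics that dd writes to standard error are discarded.

```shell
#!/bin/sh
# body_snippet N: reads a message on stdin and prints at most N characters
# of its body, with the line breaks replaced by spaces.
body_snippet() {
  sed -e '1,/^$/ d' | tr '\n' ' ' | dd bs=1 count="$1" 2>/dev/null
}

printf 'Subject: x\n\nfirst line\nsecond line\n' | body_snippet 16
echo    # dd output has no trailing newline
# prints: first line secon
```

With tr -d '\n' instead, the words at the line ends would run together as "first linesecond".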
How can I remove the signature from the
incoming email?
The recipe below assumes that the signature properly adheres to the
Internet "-- " convention to denote where the signature
starts.
:0
* ^Subject: Whatever
{
:0 fbw
| sed -e '/^-- /,$ d'
:0:
${DEFAULT}
}
Let's look at what we've got:
- The b flag means feed the
body to the pipe.
- The f flag means use the
pipe as a filter.
- The w flag means wait for
the filter or program to finish.
- This is not a sed FAQ, but
in brief:
- In the sed script the /^-- / matches the first
occurrence of the signature designator string "-- ".
- In sed, a lone $ stands
for the last line.
- The d denotes deleting
the "pattern space" found.
In the above the sed script will delete everything in the message
body starting from the "-- " until the end of the incoming
message. Substituting
sed -e '/^-- /,$ d'
with
sed -e '/^-- /,/^$/ d'
will instead delete everything starting from the "-- " until
the first encountered empty line. Thus if there is e.g. an
attachment after the signature, the attachment will not be thrown
away.
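The two sed variants are easy to compare on a sample body. A small sketch with made-up function names and a made-up message body; in the recipe the b flag would feed only the body to sed, so the sample contains no header.

```shell
#!/bin/sh
# Delete from the "-- " line to the end of the input.
strip_sig_to_end()   { sed -e '/^-- /,$ d'; }
# Delete from the "-- " line to the first empty line after it.
strip_sig_to_blank() { sed -e '/^-- /,/^$/ d'; }

sample='Hello
-- 
Timo

attachment part'

echo "--- to the end:"
printf '%s\n' "$sample" | strip_sig_to_end
echo "--- to the first empty line:"
printf '%s\n' "$sample" | strip_sig_to_blank
```

In the second variant the empty line that ends the signature is deleted along with it.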
What unix manuals relating to procmail
should I get?
Unix manuals are not very helpful as starting points, but after you
have got the rudiments under your belt, you may wish to browse the
following manuals for additional information. Below is a simple
"manuals" Bourne shell script. It
prepares plain text format files of some of the essential Unix man
manuals for a procmail user, especially suited for offline
reading.
Note that the "^H" is not a "^" and an "H", but a CTRL-H, i.e. ASCII
8 (the backspace character). To make the "manuals" file executable
type "chmod u+x manuals".
#!/bin/sh
TODIR=${HOME}/myman
echo ${TODIR}
man egrep | sed -e 's/_^H//g' > ${TODIR}/egrep.man
man formail | sed -e 's/_^H//g' > ${TODIR}/formail.man
man procmail | sed -e 's/_^H//g' > ${TODIR}/procmail.man
man procmailex | sed -e 's/_^H//g' > ${TODIR}/procmaex.man
man procmailrc | sed -e 's/_^H//g' > ${TODIR}/procmarc.man
man regexp | sed -e 's/_^H//g' > ${TODIR}/regexp.man
man sendmail | sed -e 's/_^H//g' > ${TODIR}/sendmail.man
ls -lF ${TODIR}
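If typing a literal CTRL-H into the script is awkward in your editor, the backspace character can be produced with printf instead. A sketch, assuming a made-up function name strip_underline and a made-up test string; it removes the same underscore-backspace pairs as the sed commands in the script.

```shell
#!/bin/sh
# Put a real backspace character (ASCII 8) into a variable.
BS=$(printf '\b')

# strip_underline: removes the "_<backspace>" pairs used for underlining.
strip_underline() { sed -e "s/_${BS}//g"; }

printf '_\bf_\bo_\bo bar\n' | strip_underline
# prints: foo bar
```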
Many of the recipes in this FAQ utilize sed
and/or awk. Some useful links (note, however, as is common with
links, I can't guarantee that they still are current):
Is it possible to use procmail to call the
vacation program?
Yes, it is, but it is not quite as straightforward as one would
expect.
Since this is a collection of procmail advice, not of vacation program
advice, I'll assume that you are reasonably familiar with the vacation
program. If not, start with "man vacation". You have to use
procmail to customize the ~/.vacation.msg file because when
invoked via procmail, the vacation $SUBJECT variable is not
necessarily set.
Usually, when vacation is used, it is first called interactively to
create the ~/.vacation.msg file
and to replace the ~/.forward
file. If you are going to use the procmail solution it is very
important not to do this. In particular, the ~/.forward file must not be touched in any way. The reason
is that in this solution it is used to invoke procmail, not
vacation. (The vacation program is, of course, called by procmail
now.)
# Set a number of variables high up in your ~/.procmailrc
#
VACATION=/usr/bin/vacation
ONVACAT=yes
VACFREQ=5d
VACMSG=${HOME}/.vacation.msg
MYNAME_="MyFirstName MyLastName"
MYEMAIL_=myid@myhost.mydom
# Get the subject discarding any leading and trailing blanks
# Note: On some systems -xSubject: has to be -x"Subject: "
#
SUBJ_=`formail -xSubject: \
| expand | sed -e 's/^[ ]*//g' -e 's/[ ]*$//g'`
# Prepare the vacation message's base
# This is done only once in ~/.procmailrc
#
:0 cwi
* ONVACAT ?? ^^yes^^
| echo "From: ${MYEMAIL_}" > ${VACMSG} ;\
echo "Subject: ${MYNAME_}, away from my mail" >> ${VACMSG} ;\
echo "X-Loop: myid@myhost.mydom" >> ${VACMSG} ;\
echo "" >> ${VACMSG} ;\
echo "Thank you for your email about:" >> ${VACMSG} ;\
echo "\"$SUBJ_\"" >> ${VACMSG} ;\
echo "" >> ${VACMSG} ;\
echo "Your email will be seen to when I return." >> ${VACMSG} ;\
echo "" >> ${VACMSG} ;\
cat ${HOME}/.signature >> ${VACMSG}
# Here we go invoking vacation and also saving the email
# You might have several, different of these recipes
#
:0
* ^Subject:.*Whatever
{
:0
{ RULE="Testing" }
:0 cwi
* ONVACAT ?? ^^yes^^
* ! ^X-Loop:.*myid@myhost\.mydom
| ${VACATION} -t${VACFREQ} myid
:0:
WhateverFolder
}
Feedback:
Maybe I [Collin Park] can add one more comment: I think you need
a global LOCKFILE to cover the area from when you generate the
vacation message to the place where you invoke $VACATION.
Otherwise, message #N may generate .vacation.msg, then
message #N+1 overwrites it before #N invokes $VACATION.
How can I avoid duplicate messages sent
in rapid succession?
One option, though not the only one, is the following heuristic. You will
wish to customize and streamline it in accordance with your own
preferences.
#Some variables
FROM2_=`formail -c -I"Reply-To:" -rt -xTo: \
| tr '\;\|\$\`\\]/' ' ' \
| expand | sed -e 's/^[ ]*//g' -e 's/[ ]*$//g'`
DFROM2_=`echo /${FROM2_}/ \
| expand | sed -e 's/[ \<\>\+\?\$]//g'`
SUBJ_=`formail -z -c -xSubject: \
| expand | tr '\;\|\$\`\\]/' ' ' \
| sed -e 's/  */ /g' \
| sed -e 's/^[ ]*//g' -e 's/[ ]*$//g'`
DSUBJ_=`echo /${SUBJ_}/ | expand | sed -e 's/[ \<\>\+\?\$]//g'`
DWC_=/`wc -w`/
#Discard doubles
# W Wait for the filter or program to finish,
# suppress any 'Program failure' message.
:0W
* $ ? sed -n 1p LastIn | egrep -is '${DFROM2_}'
* $ ? sed -n 2p LastIn | egrep -is '${DSUBJ_}'
* $ ? sed -n 3p LastIn | egrep -is '${DWC_}'
{
:0
{ RULE="Discard doubles" }
:0
/dev/null
}
#Store some information about the latest message
# c then continue
:0Wc
| echo "${DFROM2_}" > LastIn ;\
echo "${DSUBJ_}" >> LastIn ;\
echo "${DWC_}" >> LastIn
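The idea behind the recipe, remembering a three-line fingerprint of the latest message and comparing the next one against it, can be sketched in plain shell. This is a simplified variation, not the recipe itself: it uses sed instead of formail, exact string comparison instead of egrep, and a hypothetical state file name; the function names are made up.

```shell
#!/bin/sh
# Hypothetical state file holding the fingerprint of the latest message.
STATE=${TMPDIR:-/tmp}/LastIn.$$

# fingerprint: reads a message on stdin, prints sender, subject and the
# word count of the whole message, one per line.
fingerprint() {
  msg=$(cat)
  printf '%s\n' "$msg" | sed -n -e '/^$/q' -e 's/^From:[ ]*//p' | head -1
  printf '%s\n' "$msg" | sed -n -e '/^$/q' -e 's/^Subject:[ ]*//p' | head -1
  printf '%s\n' "$msg" | wc -w | tr -d ' '
}

# is_duplicate: succeeds if the message on stdin has the same fingerprint
# as the previous one; always stores the new fingerprint.
is_duplicate() {
  new=$(fingerprint)
  old=$(cat "$STATE" 2>/dev/null)
  printf '%s\n' "$new" > "$STATE"
  [ "$new" = "$old" ]
}

m='From: a@example.com\nSubject: hi\n\nbody text\n'
for n in 1 2; do
  if printf "$m" | is_duplicate; then echo "message $n: duplicate"; else echo "message $n: new"; fi
done
rm -f "$STATE"
```

As in the recipe, only the immediately preceding message is remembered, so a duplicate that arrives after some other message goes undetected.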
How can I skip logging a certain, matched
recipe? Say virus warnings from my postmaster.
The solution is rather simple. Direct LOGFILE to /dev/null (or
anywhere you may wish) for the duration of the relevant recipe. For
example
:
LOGFILE_=${LOGFILE}
LOGFILE=/dev/null
:0:
* ^Subject:.*Virus in a mail for you
* ^From:.*postmaster
VirusWarnings
LOGFILE=${LOGFILE_}
:
Alternatively you could likewise (re)set
VERBOSE=no
LOGABSTRACT=no
but the first solution is the more flexible.
Could you please solve for me this procmail
problem of mine?
It is nice that you have found my proctips so useful that you ask
for my personal advice. Nevertheless, if you ask me by email for
individualized procmail consultation, my response has to be the same
as to a request for any other programming advice. Briefly, the
response is that I do not do email consultation. If you have a
procmail related problem please post your question to a Usenet
newsgroup like comp.mail.misc.
The added advantage of posting is that in a newsgroup both the
question and the potential answers will have a wider forum. That way
everyone will benefit.
Please also be aware that I retired in 2011. My interests now
lie elsewhere. I am not motivated enough to invest the
considerable effort required to look into other users' procmail or
other programming problems, nor even particularly to maintain this
procmail information. It is currently presented "as is".
On rare occasions I have also been asked to email my own personal
~/.procmailrc or my own spamfoiling scripts. The answer is a
definite no. There are two main reasons. First, that material is
private. Second, I have neither the willingness nor the time to send
out material to users on individual requests. If and when I want to
share my material I make it available for users to retrieve
themselves via WWW or FTP.
I liked this material. Do you have anything
else on programming, etc?
Yes, notably this, even if old:
Some exercises
Let's see if we can put to work the methods presented in this FAQ to
solve some tasks, part of them having come up on the Usenet news.
Ex.1) Keep a copy of incoming email, and at
the same time, get only the first five lines from the message body
and forward it to another account.
# Discard potential email loops
:0
* ^X-Loop: myid@myhost\.mydom
/dev/null
:0
* Any rule(s) you might wish to have
{
# Keep a copy, but don't stop yet ( the c )
:0c:
${DEFAULT}
# Comment with "Old-" the Content-Length field from the header
# Ensure that a whitespace exists between field name and content
:0 fwh
* ^Content-Length:
| formail -z -i"Content-Length:"
# Add the loop avoidance
# ( f for piping; w for waiting for completion; h for headers )
:0 fwh
| formail -A"X-Loop: myid@myhost.mydom"
# Truncate the body ( the b ) to five lines
:0 fwb
| head -5
# Forward to the other account
:0
! myid2@myhost.mydom
}
It is important to handle the Content-Length header field when the
length of the email is altered. This is done to ensure that the
receiving email program will not break the forwarded message when it
is read. The -i switch is used to retain the information about the
original message length for the attention of the receiver.
Ex.2) Forward the first 10 lines of the
message body to the user's second account while preserving all the
original message headers, i.e. at the receiving side, the
user wants to see all the message travel history and only the first 10
lines of the message body.
This is a more complicated version of the first exercise. The
transformed task is not trivial, since when you forward, the
original message headers will be replaced by your forwarding
headers. Therefore, you'll have to see to preserving also the
original headers. Below is how I would solve the problem based on
several items in this FAQ.
# A trick to extract the subject into a variable
# Note: On some systems -xSubject: has to be -x"Subject: "
SUBJ_=`formail -c -xSubject: | sed -e 's/^[ ]*//g' -e 's/[ ]*$//g'`
# The actual recipe to solve the exercise starts here
:0
* Whatever condition(s) you wish to select the messages for forwarding
# Avoid email loops
* ! ^X-Loop: myid@myhost\.mydom
{
:0c: #If you want to, preserve a full copy of the email, else omit
${DEFAULT}
:0fwh #Preserve the information about the original content length
* ^Content-Length:
| formail -z -i"Content-Length:"
:0fwb #Truncate the body of the message to ten lines
| head -10
:0fwh #Insert a blank line at the beginning of the body for clarity
| cat - ; echo ""
:0fwh #Store the original headers, quoting them to avoid problems
| sed -e 's/^/\> /'
:0fwh #Insert some of your own information before forwarding
| formail -A"X-Loop: myid@myhost.mydom" \
-A"X-Info: Forwarded body truncated to 10 lines" \
-i"Subject: $SUBJ_ (fwd)"
#Finally, forward the adjusted email
:0
!my2ndId@myhost.mydom
}
# Discard potential email loops
:0
* ^X-Loop: myid@myhost\.mydom
/dev/null
Feedback:
The recipe with head probably needs an "i" on the flags line,
as:
- :0 fwbi
- | head -10
since write errors on the pipe are likely for messages larger
than a certain size. (I've seen numbers like 4096 and 10240... it
apparently varies with the system.)
Ex.3) Match a potential [TS999] identification
in the Subject header, such as "[TS001] Timo testing". If found,
insert a "Subject id: [TS999]" as the first line in the body of the
message. (The rest of the original subject line must not reappear in
the id.)
:0
* ^Subject:.*\/\[TS[0-9]+\]
{
:0 fhw
| cat - ; \
echo "Subject id: ${MATCH}"
:0:
${DEFAULT}
}
But what if you do want to include the rest of the original
subject line? In that case use
* ^Subject:.*\/\[TS[0-9]+\].*
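For experimenting with the pattern before putting it into a recipe, the extraction can be imitated with sed. This is only an analogy for what ends up in MATCH, under the assumption that the subject carries a single identifier; the function name subject_id is made up.

```shell
#!/bin/sh
# subject_id: reads header lines on stdin and prints the [TSnnn] identifier
# found on the Subject line, if any.
subject_id() {
  sed -n -e 's/^Subject:.*\(\[TS[0-9][0-9]*]\).*$/\1/p'
}

printf 'Subject: [TS001] Timo testing\n' | subject_id
# prints: [TS001]
```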
Ex.4) Multi-part messages (which typically
include attachments) have in their headers a field like the two
examples below:
Content-Type: multipart/mixed; boundary=ELM965173874-25050-0_
Content-Type: multipart/mixed;
boundary="------------BA45271FBDAA479CECA7E20A"
Write a recipe that inserts into a variable (call it BOUND) the
boundary string. Note that the potential quotes (") are not to be
part of that string. Also note that the header might be divided on
multiple lines as in
Content-Type: multipart/mixed;
boundary=ELM965173874-25050-0_
There are alternative solutions, which are not necessarily quite
equivalent. The first one is putting high up in your ~/.procmailrc
recipe file the line(s)
BOUND1=`formail -z -x"Content-Type:" \
| awk -F= '{ print $2 }' \
| sed -e 's/\"//g' | tr -d '\n'`
A second one is:
:0h
* ^Content-Type:
{ BOUND2=`egrep -i 'boundary=' \
| awk -F= '{ print $2 }' | sed -e 's/\"//g'` }
This was not in the exercise, but you can then have recipes like
:0:
* ! BOUND2 ?? ^^^^
WhateverFolder
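The awk and sed part of the first solution can be tried without formail by feeding it the field contents directly. A sketch with a made-up function name boundary_of; the input stands for what formail would extract from the Content-Type field.

```shell
#!/bin/sh
# boundary_of: stdin is the contents of a Content-Type field, possibly on
# several lines; prints the boundary string without quotes.
boundary_of() {
  awk -F= '{ print $2 }' | sed -e 's/"//g' | tr -d '\n'
}

printf ' multipart/mixed;\n boundary="------------BA45271FBDAA479CECA7E20A"\n' | boundary_of
echo    # the result itself carries no trailing newline
# prints: ------------BA45271FBDAA479CECA7E20A
```

One caveat of splitting on "=": a boundary string that itself contains an equals sign would be cut at that point.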
Ex.5) Identify if the
arriving email is in Korean. If so, return the message to the sender
and his/her postmaster. Ignore a potential Reply-To: field in the
header. Avoid email loops. Avoid forgeries which appear to come from
your own host. Avoid forgeries which lack a host name. Be careful
not to take Finnish/Swedish or French as
Korean.
This is quite a difficult exercise with many details involved.
# Get the sender's address, ignore Reply-To:
FROM_=`formail -c -I"Reply-To:" -rt -xTo: \
| expand | sed -e 's/^[ ]*//g' -e 's/[ ]*$//g'`
# Get the sender's host
FHOST_=`echo "${FROM_}" | awk -F@ '{ print $2 }'`
# Your path to sendmail
SENDMAIL="/usr/lib/sendmail"
# Reject probable Korean email using character scoring
:0
* ! ^X-Loop:.*myid@myhost\.mydom
* ! $ ? echo ${FHOST_} | fgrep -is 'myhost.mydom'
* $ ? echo ${FHOST_} | fgrep -is '.'
{
:0BD
* -1^1 .
* 2^1 =[0-9A-F][0-9A-F]
* 20^1 [¡¢£¤¥¦§¨©ª«¬®¯°±²³´µ¶·¸¹º»¼½¾¿]
* 20^1 [ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞß]
* 20^1 [àáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ]
* 20^1 =[89A-F][0-9A-F]
* -20^1 [åÅäÄöÖàáâçèéêë]
* -20^1 =(E5|C5|E4|C4|F6|D6|E0|E1|E2|E7|E8|E9|EA|EB)
{
:0
{ RULE="Probable Korean email" }
#
:0c:${HOME}/procmail.lock
| expand | sed -e 's/[ ]*$//g' \
| sed -e 's/^/ /' > ${HOME}/procmail.reject.korean
#
:0:${HOME}/procmail.lock
| (formail -r -I"Subject: Autorejected email" \
-I"To: ${FROM_}" \
-I"Cc: postmaster@${FHOST_}" \
-A"X-Loop: myid@myhost.mydom" ; \
echo "--- begin rejected probable Korean email ---" ; \
echo "" ; \
cat ${HOME}/procmail.reject.korean ; \
echo "--- end of rejected probable Korean email ---" ; \
rm -f ${HOME}/procmail.reject.korean) \
| ${SENDMAIL} -t
}
}
Ex.6) If the subject of the email contains the
identifier [INFO], in capitals, put the body of the incoming email into
a temporary file. Ensure that the name of the temporary file is unique.
Insert the full subject line at the top of the temporary file. (Why, and
what then is beyond this exercise.)
#Get the subject discarding any leading and trailing blanks
#Note: On some systems -xSubject: has to be -x"Subject: "
SUBJ_=`formail -xSubject: \
| expand | sed -e 's/^[ ]*//g' -e 's/[ ]*$//g'`
# Assign a temporary file name
TMPFILE_=proctemp.$$
:0D
* ^Subject.*\[INFO\]
{
:0 fwbi
| echo "Subject: ${SUBJ_}" > ${TMPFILE_}; \
echo >> ${TMPFILE_}; \
cat >> ${TMPFILE_}
}
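The file-writing part of the action line can be tried from an ordinary shell prompt before it goes into a recipe. A minimal sketch, assuming a POSIX shell: the function name save_info and the use of ${TMPDIR:-/tmp} as the directory are my assumptions for illustration, and the message body is taken from standard input.

```shell
#!/bin/sh
# save_info SUBJECT: hypothetical helper; writes the subject line, a blank
# line and then the body (read from stdin) to a file named after the PID,
# and prints the name of that file.
save_info() {
    tmpfile="${TMPDIR:-/tmp}/proctemp.$$"
    {
        echo "Subject: $1"
        echo
        cat
    } > "$tmpfile"
    echo "$tmpfile"
}

# Usage: feed a two-line body, show the resulting file, then clean up
file=$(printf 'line one\nline two\n' | save_info '[INFO] Weekly report')
cat "$file"
rm -f "$file"
```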
Ex.7) If the email comes from a certain
sender, check if the time-zone information is present in the Date
header. If not, add it assuming +3 hours.
#Get the date discarding any leading and trailing blanks
DATE_=`formail -xDate: \
| expand | sed -e 's/^[ ]*//g' -e 's/[ ]*$//g'`
:0
* ^From:.*TheCertainSender
* ! ^Date:.*(EET|DST|GMT)
{
:0 fwhi
| formail -i"Date: ${DATE_} +0300 (EET DST)"
:0:
${DEFAULT}
}
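The decision made by the condition and the filter can be rehearsed on plain date strings. This is only a sketch; the function name fix_date is hypothetical, and it mirrors the recipe in looking for the zone names rather than for a numeric offset.

```shell
#!/bin/sh
# fix_date DATE: hypothetical helper mirroring the recipe's condition and
# action. Prints DATE unchanged if it names EET, DST or GMT (ignoring
# case), otherwise appends the assumed +0300 zone.
fix_date() {
    if printf '%s\n' "$1" | grep -Eiq '(EET|DST|GMT)'; then
        printf '%s\n' "$1"
    else
        printf '%s\n' "$1 +0300 (EET DST)"
    fi
}

fix_date 'Sat, 13 May 2023 17:34:17'
fix_date 'Sat, 13 May 2023 14:34:17 GMT'
```

Note that, like the recipe, this sketch would also amend a date that carries only a numeric offset such as +0200.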
Ex.8) The simple spamfoiling recipe below
won't work. Correct it.
:0:
* !^TO$USER@xxxxxxx.xxx
ProbableSpam.mail
:0
{
:0
{ USER=`whoami` }
:0:
* $ ! ^TO_${USER}@([-a-z0-9_]+\.)*xxxxxxx\.xxx
ProbableSpam.mail
}
The ([-a-z0-9_]+\.)* part, which allows for subdomains, is optional.
Another solution:
:0:
* $ ! ^TO_${LOGNAME}@([-a-z0-9_]+\.)*xxxxxxx\.xxx
ProbableSpam.mail
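The address part of the corrected condition can be checked with grep -E. The ^TO_ macro is procmail's own and grep does not know it, so this sketch tests only what follows it, against a bare address. The function name addr_matches is hypothetical.

```shell
#!/bin/sh
# addr_matches USER ADDRESS: hypothetical helper. Succeeds if ADDRESS is
# USER at xxxxxxx.xxx or at any subdomain of it (case-insensitive, as
# procmail matching is by default).
addr_matches() {
    printf '%s\n' "$2" \
        | grep -Eiq "^$1@([-a-z0-9_]+\.)*xxxxxxx\.xxx\$"
}

addr_matches ts ts@xxxxxxx.xxx      && echo "plain host matches"
addr_matches ts ts@mail.xxxxxxx.xxx && echo "subdomain matches"
addr_matches ts ts@xxxxxxxYxxx      || echo "lookalike without the dot rejected"
```

The last line shows what the escaped dot buys: with an unescaped dot the lookalike host would have matched as well.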
Ex.9) Insert at the beginning of the
subject the date/time of receiving the incoming message in the
YYYYMMDD HHMMSS format.
:0
* Whatever rules
{
:0
{ SUBJ_=`formail -c -xSubject: \
| sed -e 's/^[ ]*//g' -e 's/[ ]*$//g'` }
:0
{ DATETIME_=`date "+%Y%m%d %H%M%S"` }
:0 fhwi
| formail -I"Subject: ${DATETIME_} ${SUBJ_}"
:0:
${DEFAULT}
}
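The date format can be tried on its own. A minimal sketch with a hypothetical function name, stamp_subject. It uses %H for the hour, which is always two digits; GNU date's %k pads single-digit hours with a blank instead of a zero, which would not give a strict HHMMSS.

```shell
#!/bin/sh
# stamp_subject SUBJECT: hypothetical helper that prefixes SUBJECT with
# the current date and time as YYYYMMDD HHMMSS.
stamp_subject() {
    datetime=$(date "+%Y%m%d %H%M%S")
    printf '%s\n' "$datetime $1"
}

stamp_subject 'Meeting agenda'
```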
Ex.10) This is partly based on an actual
incident. Consider the following recipe, which has three small but crucial
syntax errors and one omission. Find them.
:0
* ^From:.*(\
(abuse(-news)?|acct_closed)@
(pacificnet\.net|\
mindspring\.net|\
InfoAve\.net|\
netcom\.com\|
yahoo\.com|\
alladvantage\.com|\
hotmail\.com))
* ^TO_(myid|myFirstName\.mySecondName)@([-a-z0-9_]+\.)*myhost\.mydom
{
:0
{RULE="Abuse reception notes"}
:0
ReceivedNotes
}
The answer is a bit further down
:
:
:
:
:
:
:
:
:
:
:
:
:
:
:0
* ^From:.*(\
(abuse(-news)?|acct_closed)@\
(pacificnet\.net|\
mindspring\.net|\
InfoAve\.net|\
netcom\.com|\
yahoo\.com|\
alladvantage\.com|\
hotmail\.com))
* ^TO_(myid|myFirstName\.mySecondName)@([-a-z0-9_]+\.)*myhost\.mydom
{
:0
{ RULE="Abuse reception notes" }
:0:
ReceivedNotes
}
The three syntax errors were the missing continuation backslash after the @
on the second line of the condition, the \| that should have been |\ after
netcom\.com, and the missing blanks inside the braces around the RULE
assignment. The omission was the lockfile colon (:0:) on the recipe that
delivers to ReceivedNotes.
Ex.11) Write a recipe to match the subject
line below. The (RECENT) may or may not be there, and the numbers
will change from posting to posting.
Subject: Re: [SpamCop:(RECENT)38.204.225.29,id:16135684] Make lotsof $$$
:0:
* ^Subject: Re: \[SpamCop:(\(RECENT\))?[0-9\.]+,id:[0-9]+\]
WhateverFolder
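A condition like this is easy to try out with grep -E before trusting it in a recipe. The sketch below feeds it the sample subject both with and without (RECENT). The function name is_spamcop_reply is hypothetical, and the dot inside the bracket expression is written without a backslash, which grep does not need there.

```shell
#!/bin/sh
# is_spamcop_reply: hypothetical helper; succeeds if the subject line on
# stdin matches the pattern used in the recipe condition.
is_spamcop_reply() {
    grep -Eq '^Subject: Re: \[SpamCop:(\(RECENT\))?[0-9.]+,id:[0-9]+\]'
}

echo 'Subject: Re: [SpamCop:(RECENT)38.204.225.29,id:16135684] Make lotsof $$$' \
    | is_spamcop_reply && echo "matches with (RECENT)"
echo 'Subject: Re: [SpamCop:38.204.225.29,id:16135684] Make lotsof $$$' \
    | is_spamcop_reply && echo "matches without (RECENT)"
```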
Ex.12) It is fairly common that spam email
has the same sender and recipient in the From: and To: fields.
Devise a recipe that detects such postings.
This is not quite as simple as it first sounds, since the contents of
the two fields may not be identical even when the actual addresses are
the same. I would therefore use regular expression matching in both
directions, as below, as one possible solution. By default, variable
comparisons in procmail are regular expression matches, not strict
equalities. Also note the conditions that avoid email loops and avoid
falsely targeting email that one may have sent to oneself.
WHOFROM=`formail -xFrom: \
| expand \
| sed -e 's/[ ][ ]*/ /g' \
| sed -e 's/^[ ]*//g' -e 's/[ ]*$//g'`
WHOTO=`formail -xTo: \
| expand \
| sed -e 's/[ ][ ]*/ /g' \
| sed -e 's/^[ ]*//g' -e 's/[ ]*$//g'`
:0:
* -100^0 ^X-Loop: myid@myhost\.mydom
* -100^0 ^TO_(myid|myFirst\.mySecond)@([-a-z0-9_]+\.)*myhost\.mydom
* -100^0 ^From:.*LegitimateMailingList
* 1^0 $ WHOFROM ?? ${WHOTO}
* 1^0 $ WHOTO ?? ${WHOFROM}
ProbableSpam.mail
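The clean-up and the two-way comparison can be rehearsed in the shell. This is a sketch only: the function names normalize and same_party are hypothetical, and the comparison uses fixed-string containment (grep -F) as a stand-in for procmail's regular expression matching of variables.

```shell
#!/bin/sh
# normalize: expand tabs, squeeze runs of blanks into one, trim both ends
normalize() {
    expand | sed -e 's/[ ][ ]*/ /g' -e 's/^[ ]*//' -e 's/[ ]*$//'
}

# same_party FROM TO: hypothetical helper. Succeeds if either normalized
# field contains the other, ignoring case.
same_party() {
    from=$(printf '%s\n' "$1" | normalize)
    to=$(printf '%s\n' "$2" | normalize)
    printf '%s\n' "$from" | grep -Fiq -e "$to" ||
    printf '%s\n' "$to" | grep -Fiq -e "$from"
}

same_party 'Some   Body <someone@example.invalid>' 'someone@example.invalid' \
    && echo "same sender and recipient"
same_party 'someone@example.invalid' 'other@example.invalid' \
    || echo "different parties"
```

An empty field would match anything in this sketch, so a real recipe should first make sure that both variables are non-empty.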
Ex.13) Write a (spam avoidance) recipe to
detect email with more than seven recipients in the "To:" header
field. Assume for simplicity that each address will have exactly one
"@" character in it.
:0
* ^Subject:.*The information you requested
{
:0
{
WHOTO=`formail -z -xTo:`
COUNT=`echo ${WHOTO} | sed -e 's/[^@]//g' | wc -c`
COUNT1=`expr ${COUNT} - 1`
ISGT=`expr ${COUNT1} \> 7`
}
:0:
* ISGT ?? ^^1^^
ProbableSpam.mail
}
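The counting pipeline is worth trying by hand, because the subtraction of one is easy to forget. A minimal sketch with hypothetical function names (count_recipients, too_many), under the same assumption of exactly one "@" per address:

```shell
#!/bin/sh
# count_recipients LIST: hypothetical helper; counts the "@" characters
# in LIST, assuming exactly one "@" per address.
count_recipients() {
    count=$(printf '%s\n' "$1" | sed -e 's/[^@]//g' | wc -c | tr -d ' ')
    # wc -c also counts the trailing newline, hence the subtraction
    echo $(( $count - 1 ))
}

# too_many LIST: succeeds if LIST holds more than seven recipients
too_many() {
    [ "$(count_recipients "$1")" -gt 7 ]
}

list='a@x.fi, b@x.fi, c@x.fi, d@x.fi, e@x.fi, f@x.fi, g@x.fi, h@x.fi'
count_recipients "$list"
too_many "$list" && echo "more than seven recipients"
```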
Ex.14) Make procmail forward email that
arrives between 9am and 5pm to a predefined daytime email
address.
:0
# Omit the condition line below if this is for all email
* ^Subject:.*Whatever
{
:0
{
TIME=`date +%H%M`
ISGT=`expr ${TIME} \> 0900`
ISLT=`expr ${TIME} \< 1700`
}
:0
* ISGT ?? ^^1^^
* ISLT ?? ^^1^^
! daytime_forward_address
}
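The two expr comparisons can be checked against fixed times instead of waiting for the clock. A sketch with a hypothetical function name, in_daytime. Note that both comparisons are strict, so 0900 and 1700 themselves fall outside the window.

```shell
#!/bin/sh
# in_daytime HHMM: hypothetical helper; succeeds if HHMM lies strictly
# between 0900 and 1700, using the same two expr comparisons as the recipe.
in_daytime() {
    isgt=$(expr "$1" \> 0900)
    islt=$(expr "$1" \< 1700)
    [ "$isgt" = 1 ] && [ "$islt" = 1 ]
}

for t in 0859 0930 1659 1700; do
    if in_daytime "$t"; then
        echo "$t: forward to the daytime address"
    else
        echo "$t: deliver normally"
    fi
done
```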
Ex.15) Write a Procmail recipe which
detects if there is a Word document attached to the incoming email.
# Email with a Word document attached
:0
* ^Content-Type: multipart/
{
:0 B
* ^Content-.*attachment.*name=.*\.(doc|rtf)
{
:0
{ RULE="Email with a Word document attached" }
:0:
WordAttachmentEmail
}
}
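The body condition can be tried against sample MIME part headers with grep -E. The function name has_word_attachment is hypothetical and the sample lines are made up for the test.

```shell
#!/bin/sh
# has_word_attachment: hypothetical helper; succeeds if any line on stdin
# matches the body condition of the recipe (case-insensitive).
has_word_attachment() {
    grep -Eiq '^Content-.*attachment.*name=.*\.(doc|rtf)'
}

printf '%s\n' 'Content-Type: application/msword' \
    'Content-Disposition: attachment; filename="report.doc"' \
    | has_word_attachment && echo "Word attachment found"
printf '%s\n' 'Content-Disposition: attachment; filename="notes.txt"' \
    | has_word_attachment || echo "no Word attachment"
```

In this sketch the word attachment and the file name must be on the same line, so a header folded onto a continuation line would be missed.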
Ex.16) Write a recipe to detect a
"whatever pattern" on exactly the second line of the body of an
incoming message. Ignore case in the
pattern.
:0B:
* ? sed -n 2p | egrep -is 'whatever pattern'
WhateverPatternMail
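The pipeline in the condition works the same way at the shell prompt, which makes it easy to test. A minimal sketch; the function name second_line_matches is hypothetical, and grep -E -i is used here where the recipe has egrep -i.

```shell
#!/bin/sh
# second_line_matches PATTERN: hypothetical helper; succeeds if the second
# line of stdin matches PATTERN, ignoring case.
second_line_matches() {
    sed -n 2p | grep -Eiq -e "$1"
}

printf 'first line\nSome WHATEVER Pattern here\nthird line\n' \
    | second_line_matches 'whatever pattern' && echo "second line matches"
printf 'whatever pattern\nsecond line\n' \
    | second_line_matches 'whatever pattern' || echo "first line does not count"
```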
A tip: Although not directly related to procmail, my collection of useful
MS-DOS batch files and tricks contains several examples of sed
(and awk) usage. So does my collection of
useful NT/2000/XP script tricks and tips.
Ex.17) Write a spam detection recipe that
does the following:
1. Check the body of the message against the keywords (collected
spam sites' www addresses etc.) in a BlackList.lst pattern-file. The
pattern-file might contain something like:
This letter may come to you as a surprise
Urgent business proposal
cheap-medz.com
discreetdelivery.net
http://homemarketplace.cjb.net
mailto:reklamapoezd@
quityourjobworkforus
statesmoneyz.com
www.badcrednp4u.biz
2. If a KEEPSPAM variable has been set to "yes" save the spam to
Spam.mail, truncated to 100 lines. If not, discard the
message.
# Probable spam mail, by message body
:0B
* $ ? fgrep -is -f BlackList.lst
{
:0
* KEEPSPAM ?? ^^yes^^
{
:0:MyProcmail.lock
| sed -n 1,100p >> Spam.mail
}
:0E
{
:0
/dev/null
}
}
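Both building blocks of this recipe, the pattern-file lookup and the truncation, can be tested without procmail. A sketch with hypothetical function names (is_blacklisted, truncate_100) and a throw-away pattern file; grep -F -i is used here where the recipe has fgrep -i.

```shell
#!/bin/sh
# is_blacklisted PATTERNFILE: hypothetical helper; succeeds if stdin
# contains any line of PATTERNFILE as a fixed string, ignoring case.
is_blacklisted() {
    grep -Fiq -f "$1"
}

# truncate_100: pass through at most the first 100 lines of stdin
truncate_100() {
    sed -n 1,100p
}

# Usage with a throw-away pattern file
patterns="${TMPDIR:-/tmp}/BlackList.$$"
printf '%s\n' 'Urgent business proposal' 'cheap-medz.com' > "$patterns"
printf 'Hello,\nvisit CHEAP-MEDZ.COM today\n' \
    | is_blacklisted "$patterns" && echo "probable spam"
rm -f "$patterns"
```

One thing to watch: an empty line in the pattern file is an empty pattern, which matches every message, so the list should be kept free of blank lines.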
Acknowledgements for useful advice and/or feedback:
Aughey, John
Bump, Jorey
Davey, David
Dnes, Walter
Eriksson, Era
Guenther, Philip
Guckes, Sven
Hebeisen, Christoph
Hirvonen, Hannu
Lane, John
Melish, Jacob
Menezes, Evandro
Novak, Curtis
Park, Collin
Pettigrew, John
van Tol, Ruud
Van Steenkist, Vernon
Any errors and inadequacies are, however, solely my own responsibility.
The legal note:
The author shall not be liable to the user for any direct, indirect
or consequential loss arising from the use of, or inability to use,
any information, rule, script, program or file, howsoever caused. No
warranty is given that the information, rules, scripts, programs or
the advice given will work under all circumstances or that they are
current. You use everything at your own risk.
http://www.salminet.fi/info/proctips.html