This is a new distribution based on a plugin developed by Christian Holler (decoder_at_own-hero_dot_net).
Require:
Plugin officially requires SA 3.1.4 or higher
New Perl modules:
DB_File
Storable
MLDBM
Already required by previous versions:
String::Approx
Removed:
Option: 'focr_pre314'
No longer needed, since the plugin now requires SA 3.1.4.
Added:
Option: 'focr_path_bin'
Its value is treated as a colon-separated search path for the helper binaries listed in @bin_utils, which may reduce the number of configuration options needed.
Directories in the path that don't exist are skipped.
Default value: /usr/local/netpbm/bin:/usr/local/bin:/usr/bin
Option: 'focr_db_hash'
Filename of the database that stores hashes of images exceeding the set thresholds (see 'focr_enable_image_hashing' below).
Default value: /etc/mail/spamassassin/FuzzyOcr.db
Option: 'focr_db_safe'
Filename of the database that stores hashes of 'clean' images, i.e. images without matching words (see 'focr_enable_image_hashing' below).
Default value: /etc/mail/spamassassin/FuzzyOcr.safe.db
Option: 'focr_db_max_days'
Number of days a record may go unmatched before it is expired from the hash databases (see 'focr_enable_image_hashing' below).
Default value: 35
Option: 'focr_keep_bad_images'
If this is set to 1, the plugin will not remove the temporary directory in which images are stored and processed when it determines that an image was corrupt or that one of the auxiliary programs processing the images failed. Useful for debugging.
Default value: 0
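The search behavior of 'focr_path_bin' can be sketched as follows. This is a Python illustration, not the plugin's Perl code; the function names find_binary and locate_bin_utils are hypothetical, and the sketch assumes a helper counts as found when it is an executable file in one of the listed directories, checked left to right.

```python
import os

# Default value documented for 'focr_path_bin'.
DEFAULT_FOCR_PATH_BIN = "/usr/local/netpbm/bin:/usr/local/bin:/usr/bin"


def find_binary(name, path=DEFAULT_FOCR_PATH_BIN):
    """Return the first executable called `name` found in the colon-separated
    `path`, or None. Directories that do not exist are skipped."""
    for directory in path.split(":"):
        if not directory or not os.path.isdir(directory):
            continue  # skip empty entries and missing directories
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


def locate_bin_utils(names, path=DEFAULT_FOCR_PATH_BIN):
    """Map each helper name to its resolved location (None if not found)."""
    return {name: find_binary(name, path) for name in names}
```

For example, locate_bin_utils(["giftopnm", "jpegtopnm"]) would resolve two netpbm tools against the default path; these names are only illustrative entries for @bin_utils.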
Changed:
Option: 'focr_logfile'
Now defaults to 'stderr', so log output goes to standard error.
Option: 'focr_enable_image_hashing', when set to 2:
Uses MLDBM to store hash information in a true DB file for faster access.
Stores hashes of images that exceed the set thresholds in the file specified by option 'focr_db_hash'.
Stores hashes of 'clean' images (without matching words) in the file specified by option 'focr_db_safe', so that good images are cached as well.
Keeps statistics of hash hits and logs the number of times each hash matched.
Saves the attachment name and content type as a reference.
Automatically imports known hashes from 'focr_digest_db' into 'focr_db_hash'.
Automatically expires 'old' records that have not matched in more than the number of days specified by option 'focr_db_max_days'.
Instead of a single global timeout, 'focr_timeout' now applies to each external program individually. This ensures that no timeouts are recorded because of complex scansets or temporary spikes in load. The log also shows the name and return code of the binary that timed out, making problems easier to debug.
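A minimal sketch of the hit statistics and expiry described above, in Python with a plain dict standing in for the MLDBM-backed file. The record layout (hits, last_match, name, ctype) and the function names are assumptions made for illustration, not the plugin's actual schema.

```python
import time

SECONDS_PER_DAY = 86400


def add_hash(db, image_hash, name, ctype, now=None):
    """Store a new hash with the attachment name and content type as reference."""
    now = time.time() if now is None else now
    db[image_hash] = {"hits": 0, "last_match": now, "name": name, "ctype": ctype}


def record_hit(db, image_hash, now=None):
    """Bump the hit counter and last-match time for a known hash.
    Returns the new hit count, or None if the hash is unknown."""
    now = time.time() if now is None else now
    entry = db.get(image_hash)
    if entry is None:
        return None
    entry["hits"] += 1
    entry["last_match"] = now
    db[image_hash] = entry  # write back so tied/persistent stores save the change
    return entry["hits"]


def expire_old(db, max_days=35, now=None):
    """Delete records not matched within `max_days` days; return their keys."""
    now = time.time() if now is None else now
    cutoff = now - max_days * SECONDS_PER_DAY
    stale = [h for h, e in db.items() if e["last_match"] < cutoff]
    for h in stale:
        del db[h]
    return stale
```

The updated record is written back explicitly because tied stores such as MLDBM, or Python's shelve without writeback, do not notice in-place changes to nested values.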
Fixed:
A bug where option 'focr_counts_required' was not recognized
Logging to a file when option 'focr_logfile' is set now works
Individual word scores are now applied correctly
Only images with matched words are stored in the hash database (thanks to Robert LeBlanc)
Explicitly uses Mail::SpamAssassin::Timeout (thanks to Eric Yiu)
Ignores empty lines in wordlists (global and local)
Ignores comments from '#' to the end of the line
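The wordlist cleanup can be sketched like this in Python, assuming the '#' rule applies to the same wordlist files and that '#' never appears inside a legitimate entry. The function name is hypothetical, and the entries themselves are returned unparsed.

```python
def clean_wordlist_lines(lines):
    """Return wordlist entries with '#' comments and blank lines removed."""
    entries = []
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop everything from '#' to EOL
        if line:  # skip lines that are empty once the comment is removed
            entries.append(line)
    return entries
```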
Require:
Plugin officially requires SA 3.1.1 or higher
Added:
Support for BMP/TIFF images
Changed:
Major internal restructuring
Uses SpamAssassin's logging facility instead of its own logfile
Fixed:
A bug related to database hashing
Updated: Sep 11, 2006