This is a new distribution based on a plugin developed by Christian Holler (decoder_at_own-hero_dot_net).
Fixed:
$h and $w are now properly initialized to zero, so that if the height and width
of an image cannot be parsed from its size parameters, the values can still be
tested properly.
Hashing now works. $digest was getting reset because it went out of scope. grrr.
$efile was only being replaced at its first occurrence in complex scansets,
which generated '$efile: ambiguous redirect' errors.
Various bugs where 'Use of uninitialized value' warnings were reported.
Fixed:
Option: 'focr_db_safe'
This option was not included in the @pgm_options array... oops (thanks UxBoD)
Score: wrongctype
This was not used correctly, so it never scored... (thanks Eric)
Changed:
The plugin now works with tempfiles only
This should reduce the need to read/write image data from memory after each
'filter', and will hopefully reduce the plugin's IO and memory usage.
Scanset Syntax: $pfile
Because tempfiles are used, the image file to be used as input must be specified
explicitly: '$pfile' must be used to specify the input filename. Please note
that in scansets that use pipes, $pfile should only be given as the input to the
first 'filter' program.
Scanset Syntax: $efile
With every scanset, stderr is redirected to '$efile', which is different for each
image. When using multiple filters in a scanset, use '$efile' to redirect stderr
to this file, making sure the plugin will correctly recognize an error when it
occurs.
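A rough sketch of how the $pfile and $efile placeholders might be expanded and checked. It is written in Python, while the plugin itself is Perl. The function names, the naive split on '|', and the rule that any output in $efile counts as an error are all assumptions made for illustration. They are not the plugin's actual code.

```python
import os
import subprocess
from typing import Tuple


def expand_scanset(scanset: str, pfile: str, efile: str) -> str:
    """Substitute $pfile and $efile into a scanset command line (hypothetical).

    Naive split on '|' (ignores quoting and '||'). Only the first filter may
    read $pfile; later filters read the previous filter's output from the pipe.
    """
    stages = scanset.split("|")
    if "$pfile" not in stages[0]:
        raise ValueError("the first filter must read its input from $pfile")
    if any("$pfile" in stage for stage in stages[1:]):
        raise ValueError("only the first filter should be given $pfile")
    # str.replace substitutes every occurrence, not just the first one.
    return scanset.replace("$pfile", pfile).replace("$efile", efile)


def run_scanset(command: str, efile: str) -> Tuple[bytes, bool]:
    """Run an expanded command through the shell; return (stdout, failed).

    Assumption: the run failed if the pipeline exited non-zero or if any
    filter wrote something to the error file.
    """
    proc = subprocess.run(command, shell=True, stdout=subprocess.PIPE)
    wrote_errors = os.path.exists(efile) and os.path.getsize(efile) > 0
    return proc.stdout, proc.returncode != 0 or wrote_errors
```

A shell pipeline's exit status is that of its last command. A failure in an earlier filter would therefore go unnoticed unless that filter also sends its stderr to $efile.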
Require:
Plugin officially requires SA 3.1.4 or higher
New Perl Modules
DB_File
Storable
MLDBM
Previous Perl Modules
String::Approx
Removed:
Option: 'focr_pre314'
No longer needed, as the plugin now requires SA 3.1.4
Added:
Option: 'focr_path_bin'
Its value is treated as a search path for the programs in @bin_utils,
potentially requiring fewer configuration options;
Directories in the path that don't exist are skipped;
Default value: /usr/local/netpbm/bin:/usr/local/bin:/usr/bin
Option: 'focr_db_hash'
Its value holds the filename used to store the hash database; see below.
Default value: /etc/mail/spamassassin/FuzzyOcr.db
Option: 'focr_db_safe'
Its value holds the filename used to store the database of hashes of 'clean' images; see below.
Default value: /etc/mail/spamassassin/FuzzyOcr.safe.db
Option: 'focr_db_max_days'
Its value holds the number of days after which records that have not been matched are expired; see below.
Default value: 35
Option: 'focr_keep_bad_images'
If this is set to 1, the plugin will not remove the temporary directory in
which the images are stored and processed if it determines that an image was
corrupt, or that an error occurred in any of the auxiliary programs that
process the images. Useful while debugging.
Default value: 0
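The focr_path_bin lookup described above can be sketched as follows, again in Python rather than the plugin's Perl. The function name and the "first directory in the path wins" rule are assumptions. The program names used are only examples.

```python
import os
from typing import Dict, Iterable, Optional


def find_bin_utils(path_bin: str, bin_utils: Iterable[str]) -> Dict[str, Optional[str]]:
    """Resolve each helper program against a colon-separated search path."""
    # Directories that don't exist are skipped rather than treated as errors.
    dirs = [d for d in path_bin.split(":") if d and os.path.isdir(d)]
    found: Dict[str, Optional[str]] = {}
    for name in bin_utils:
        found[name] = None
        for d in dirs:
            candidate = os.path.join(d, name)
            # First executable match in path order wins, like a shell PATH lookup.
            if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
                found[name] = candidate
                break
    return found
```

Python's standard `shutil.which(name, path=...)` performs a comparable lookup. The explicit loop above simply makes the skipping of missing directories visible.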
Changed:
Option: 'focr_logfile'
Defaults to 'stderr', so that logging goes to standard error
Option: 'focr_enable_image_hashing' if set to 2:
Uses MLDBM to store hash info in a true DB file for faster access.
Stores hashes of images that exceed the set thresholds in the file
specified by option 'focr_db_hash'
Stores hashes of 'clean' images (without matching words) in the file
specified by option 'focr_db_safe', so that good images are also cached.
Keeps statistics of hash hits and logs the number of times each hash matched.
Saves the attachment name and content type as a reference
Automatically imports known hashes from 'focr_digest_db' into 'focr_db_hash'
Automatically expires 'old' records that have not been matched in more than
the number of days specified in option 'focr_db_max_days'
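The following is a minimal sketch of the hash bookkeeping described above. It uses Python's `shelve` module as a stand-in for MLDBM, since both store serialized data structures in a dbm file. The record fields (`count`, `name`, `ctype`, `last_match`) and the function names are hypothetical and do not reflect the plugin's actual schema. The functions work with a `shelve.Shelf` or a plain dict.

```python
import time
from typing import Optional

DAY = 86400  # seconds


def record_hit(db, digest: str, name: str, ctype: str,
               now: Optional[float] = None) -> int:
    """Add or update one hash record; return how often it has matched."""
    now = time.time() if now is None else now
    rec = db.get(digest, {"count": 0, "name": name, "ctype": ctype})
    rec["count"] += 1
    rec["last_match"] = now
    db[digest] = rec  # shelve only persists values on assignment
    return rec["count"]


def remember(hash_db, safe_db, digest: str, is_spam: bool, name: str,
             ctype: str, now: Optional[float] = None) -> int:
    """Images over the thresholds go to the hash DB, clean ones to the safe DB."""
    return record_hit(hash_db if is_spam else safe_db, digest, name, ctype, now)


def expire_old(db, max_days: int = 35, now: Optional[float] = None) -> int:
    """Delete records not matched within `max_days`; return how many were removed."""
    now = time.time() if now is None else now
    cutoff = now - max_days * DAY
    stale = [k for k in list(db.keys()) if db[k]["last_match"] < cutoff]
    for k in stale:
        del db[k]
    return len(stale)
```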
Instead of a single 'global' timeout, 'focr_timeout' now applies to each
external program individually. This ensures that no timeouts are recorded
merely because of complex scansets or temporary spikes in load. Also, the log
now displays the name and return code of any binary that timed out, making
problems easier to debug.
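The per-program timeout can be sketched with Python's `subprocess` module. Each external program gets its own clock, and a timeout is logged with the binary's name and return code. The function names, the message format, and the choice to stop at the first timeout are assumptions.

```python
import subprocess
from typing import List, Optional, Sequence, Tuple


def run_with_timeout(argv: Sequence[str], timeout: float) -> Tuple[int, bytes, Optional[str]]:
    """Run one external program under its own timeout.

    Returns (return code, stdout, log message or None). On timeout the child
    is killed and the message names the binary and its return code.
    """
    proc = subprocess.Popen(argv, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    try:
        out, _err = proc.communicate(timeout=timeout)
        return proc.returncode, out, None
    except subprocess.TimeoutExpired:
        proc.kill()
        proc.communicate()  # reap the killed child so returncode is set
        msg = f"{argv[0]} timed out after {timeout}s (return code {proc.returncode})"
        return proc.returncode, b"", msg


def run_filters(filters: Sequence[Sequence[str]], timeout: float) -> List[str]:
    """Run programs in turn; each gets the full `timeout`, not a shared budget."""
    log: List[str] = []
    for argv in filters:
        _code, _out, msg = run_with_timeout(argv, timeout)
        if msg:
            log.append(msg)
            break  # hypothetical policy: stop the scanset at the first timeout
    return log
```

On POSIX systems a child killed this way reports a negative return code (-9 for SIGKILL).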
Fixed:
A bug where option focr_counts_required was not recognized;
Logging to a file now works when option 'focr_logfile' is set;
Individual word scores are now applied correctly
Only images with matched words are stored in the hash database (Thanks to Robert LeBlanc)
Explicitly uses Mail::SpamAssassin::Timeout (Thanks Eric Yiu)
Ignores empty lines in wordlists (global and local)
Ignores comments, from '#' to the end of the line
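Here is a sketch of the wordlist parsing rules above, written in Python. It assumes one entry per line. Any extra per-entry fields, such as a word score, are left in the returned string. The function name is hypothetical.

```python
from typing import List


def parse_wordlist(text: str) -> List[str]:
    """Return the entries of a wordlist, skipping comments and blank lines."""
    words = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop a '#' comment through end of line
        if line:  # ignore lines that are empty or contain only a comment
            words.append(line)
    return words
```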
Require:
Plugin officially requires SA 3.1.1 or higher
Added:
Support for BMP/TIFF Images
Changed:
Major internal restructuring
Uses the SpamAssassin logging facility instead of its own logfile
Fixed:
A bug related to database hashing
Updated: Sep 12, 2006