FuzzyOcr 2.3d

This is a new distribution based on the plugin developed by Christian Holler (decoder_at_own-hero_dot_net).

Changes

version 2.3d

    Require:
        Plugin officially requires SA 3.1.4 or higher
        New Perl Modules
            DB_File
            Storable
            MLDBM
        Previously required Perl Modules
            String::Approx

    Removed:
        Option: 'focr_pre314'
            No longer needed, since the plugin now requires SA 3.1.4

    Added:
        Option: 'focr_path_bin'
            Its value is treated as a search path for the programs in @bin_utils,
                potentially requiring fewer configuration options;
            Directories in the path that don't exist are skipped;
            Default value: /usr/local/netpbm/bin:/usr/local/bin:/usr/bin

        Option: 'focr_db_hash'
            Its value holds the filename used for storing the hash database of
                images that exceed the set thresholds; see
                'focr_enable_image_hashing' under Changed.
            Default value: /etc/mail/spamassassin/FuzzyOcr.db

        Option: 'focr_db_safe'
            Its value holds the filename used for storing the database of 'clean'
                images (without matching words); see 'focr_enable_image_hashing'
                under Changed.
            Default value: /etc/mail/spamassassin/FuzzyOcr.safe.db

        Option: 'focr_db_max_days'
            Its value is the number of days a record may go unmatched before it
                is expired from the database; see 'focr_enable_image_hashing'
                under Changed.
            Default value: 35

        Option: 'focr_keep_bad_images'
            If this is set to 1, the plugin will not remove the temporary
                directory in which images are stored and processed when it
                determines that an image was corrupt, or when an error occurred
                in any of the auxiliary programs that process the images. Useful
                while debugging.
            Default value: 0
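
The path lookup described for 'focr_path_bin' can be pictured roughly as follows. This is only a minimal sketch and not the plugin's actual code: the helper name find_in_path and the tool names in the loop are made up for illustration, and the real contents of @bin_utils are not shown here.

```perl
use strict;
use warnings;

# Hypothetical helper (not taken from the plugin source): return the full
# path of the first executable called $name found in the colon-separated
# $path, or undef if none is found.
sub find_in_path {
    my ($name, $path) = @_;
    for my $dir (split /:/, $path) {
        next unless -d $dir;              # directories that don't exist are skipped
        my $candidate = "$dir/$name";
        return $candidate if -f $candidate && -x $candidate;
    }
    return;                               # not found in any directory
}

# Default value of focr_path_bin as listed above.
my $path = '/usr/local/netpbm/bin:/usr/local/bin:/usr/bin';

# Tool names are illustrative assumptions only.
for my $tool (qw(giftopnm gocr)) {
    my $found = find_in_path($tool, $path);
    print "$tool: ", (defined $found ? $found : 'not found'), "\n";
}
```

With a lookup like this, a single path setting can stand in for one configuration option per helper program, which is presumably what "fewer configuration options" refers to.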
            

    Changed:
        Option: 'focr_logfile'
            Now defaults to 'stderr', so that log output goes to standard error
        Option: 'focr_enable_image_hashing' if set to 2:
            Use MLDBM to store Hash info in true DB file for faster access.
            Stores hashes of images that exceed set thresholds in file
                specified by option focr_db_hash
            Stores hashes of 'clean' images (without matching words) in the file
                specified by option focr_db_safe, so that good images are cached too.
            Keeps statistics of hash hits and displays the number of times
                matched in the log.
            Saves the name of the attachment and its content type as reference.
            Automatically imports known hashes from focr_digest_db into focr_db_hash.
            Automatically expires 'old' records that have not been matched for
                more than the number of days specified in option 'focr_db_max_days'.
        Instead of a single 'global' timeout, 'focr_timeout' is now applied to
            each external program separately. This avoids timeouts caused by
            complex scansets or by temporary spikes in load. The log now also
            shows the name and return code of the binary that timed out,
            making problems easier to debug.
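
The expiry of old records controlled by 'focr_db_max_days' could look something like the sketch below. It rests on assumptions: the record layout (a 'last_match' timestamp and a 'hits' counter per image hash) and the function name expire_old_records are invented for illustration, and a plain in-memory hash stands in for the MLDBM-backed database file.

```perl
use strict;
use warnings;

# Assumed record layout (hypothetical):
#   image hash => { last_match => epoch seconds, hits => match count }
# Deletes every record not matched within $max_days and returns how many
# records were removed.
sub expire_old_records {
    my ($db, $max_days, $now) = @_;
    $now = time unless defined $now;
    my $cutoff  = $now - $max_days * 24 * 60 * 60;
    my @expired = grep { $db->{$_}{last_match} < $cutoff } keys %$db;
    delete @{$db}{@expired};
    return scalar @expired;
}

my $now = time;
my %db = (
    'hash-a' => { last_match => $now - 40 * 86400, hits => 3 },  # last matched 40 days ago
    'hash-b' => { last_match => $now -  2 * 86400, hits => 7 },  # last matched 2 days ago
);

# 35 is the default value of focr_db_max_days.
my $removed = expire_old_records(\%db, 35, $now);
print "expired $removed record(s)\n";
```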

    Fixed:
        A bug where option 'focr_counts_required' was not recognized
        Logging to file when option 'focr_logfile' is set now works
        Individual word scores are now applied correctly
        Only images with matched words are now stored in the hash database (Thanks to Robert LeBlanc)
        Mail::SpamAssassin::Timeout is now used explicitly (Thanks to Eric Yiu)
        Empty lines in wordlists (global and local) are now ignored
        Comments in wordlists, from '#' to end of line, are now ignored
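
The wordlist handling described in the last two fixes amounts to something like the following. This is a sketch under assumptions, not the plugin's parser: the function name parse_wordlist is hypothetical, the sample entries are placeholders, and any further per-line syntax a wordlist may support is left untouched.

```perl
use strict;
use warnings;

# Hypothetical helper: takes raw wordlist lines and returns the entries
# that remain after comments and empty lines have been dropped.
sub parse_wordlist {
    my (@lines) = @_;
    my @words;
    for my $line (@lines) {
        $line =~ s/#.*$//;          # strip a comment from '#' to end of line
        $line =~ s/^\s+|\s+$//g;    # trim surrounding whitespace and the newline
        next if $line eq '';        # ignore empty lines
        push @words, $line;
    }
    return @words;
}

my @words = parse_wordlist(
    "alpha\n",
    "\n",
    "# a full-line comment\n",
    "beta   # a trailing comment\n",
);
print join(', ', @words), "\n";
```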

version 2.3c

    Require:
        Plugin officially requires SA 3.1.1 or higher
    
    Added:
        Support for BMP/TIFF Images

    Changed:
        Major internal restructuring
        Use SpamAssassin Logging Facility instead of own logfile

    Fixed:
        A bug related to database hashing

Updated: Sep 11, 2006