Mail Re-Parsing In PostOffice Service

Mail Re-Parsing in CommCore Post Office Service

Last updated mapping table:  December 12, 2016

This article describes the new functionality in Channels 9.4.0.6 (CU6) whereby Channels CommCore Post Office Service is designed to resolve potential email-parsing failures resulting from incompatible charset encoding.


When needed, this feature re-parses an email using a mapped charset so that the email can be parsed and processed further.  The new capability includes database changes as well as a scheduled task that periodically checks for parsing failures, re-parses the failed emails, and inserts them into the Channels database for further processing.

Upon upgrading to a release with this feature, any emails that previously failed to parse are re-parsed with the default mapping (UTF-8).

As a .NET-based application, the Post Office is compatible with the charsets/encodings supported by the .NET framework. Therefore, emails with a charset/encoding not supported by .NET will not be parsed by the Post Office unless they are mapped to a supported charset/encoding. The .NET framework has equivalents for the charsets/encodings used by most systems, so such email can be successfully re-parsed using the .NET equivalent of the original charset/encoding.

This functionality takes action in the following situations:

  1. When a mail is initially downloaded from the mail server and parsed.
  2. When a mail has already been downloaded and is in the “failed mail” list (handled by the scheduled task).

 

Mail Re-Parsing on Initial Download

If parsing fails when the mail is downloaded from the server and parsed for the first time, the Post Office attempts to re-parse the mail using the mapped charset/encoding.

  • If there is an equivalent charset/encoding defined in the mapping list and if it is valid, the mail is successfully re-parsed and added to the system.
  • If there is no mapping or if the defined mapping is invalid, the mail is not re-parsed and is added to the MailsFailedInParse table.
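
The fallback described in the two bullets above can be sketched as follows. This is a minimal illustration, not the Post Office's actual implementation: Python's codec registry stands in for the .NET encoding lookup, and the names `CHARSET_MAPPING`, `is_known_charset`, and `parse_body`, as well as the in-memory `failed` list (a stand-in for the MailsFailedInParse table), are hypothetical. Exact-match lookup of the charset name is also an assumption.

```python
import codecs

# Hypothetical stand-in for the CharsetMapping table (original charset -> alias).
# The first row is taken from the predefined mappings; "bogus" shows an invalid mapping.
CHARSET_MAPPING = {
    "X-UNKNOWN": "utf-8",
    "bogus": "not-a-real-charset",
}

def is_known_charset(name):
    """Return True if the codec registry recognizes the charset name."""
    try:
        codecs.lookup(name)
        return True
    except LookupError:
        return False

def parse_body(raw, declared, mapping, failed):
    """Decode raw bytes with the declared charset, falling back to its mapping.

    Returns the decoded text, or None after recording the mail in `failed`
    when there is no mapping or the mapped charset is itself invalid.
    """
    charset = declared
    if not is_known_charset(charset):
        charset = mapping.get(declared)  # assumed exact-match lookup
        if charset is None or not is_known_charset(charset):
            failed.append((declared, raw))  # stand-in for MailsFailedInParse
            return None
    return raw.decode(charset)
```

A mail declared as `X-UNKNOWN` would be decoded as UTF-8 here, while a mail whose charset has no mapping, or maps to an invalid alias, ends up in the `failed` list.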

The mails added to the system are treated as normal incoming mails:

  • Workflows are executed on them.
  • Properties (if used) are extracted.
  • Greetings are added, if configured.
  • Auto acknowledgements are not sent for such mails.

 

Re-Parsing Mails from Failed Mails (MailsFailedInParse table)

The Post Office executes a scheduled task to re-parse mails that failed to parse on initial download. The re-parsing occurs once a day, by default at midnight UTC.  Channels system administrators with database access can change the time at which this daily task runs for mails in the “MailsFailedInParse” table with invalid charset mappings.  In addition, Channels system administrators with database access can update existing mapping definitions or add new ones with valid mappings.

It is recommended to monitor the MailsFailedInParse table for new entries so that emails are not left unprocessed.

When this scheduled task runs at the scheduled time:

  • It executes a database query to determine which emails, if any, need to be re-parsed.
  • It selects all mails with a corresponding charset mapping and parses them using the updated mapped charset/encoding. Successfully parsed mails are removed from the “failed mail” list (MailsFailedInParse) and added to the MailMessage database table and other related tables.

The mail re-parser task does not process mails in the following cases:

  • If they do not have a mapping of the original charset to the .NET equivalent.
  • If they failed to parse with a mapped encoding in the earlier task execution cycle (for the current lifetime of the PO service process).

If the scheduled task is executed after a service restart, it again selects all mails in the “MailsFailedInParse” table that have a mapped encoding and tries to re-parse them.

Note: If the re-parser encounters an invalid charset/encoding it does not process any mail matching that charset/encoding.
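
The selection rules of the scheduled task can be sketched as below. This is an illustrative model only: the `Reparser` class, its `already_failed` set, and the dictionary standing in for the MailsFailedInParse table are hypothetical, and Python codecs stand in for .NET encodings. The in-memory set models the "current lifetime of the PO service process" rule, since it is lost when a new `Reparser` is created (a restart).

```python
import codecs

def is_valid_alias(alias):
    """Return True if the mapped charset name is recognized."""
    try:
        codecs.lookup(alias)
        return True
    except LookupError:
        return False

class Reparser:
    def __init__(self, mapping):
        self.mapping = mapping        # original charset -> mapped alias
        self.already_failed = set()   # mail ids that failed with a mapped encoding

    def run_cycle(self, failed_mails):
        """Run one scheduled cycle.

        failed_mails maps mail id -> (original charset, raw bytes).
        Successfully parsed mails are removed from it and returned.
        """
        parsed = {}
        for mail_id, (charset, raw) in list(failed_mails.items()):
            alias = self.mapping.get(charset)
            if alias is None or not is_valid_alias(alias):
                continue              # no mapping, or invalid mapped charset
            if mail_id in self.already_failed:
                continue              # failed earlier in this process lifetime
            try:
                parsed[mail_id] = raw.decode(alias)
            except UnicodeDecodeError:
                self.already_failed.add(mail_id)
                continue
            del failed_mails[mail_id]
        return parsed
```

In this model, a mail that fails with its mapped encoding is skipped in later cycles of the same `Reparser` instance, but a fresh instance tries it again, mirroring the behavior after a service restart.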

 

Database Changes

The following changes have been made to the Channels database to support the re-parser and the related scheduled task.

ServiceEx

The ServiceEx table stores configurable settings applicable to different services. A new row is added to this database table for the re-parser scheduled task as described below.

| Column name | Type | Description |
| --- | --- | --- |
| Name | nvarchar, 255 (not Null) | A new value, MailReparseInterval, is added to the ‘Name’ column. The ServiceEx::ValueEx column holds the time at which the scheduled task is executed every day. It should be in the format "HH:MM:SS" (24-hour clock, UTC). |
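
As a hedged illustration of the time format, the sketch below validates an "HH:MM:SS" value and computes the next daily run in UTC. The function names `parse_reparse_time` and `next_run` are hypothetical and are not part of Channels. How the Post Office itself interprets the value is not documented here.

```python
from datetime import datetime, timedelta, timezone

def parse_reparse_time(value):
    """Parse an "HH:MM:SS" string (24-hour clock); raises ValueError if invalid."""
    return datetime.strptime(value, "%H:%M:%S").time()

def next_run(value, now):
    """Return the next daily run time (UTC) strictly after `now`.

    `now` must be a timezone-aware datetime in UTC.
    """
    run_at = parse_reparse_time(value)
    candidate = datetime.combine(now.date(), run_at, tzinfo=timezone.utc)
    if candidate <= now:
        candidate += timedelta(days=1)  # today's slot has passed; use tomorrow
    return candidate
```

For example, with the default value "00:00:00" and a current time of 23:30 UTC, `next_run` returns midnight of the following day.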

 

CharsetMapping

A new table, “CharsetMapping”, is added to the Channels database to store charset mappings and support the re-parsing functionality, as described below.

| Field name | Datatype | Description |
| --- | --- | --- |
| ID | int | Identity column, auto incremented. Primary Key. |
| Charset | Nvarchar(32) | Charset not recognized by PO/.NET. |
| Alias | Nvarchar(32) | .NET equivalent of the Charset. |
| CompanyID | int | Company id the mapping belongs to. |

Constraints

  • A given charset can be mapped only once per company.
  • However, the same alias can be mapped to multiple charsets.
  • The alias should be valid and belong to the set that the .NET framework currently supports. See Code Page Identifiers.
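
The sketch below models these constraints with an in-memory SQLite table. It is an assumption-laden illustration: SQLite stands in for the Channels database, the `UNIQUE (Charset, CompanyID)` constraint reflects one reading of the first constraint (each charset mapped at most once per company), and the `add_mapping` helper is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE CharsetMapping (
        ID        INTEGER PRIMARY KEY AUTOINCREMENT,
        Charset   TEXT,
        Alias     TEXT,
        CompanyID INTEGER,
        UNIQUE (Charset, CompanyID)  -- assumed reading of the first constraint
    )
""")

def add_mapping(charset, alias, company_id):
    """Insert one mapping row; raises sqlite3.IntegrityError on a duplicate."""
    conn.execute(
        "INSERT INTO CharsetMapping (Charset, Alias, CompanyID) VALUES (?, ?, ?)",
        (charset, alias, company_id),
    )

add_mapping("cp932", "shift_jis", 1)
add_mapping("windows-31j", "shift_jis", 1)  # same alias, different charset: allowed

try:
    add_mapping("cp932", "utf-8", 1)        # same charset and company: rejected
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```

Under this reading, two charsets can share the alias `shift_jis`, but a second row for `cp932` in the same company is refused.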

 

MailsFailedInParse

The following new fields/columns are added to this table.

  • OriginalCharset
  • FailureType

The data type of the following field/column is changed:

  • RawContent

  

| Field name | Datatype | Description |
| --- | --- | --- |
| OriginalCharset | Nvarchar(32) | Holds the original charset of the failed mail. For the mail to be re-parsed successfully, the administrator should add a .NET equivalent of this value to the CharsetMapping table. |
| FailureType | tinyint | Holds the type of mail parsing failure. Currently an entry can take one of the following values: None = -1, Unknown = 0, Charset = 1, Mail Content = 2, Internal Parser Error (MailBee) = 3. |
| RawContent | VARBINARY(max) | The data type of this column is changed from NVARCHAR to VARBINARY. The mail content is stored as raw bytes as received from the mail server, which helps re-parse the mail using the mapped charset/encoding. |
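
To illustrate the FailureType values and why storing RawContent as raw bytes matters, here is a small sketch. The `FailureType` enum and `decode_raw` helper are hypothetical names. The numeric values are the ones listed above, and Python codecs stand in for .NET encodings.

```python
from enum import IntEnum

class FailureType(IntEnum):
    """Failure types with the numeric values listed for the FailureType column."""
    NONE = -1
    UNKNOWN = 0
    CHARSET = 1
    MAIL_CONTENT = 2
    INTERNAL_PARSER_ERROR = 3  # MailBee

def decode_raw(raw_content, alias):
    """Decode the stored raw bytes using the mapped charset."""
    return raw_content.decode(alias)

# Bytes as they might arrive from the mail server, encoded as ISO-8859-1.
raw = "Grüße".encode("iso-8859-1")

# The unchanged bytes can be decoded later, once a suitable mapping exists.
text = decode_raw(raw, "iso-8859-1")
```

Because the bytes are kept exactly as received, the same content can be retried with a different charset later. Decoding `raw` as UTF-8 would fail in this example, while the ISO-8859-1 decoding recovers the original text.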

 

Events

| Table name | Description |
| --- | --- |
| MailEvents | For every successfully re-parsed mail, an entry of EvType 77 is added. |
| MailProcessingEvents | For every successful mail re-parse, an entry of EvType 10 is added. For every unsuccessful mail re-parse due to an invalid mapped encoding, an entry of EvType 11 is added. |

EvType 10:

  • MessageId: Message id of the failed mail.
  • Mailbox: MailsFailedInParse::MailboxID.
  • EvParam3: Original charset.
  • EvParam4: Mapped charset (CharsetMapping::Alias).

EvType 11:

  • MailID: MailsFailedInParse::FailedMsgId.
  • Mailbox: MailsFailedInParse::MailboxID.
  • EvParam3: Original charset.
  • EvParam4: Mapped charset (CharsetMapping::Alias).


Functional Procedure

On upgrading to Channels 9.4.0.6, the new functionality works as follows:

  1. Mails in the “MailsFailedInParse” table prior to the upgrade are parsed with UTF-8 encoding irrespective of their original charset.
  2. The “FailureType” for existing failed mails is set to “Charset:1”.
  3. If the failure is for non-charset reasons, the “FailureType” is updated appropriately in the next scheduled mail re-parse interval.
  4. The “CharsetMapping” table has an entry of “Charset” set to '(null)', “Alias” set to “UTF-8”. This entry is used to re-parse existing failed mails.
  5. The “MailsFailedInParse::OriginalCharset” for existing entries is set to ‘(null)’, so that the mail maps to the ‘(null)’ entry in the CharsetMapping table and is picked up by the re-parse scheduled task and parsed using the ‘UTF-8’ encoding.

 

Predefined Charset Mappings

The charset mappings supported out-of-the-box by Channels in 9.4 CU8 are listed in the following table. If your version does not have these, consider adding them.

 

| Charset/Encoding | Mapping (.NET Equivalent) |
| --- | --- |
| (null) | UTF-8 |
| Cp1252 | windows-1252 |
| utf8 | utf-8 |
| cp1252 | windows-1252 |
| utf-8; charset="utf-8" | utf-8 |
| utf-8 | utf-8 |
| utf8 | utf-8 |
| cp932 | shift_jis |
| cp936 | gb2312 |
| iso-8851-1 | iso-8859-1 |
| iso-8859-1 | iso-8859-1 |
| iso-8859-10 | iso-8859-4 |
| iso-8859-10 | iso-8859-4 |
| windows-31j | shift_jis |
| us-ascii | us-ascii |
| CP-850 | ibm850 |
| UTF_8 | utf-8 |
| iso_6937-2-add | x-cp20269 |
| ansi_x3.110-1983 | iso-8859-1 |
| iso-8859-1\r\n | iso-8859-1 |
| ISO8859-15 | iso-8859-1 |
| utf-8utf-8 | utf-8 |
| utf-8http-equivContent-Type | utf-8 |
| viscii | utf-8 |
| CP-850 | ibm850 |
| charset="utf-8" | utf-8 |
| WINDOWS-1252 | windows-1252 |
| charset=iso-8859-1 | iso-8859-1 |
| 3Dutf-8 | utf-8 |
| cp1250 | windows-1250 |
| _iso-2022-jp$ESC | iso-2022-jp |
| gb2132 | gb2312 |
| ISO8859_1 | iso-8859-1 |
| ISO-8859-1 MIME-Version: 1.0 | iso-8859-1 |
| us-ascii, gb2312 | us-ascii |
| X-UNKNOWN | utf-8 |
| =utf-8 | utf-8 |
| csISO4UnitedKingdom | utf-8 |
| 134 | gb2312 |
| 136 | big5 |
| en-utf-8 | utf-8 |
| WIN1252 | windows-1252 |
| charset=\ | utf-8 |
| iso-2022-jp-2 | iso-2022-jp |

 

Because the charset-mapping system is built with extensibility in mind, a new charset mapping can be added whenever required.