Mail Re-Parsing in CommCore Post Office Service
Last updated mapping table: December 12, 2016
This article describes functionality introduced in Channels 9.4.0.6 (CU6) that enables the Channels CommCore Post Office Service to resolve email-parsing failures caused by incompatible charset encodings.
When needed, this feature re-parses an email using a mapped charset so that the email can be parsed and processed further. The capability includes database changes and a scheduled task that periodically checks for parsing failures, re-parses the failed emails, and inserts them into the Channels database for further processing.
When you upgrade to a release with this feature, any emails that previously failed to parse are re-parsed with the default mapping (UTF-8).
As a .NET-based application, the Post Office supports only the charsets/encodings that the .NET Framework supports. Emails with a charset/encoding that .NET does not support are therefore not parsed by the Post Office unless they are mapped to a supported charset/encoding. The .NET Framework has equivalents for the charsets/encodings used by most systems, so such emails can usually be re-parsed successfully using the .NET equivalent of the original charset/encoding.
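The support check described above can be illustrated with a short sketch. It uses Python's codec registry (`codecs.lookup`) purely as a stand-in for the .NET Framework's list of supported encodings. The two lists are not identical, and the helper name `is_supported_charset` is hypothetical.

```python
import codecs

def is_supported_charset(name: str) -> bool:
    """Return True if the runtime recognizes `name` as an encoding.

    Python's codec registry stands in for the .NET encoding list here;
    the Post Office itself performs this check in .NET, not Python.
    """
    try:
        codecs.lookup(name)
        return True
    except LookupError:
        return False

# A standard name is recognized; malformed header values are not.
print(is_supported_charset("windows-1252"))  # True
print(is_supported_charset("3Dutf-8"))       # False
print(is_supported_charset("X-UNKNOWN"))     # False
```

A charset label that fails this kind of check is a candidate for a CharsetMapping entry.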
This functionality takes action in the following situations:
- When a mail is initially downloaded from the mail server and parsed.
- When a mail has already been downloaded and is in the “failed mail” list (handled by the scheduled task).
Mail Re-Parsing on Initial Download
If parsing fails when the mail is downloaded from the server and parsed for the first time, the Post Office attempts to re-parse the mail using the mapped charset/encoding.
- If there is an equivalent charset/encoding defined in the mapping list and if it is valid, the mail is successfully re-parsed and added to the system.
- If there is no mapping or if the defined mapping is invalid, the mail is not re-parsed and is added to the MailsFailedInParse table.
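The decision flow above can be sketched as follows. This is a hypothetical model, not the Post Office implementation. `bytes.decode` stands in for the real MIME parser, a plain dict stands in for the CharsetMapping table, and the names `parse_on_download` and `ParseResult` are illustrative only.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ParseResult:
    text: Optional[str]          # decoded content, or None on failure
    used_charset: Optional[str]  # charset that succeeded, if any
    failed: bool                 # True -> would be added to MailsFailedInParse

def parse_on_download(raw: bytes, declared: str, mapping: dict) -> ParseResult:
    """Hypothetical first-download handling: declared charset, then mapping."""
    # 1. First attempt: the charset declared in the mail itself.
    try:
        return ParseResult(raw.decode(declared), declared, False)
    except (LookupError, UnicodeDecodeError):
        pass
    # 2. Re-parse with the mapped (.NET-equivalent) charset, if one exists.
    alias = mapping.get(declared)
    if alias is not None:
        try:
            return ParseResult(raw.decode(alias), alias, False)
        except (LookupError, UnicodeDecodeError):
            pass  # the mapping is invalid for this mail
    # 3. No mapping, or an invalid one: the mail is recorded as failed.
    return ParseResult(None, None, True)
```

For example, a mail whose header declares `3Dutf-8` (one of the predefined mappings to `utf-8`) decodes on the second attempt, while an unmapped, unknown charset ends up as a failed mail:

```python
mapping = {"3Dutf-8": "utf-8"}
print(parse_on_download("Grüße".encode("utf-8"), "3Dutf-8", mapping).used_charset)  # utf-8
print(parse_on_download(b"hello", "no-such-charset", mapping).failed)               # True
```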
Re-parsed mails added to the system are treated as normal incoming mails, with one exception:
- Workflows are executed on them.
- Properties (if used) are extracted.
- Greetings are added, if configured.
- However, auto-acknowledgements are not sent for such mails.
Re-Parsing Mails from Failed Mails (MailsFailedInParse Table)
The Post Office runs a scheduled task once a day to re-parse mails that failed to parse on initial download. By default, the task runs at midnight UTC. Channels system administrators with database access can change the daily run time of this task, which processes mails in the MailsFailedInParse table (for example, mails whose charset had no valid mapping). Administrators with database access can also correct existing mapping definitions or add new, valid ones.
Monitor the MailsFailedInParse table for new entries so that emails are not left unprocessed.
When this scheduled task runs at the scheduled time:
- It executes a database query to determine which emails, if any, need to be re-parsed.
- It selects all mails that have a corresponding charset mapping and parses them using the mapped charset/encoding. Successfully parsed mails are removed from the “failed mail” list (MailsFailedInParse) and added to the MailMessage table and other related tables.
The mail re-parser task does not process mails in the following cases:
- They have no mapping from the original charset to a .NET equivalent.
- They failed to parse with a mapped encoding in an earlier task execution cycle (during the current lifetime of the Post Office service process).
If the scheduled task runs after a service restart, it again selects all mails in the MailsFailedInParse table that have a mapped encoding and tries to re-parse them.
Note: If the re-parser encounters an invalid charset/encoding, it does not process any mail that matches that charset/encoding.
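The selection rules above can be modeled in a small sketch. Everything here is hypothetical and runs in Python rather than .NET. `Reparser`, `FailedMail`, and `run_cycle` are illustrative names, `codecs.lookup` stands in for the .NET validity check, and `bytes.decode` stands in for the mail parser. Two points are assumptions: mails that fail with a valid mapping are remembered only in memory, so a restart (a new `Reparser`) retries them; and an invalid alias is skipped only for the current cycle.

```python
import codecs
from dataclasses import dataclass, field

@dataclass
class FailedMail:
    mail_id: int
    raw: bytes               # RawContent, as received from the server
    original_charset: str    # OriginalCharset

@dataclass
class Reparser:
    mapping: dict                                # CharsetMapping: Charset -> Alias
    failed_ids: set = field(default_factory=set) # in memory: cleared on restart

    def run_cycle(self, table: list) -> list:
        """Re-parse eligible mails; successes are removed from `table`."""
        parsed = []
        invalid_charsets = set()  # charsets whose alias is invalid (this cycle)
        for mail in list(table):
            alias = self.mapping.get(mail.original_charset)
            if alias is None or mail.mail_id in self.failed_ids:
                continue  # no mapping, or already failed in an earlier cycle
            if mail.original_charset in invalid_charsets:
                continue  # skip every mail matching an invalid mapping
            try:
                codecs.lookup(alias)  # stand-in for the .NET alias check
            except LookupError:
                invalid_charsets.add(mail.original_charset)
                continue
            try:
                text = mail.raw.decode(alias)
            except UnicodeDecodeError:
                self.failed_ids.add(mail.mail_id)
                continue
            table.remove(mail)  # leaves the failed-mail list
            parsed.append((mail.mail_id, text))
        return parsed
```

Keeping `failed_ids` in process memory rather than in the database mirrors the statement that the skip applies only for the lifetime of the service process.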
Database Changes
The following changes have been made to the Channels database to support the re-parser and the related scheduled task.
ServiceEx
The ServiceEx table stores configurable settings applicable to different services. A new row is added to this database table for the re-parser scheduled task as described below.
Column name | Type | Description
Name | nvarchar(255), not null | A new row with the Name value MailReparseInterval is added. The ServiceEx::ValueEx column of this row holds the time of day at which the scheduled task runs. Use the format "HH:MM:SS" (24-hour clock, UTC).
For example, a ValueEx value of "00:00:00" runs the task daily at midnight UTC, which is the default.
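The "HH:MM:SS" UTC setting translates into a next run time as shown in this sketch. The helper `next_run` is hypothetical. It only illustrates how a daily UTC time of day resolves to the next occurrence and is not the Post Office scheduler.

```python
from datetime import datetime, timedelta, timezone

def next_run(value_ex: str, now: datetime) -> datetime:
    """Next UTC run time for a daily "HH:MM:SS" setting (hypothetical helper)."""
    # strptime rejects values that do not match HH:MM:SS (e.g. "25:00:00").
    run_at = datetime.strptime(value_ex, "%H:%M:%S").time()
    now_utc = now.astimezone(timezone.utc)
    candidate = datetime.combine(now_utc.date(), run_at, tzinfo=timezone.utc)
    if candidate <= now_utc:
        candidate += timedelta(days=1)  # today's slot has already passed
    return candidate

now = datetime(2016, 12, 12, 15, 30, tzinfo=timezone.utc)
print(next_run("00:00:00", now))  # 2016-12-13 00:00:00+00:00
print(next_run("18:00:00", now))  # 2016-12-12 18:00:00+00:00
```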
CharsetMapping
A new table, CharsetMapping, is added to the Channels database to store charset mappings and support the re-parsing functionality, as described below.
Field name | Datatype | Description
ID | int | Identity column, auto-incremented. Primary key.
Charset | nvarchar(32) | Charset not recognized by the Post Office/.NET.
Alias | nvarchar(32) | .NET equivalent of the charset.
CompanyID | int | ID of the company the mapping belongs to.
Constraints
- A given charset can be defined only once for a company.
- However, the same alias can be mapped to multiple charsets.
- The alias must be valid and belong to the set of encodings that the .NET Framework currently supports. See Code Page Identifiers.
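For example, a row with Charset set to cp1252 and Alias set to windows-1252 maps the non-standard name cp1252 to its .NET equivalent.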
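The constraints above can be captured in a small in-memory model. It reads "only one charset per company" as "each charset can be mapped at most once per company", which is an interpretation. The class `CharsetMappingRegistry` is hypothetical, and `codecs.lookup` stands in for the .NET "valid alias" rule.

```python
import codecs

class CharsetMappingRegistry:
    """Hypothetical in-memory model of the CharsetMapping constraints."""

    def __init__(self):
        self._rows = {}  # (company_id, charset) -> alias

    def add(self, company_id: int, charset: str, alias: str) -> None:
        key = (company_id, charset)
        if key in self._rows:
            # A charset may be mapped only once per company.
            raise ValueError(f"{charset!r} is already mapped for company {company_id}")
        try:
            codecs.lookup(alias)  # stand-in for the .NET supported-encoding check
        except LookupError:
            raise ValueError(f"{alias!r} is not a supported encoding") from None
        self._rows[key] = alias  # the same alias may serve many charsets

    def alias_for(self, company_id: int, charset: str):
        return self._rows.get((company_id, charset))

reg = CharsetMappingRegistry()
reg.add(1, "cp1252", "windows-1252")
reg.add(1, "WIN1252", "windows-1252")  # same alias, different charset: allowed
print(reg.alias_for(1, "WIN1252"))     # windows-1252
```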
MailsFailedInParse
The following new fields/columns are added to this table.
- OriginalCharset
- FailureType
The data type of the following field/column is changed:
- RawContent
Field name | Datatype | Description
OriginalCharset | nvarchar(32) | The original charset of the failed mail. For the mail to be re-parsed successfully, the administrator should add an entry to the CharsetMapping table that maps this value to its .NET equivalent.
FailureType | tinyint | The type of mail-parsing failure. Possible values: None = -1, Unknown = 0, Charset = 1, Mail Content = 2, Internal Parser Error (MailBee) = 3.
RawContent | VARBINARY(max) | The data type of this column is changed from NVARCHAR to VARBINARY. The mail content is stored as raw bytes, exactly as received from the mail server, so that the mail can be re-parsed using the mapped charset/encoding.
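Storing the raw bytes matters because a charset decision can be redone later. This sketch shows the idea in Python, with `bytes.decode` standing in for the mail parser. The `FailureType` enum mirrors the values listed above, but the Python names are illustrative.

```python
from enum import IntEnum

class FailureType(IntEnum):
    # Values as listed for MailsFailedInParse::FailureType.
    NONE = -1
    UNKNOWN = 0
    CHARSET = 1
    MAIL_CONTENT = 2
    INTERNAL_PARSER_ERROR = 3  # MailBee

# Raw bytes as they would sit in RawContent: "Café" encoded as Windows-1252.
raw_content = "Café".encode("cp1252")  # b'Caf\xe9'

# Decoding with the wrong charset fails...
try:
    raw_content.decode("utf-8")
    failure = FailureType.NONE
except UnicodeDecodeError:
    failure = FailureType.CHARSET

# ...but the untouched bytes can still be decoded with a mapped alias.
print(failure.name, raw_content.decode("windows-1252"))  # CHARSET Café
```

If the content had been stored as text (NVARCHAR), it would already have been converted with some charset, and the original bytes needed for a correct re-parse could be lost.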
Events
Table | Description
MailEvents | For every successfully re-parsed mail, an entry with EvType 77 is added.
MailProcessingEvents | For every successful mail re-parse, an entry with EvType 10 is added. For every unsuccessful re-parse caused by an invalid mapped encoding, an entry with EvType 11 is added.
The MailProcessingEvents entries contain the following values.
Successful re-parse (EvType 10):
- EvType: 10
- MessageId: Message ID of the failed mail.
- Mailbox: MailsFailedInParse::MailboxID
- EvParam3: Original charset.
- EvParam4: Mapped charset (CharsetMapping::Alias).
Unsuccessful re-parse (EvType 11):
- EvType: 11
- MailID: MailsFailedInParse::FailedMsgId
- Mailbox: MailsFailedInParse::MailboxID
- EvParam3: Original charset.
- EvParam4: Mapped charset (CharsetMapping::Alias).
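The event fields above can be assembled as in this sketch. The function `reparse_events` and the `"table"` key are hypothetical. Only the EvType values and field names come from the tables above, and the MailEvents row shows only its EvType because its other columns are not described here.

```python
def reparse_events(success: bool, *, message_id, failed_msg_id, mailbox_id,
                   original_charset, alias) -> list:
    """Hypothetical builder for the event rows written after a re-parse attempt."""
    common = {
        "Mailbox": mailbox_id,          # MailsFailedInParse::MailboxID
        "EvParam3": original_charset,   # original charset
        "EvParam4": alias,              # CharsetMapping::Alias
    }
    if success:
        return [
            {"table": "MailProcessingEvents", "EvType": 10,
             "MessageId": message_id, **common},
            {"table": "MailEvents", "EvType": 77},
        ]
    # Unsuccessful re-parse caused by an invalid mapped encoding.
    return [{"table": "MailProcessingEvents", "EvType": 11,
             "MailID": failed_msg_id, **common}]
```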
Functional Procedure
On upgrading to Channels 9.4.0.6, the new functionality works as follows:
- Mails in the “MailsFailedInParse” table prior to the upgrade are parsed with UTF-8 encoding irrespective of their original charset.
- The FailureType of existing failed mails is set to Charset (1).
- If a mail failed for a reason other than its charset, its FailureType is updated accordingly during the next scheduled re-parse.
- The “CharsetMapping” table has an entry of “Charset” set to '(null)', “Alias” set to “UTF-8”. This entry is used to re-parse existing failed mails.
- The “MailsFailedInParse::OriginalCharset” for existing entries is set to ‘(null)’, so that the mail maps to the ‘(null)’ entry in the CharsetMapping table and is picked up by the re-parse scheduled task and parsed using the ‘UTF-8’ encoding.
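The upgrade steps above can be summarized in a short sketch. It treats '(null)' as a literal string value, as the text describes, and uses plain dicts in place of database rows. The function `apply_upgrade` is hypothetical.

```python
NULL_CHARSET = "(null)"  # literal value used by the upgrade, per the steps above

def apply_upgrade(failed_rows: list, mapping: dict) -> None:
    """Hypothetical model of how pre-existing failed mails are prepared."""
    # Default CharsetMapping entry: '(null)' -> UTF-8.
    mapping.setdefault(NULL_CHARSET, "UTF-8")
    for row in failed_rows:
        row["OriginalCharset"] = NULL_CHARSET  # maps to the '(null)' entry
        row["FailureType"] = 1                 # Charset

rows = [{"FailedMsgId": 1}, {"FailedMsgId": 2}]
mapping = {}
apply_upgrade(rows, mapping)
print(mapping[rows[0]["OriginalCharset"]])  # UTF-8
```

Because every existing row resolves to the '(null)' entry, the scheduled task picks all of them up on its next run and parses them with UTF-8.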
Predefined Charset Mappings
The following table lists the charset mappings that Channels 9.4 CU8 supports out of the box. If your version does not include them, consider adding them.
Charset/Encoding | Mapping (.NET Equivalent)
(null) | UTF-8
Cp1252 | windows-1252
utf8 | utf-8
cp1252 | windows-1252
utf-8; charset="utf-8" | utf-8
utf-8 | utf-8
utf8 | utf-8
cp932 | shift_jis
cp936 | gb2312
iso-8851-1 | iso-8859-1
iso-8859-1 | iso-8859-1
iso-8859-10 | iso-8859-4
iso-8859-10 | iso-8859-4
windows-31j | shift_jis
us-ascii | us-ascii
CP-850 | ibm850
UTF_8 | utf-8
iso_6937-2-add | x-cp20269
ansi_x3.110-1983 | iso-8859-1
iso-8859-1\r\n | iso-8859-1
ISO8859-15 | iso-8859-1
utf-8utf-8 | utf-8
utf-8http-equivContent-Type | utf-8
viscii | utf-8
CP-850 | ibm850
charset="utf-8" | utf-8
WINDOWS-1252 | windows-1252
charset=iso-8859-1 | iso-8859-1
3Dutf-8 | utf-8
cp1250 | windows-1250
_iso-2022-jp$ESC | iso-2022-jp
gb2132 | gb2312
ISO8859_1 | iso-8859-1
ISO-8859-1 MIME-Version: 1.0 | iso-8859-1
us-ascii, gb2312 | us-ascii
X-UNKNOWN | utf-8
=utf-8 | utf-8
csISO4UnitedKingdom | utf-8
134 | gb2312
136 | big5
en-utf-8 | utf-8
WIN1252 | windows-1252
charset=\ | utf-8
iso-2022-jp-2 | iso-2022-jp
Because the charset-mapping system is designed to be extensible, new charset mappings can be added whenever required.