Multibyte Mail for WordPress

Written in the mid-afternoon in English

The Multibyte Mail plugin replaces the wp_mail() function with one that tries to encode the usual email message headers that might contain 8-bit data. It should work well with the core WordPress code as well as any plugins, unless the plugin is sending out some unusual headers.

If you receive bounces for comments on articles with 8-bit (or multibyte) characters in their title (or in the name of the comment author), this plugin should prove helpful.

One of the users of WordPress in Finnish reported such problems. Once I received a copy of the error message, it was clear where the problem was: WordPress stuffs 8-bit data directly into the message headers.

I thought it would be a simple thing to fix with a plugin. Just replace the wp_mail() function with one that calls mb_send_mail(). From the documentation of the latter, it sounds like you just need to encode the To: header, if necessary, before calling the function. Well, not quite so…

The mb_send_mail() function only encodes the message body and none of the headers, as far as I can tell from experiments. It also depends on the mb_language() setting, which only “knows” about English and Japanese. Mapping WordPress settings to this would be problematic. There is a UTF-8-safe setting (“uni”), but this results in base64-encoded messages. At least mutt(1) has problems there: it seems to forget about the character set in the message by the time it has decoded the message body, and displays 8-bit characters using question marks regardless. Not a good result by any means.
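For illustration only, the naive first attempt described above might look roughly like the sketch below. This is not the plugin's code. It assumes the four-argument wp_mail() signature and relies on WordPress defining wp_mail() only if no plugin has defined it first. The mb_language('uni') call selects the UTF-8 setting mentioned above.

```php
<?php
// Sketch of the naive approach, NOT the code shipped in the plugin.
// Assumption: WordPress only defines wp_mail() when no plugin has done so.
if (!function_exists('wp_mail')) {
    function wp_mail($to, $subject, $message, $headers = '')
    {
        // "uni" is the UTF-8 setting; it leads to base64-encoded bodies.
        mb_language('uni');

        // The additional headers are handed over as-is, so any 8-bit
        // data in From: or Reply-To: still goes out unencoded.
        return mb_send_mail($to, $subject, $message, $headers);
    }
}
```

Nothing here touches the extra headers, which is why this approach falls short for the case that triggered the bounces.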

My next attempt was to use mb_encode_mimeheader() on the headers. Well, you can’t send all headers at once to this function: it will stuff everything into the first header in the string. I wrote code to split the headers at newlines (taking care of continuation lines). This got me to the next problem: mb_encode_mimeheader() will encode just about anything outside A-Z. Not a good thing for the From: header, for example, where the address itself has to stay readable, not to mention Content-Type.
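The splitting step can be sketched as follows. This is a minimal illustration, not the plugin's actual code. The function name split_headers() is hypothetical, and the sketch assumes that a line beginning with a space or tab continues the previous header.

```php
<?php
// Hypothetical helper: split a raw header block into one string per header,
// keeping folded (continuation) lines attached to the header they belong to.
function split_headers(string $headers): array
{
    $result = [];
    foreach (preg_split('/\r\n|\r|\n/', $headers) as $line) {
        if ($line === '') {
            continue; // skip blank lines, e.g. from a trailing newline
        }
        // A line starting with a space or tab continues the previous header.
        if (($line[0] === ' ' || $line[0] === "\t") && $result !== []) {
            $result[count($result) - 1] .= "\r\n" . $line;
        } else {
            $result[] = $line;
        }
    }
    return $result;
}
```

Each element of the returned array can then be inspected on its own, so that only selected headers are passed on to the encoder.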

So this is how I arrived at the code that is now in the plugin, which looks more complex than it should need to be. After splitting the headers, I only encode From: and Reply-To:. More accurately, I only encode the full name (i.e. the display name that precedes the address in angle brackets) in these, in the format typically used by WordPress. Let’s use an example:

From: John Doe <john@doe.main>

The encoded text would be the “John Doe” part, nothing else.
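A rough sketch of that idea is shown below. It is an approximation under stated assumptions, not the plugin's code. The helper name encode_name_in_header() is hypothetical, the input is assumed to be UTF-8, and only the simple "Full Name <address>" form is handled. It requires the mbstring extension.

```php
<?php
// Assumption: header text arrives as UTF-8.
mb_internal_encoding('UTF-8');

// Hypothetical helper: encode only the full name in a From: or Reply-To:
// header and leave the address and every other header untouched.
function encode_name_in_header(string $header, string $charset = 'UTF-8'): string
{
    // Match "Field: Full Name <address>"; anything else is returned as-is.
    if (!preg_match('/^(From|Reply-To):\s*(.*?)\s*<([^<>]+)>\s*$/i', $header, $m)) {
        return $header;
    }
    [, $field, $name, $address] = $m;
    $name = trim($name, '"');

    // Names made up of printable ASCII only need no encoding.
    if ($name === '' || !preg_match('/[^\x20-\x7E]/', $name)) {
        return $header;
    }

    // Base64 ("B") encoding; the indent accounts for the "Field: " prefix
    // when the function decides where to fold long lines.
    $encoded = mb_encode_mimeheader($name, $charset, 'B', "\r\n", strlen($field) + 2);

    return "$field: $encoded <$address>";
}
```

With this sketch, the example header above comes back unchanged because “John Doe” is plain ASCII. A name with accented letters would have only its name part replaced by an encoded word, while the address in angle brackets stays as it was.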

The proper fix is to go through the WordPress code and add calls to mb_encode_mimeheader() where email headers are constructed. Then one can start educating plugin authors to do the same…