
Problems finding and storing Message-ID



I'm seeing failures of wl-summary-jump-to-msg-by-message-id when I try
to search for messages with long message IDs such as:
<D456FE7EBDBD6047A4373F930BED54D44E68617C37@SJMEMXMBS10.stjude.sjcrh.local>
which is 75 characters long. These long IDs are generated by our
institution's Outlook/Exchange setup, so I'm guessing we're not the
only place that has such monsters.

When I try to use wl-summary-jump-to-msg-by-message-id and give it the
above ID while in an MH-mailbox that I use to store saved messages, it
fails to find the message, even though it is there. (I took this ID
from a message in that folder.) But on some other long IDs, as on most
short IDs, it works. Comparing the Message-IDs in messages with those stored
in ~/.elmo/localdir/sashank/msgid, I see that the long message IDs are
stored with a newline and a space at the start of the string, like:
"\n <D456FE7EBDBD6047A4373F930BED54D44E68617C37@SJMEMXMBS10.stjude.sjcrh.local>"
In some of the messages stored in the MH folder, the message ID field
has a "\n " just ahead of the message ID, and in some it does not, so
what's in the msgid database might or might not be the same as what
comes after "Message-ID: " in the email header. For short message IDs
everything is okay, as far as I can tell.

According to RFC 5322 section 3.6.4, together with the fact that only
one ID can be in a message-id field, together with the rules about
folding white space in section 3.2.2, the syntax for a message-id is,

  message-id = "Message-ID:" [CFWS] "<" id-left "@" id-right ">" [CFWS] CRLF

where [CFWS] indicates a place where comments or folding white space
(FWS) may be inserted. The definition of FWS is

  FWS = ([*WSP CRLF] 1*WSP)

But for the present purposes the FWS instance is essentially " \n ",
where "\n" is (sort of) the equivalent of CRLF. As for the meaning of
FWS in a header field, RFC 5322 says:

   Wherever folding appears in a message (that is, a header field body
   containing a CRLF followed by any WSP), unfolding (removal of the
   CRLF) is performed before any further semantic analysis is
   performed on that header field according to this specification.
   That is to say, any CRLF that appears in FWS is semantically
   "invisible".
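
To make the unfolding rule concrete, here is a rough sketch in Python
(not Emacs Lisp, and not Wanderlust's actual code). The function name
unfold is my own, and accepting a bare "\n" as well as CRLF is an
assumption, made because files on disk usually hold bare newlines.

```python
import re

# A fold is a line break followed by at least one space or tab.
# The lookahead leaves that white space in place, so only the line
# break itself is removed, as the RFC describes.
_FOLD = re.compile(r"\r?\n(?=[ \t])")


def unfold(field_body: str) -> str:
    """Remove the line breaks that folding inserted into a field body."""
    return _FOLD.sub("", field_body)


# The value as I found it in the msgid file: newline, space, then the ID.
folded = (
    "\n <D456FE7EBDBD6047A4373F930BED54D44E68617C37"
    "@SJMEMXMBS10.stjude.sjcrh.local>"
)
print(repr(unfold(folded)))
```

Note that unfolding alone still leaves the single leading space, so a
stored ID would also need its surrounding white space trimmed before
it can be compared.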

I conclude that functions handling message IDs, and storing them in
data structures or in files like the msgid files, should remove the
folding white space (i.e. "unfold" the header contents) first. That
way, the msgid files would have
"<D456FE7EBDBD6047A4373F930BED54D44E68617C37@SJMEMXMBS10.stjude.sjcrh.local>"
for the above message.

I guess this all relates to the question of email line-length limits.
The RFC says that each line produced by mail-producing software
"SHOULD" be no longer than 78 characters and "MUST" be no longer than
998. I think this implies that receiving applications "MUST" be able
to process 998-character lines but "SHOULD" be able to handle longer
ones (my reading of section 2.1.1).

Apparently, many producing applications use that CRLF WSP sequence
between "Message-ID:" and the actual ID to meet the 78-character limit
when faced with long IDs. But that does not mean Wanderlust has the
right to ASSUME they do this. Anyway, Wanderlust should store these
IDs unfolded so as to have a folding-insensitive method of matching.
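
Here is a minimal sketch of what I mean by folding-insensitive
matching, again in Python rather than Emacs Lisp. Everything in it is
hypothetical: normalize_msgid, MsgidIndex and the message number 17 are
made up for illustration and are not how Wanderlust or ELMO store
things. It also ignores RFC comments, which CFWS would allow.

```python
import re


def normalize_msgid(raw: str) -> str:
    """Unfold a Message-ID field body and trim surrounding white space."""
    unfolded = re.sub(r"\r?\n(?=[ \t])", "", raw)
    return unfolded.strip(" \t\r\n")


class MsgidIndex:
    """Toy stand-in for a msgid database, keyed on normalized IDs."""

    def __init__(self):
        self._by_id = {}

    def add(self, raw_id, number):
        # Normalize on the way in, so the stored key is never folded.
        self._by_id[normalize_msgid(raw_id)] = number

    def find(self, raw_id):
        # Normalize the query too, so either form finds the message.
        return self._by_id.get(normalize_msgid(raw_id))


LONG_ID = (
    "<D456FE7EBDBD6047A4373F930BED54D44E68617C37"
    "@SJMEMXMBS10.stjude.sjcrh.local>"
)

index = MsgidIndex()
index.add("\n " + LONG_ID, 17)      # stored from a folded header
print(index.find(LONG_ID))          # looked up with the bare ID
print(index.find("\n " + LONG_ID))  # looked up with the folded form
```

Normalizing on both insert and lookup means an existing database with
folded entries would need rebuilding once, but after that the match no
longer depends on how the sender chose to fold the header.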

-Don
