
Re: Duplicate In-Reply-To entries in reply buffer



I'm okay with whatever the developers decide on the strict vs
non-strict parsing of message-id header contents. But I have one point
on the strict vs non-strict question itself, and three more on what
strict standards compliance might involve.

1) Aren't message-id headers created by mail transfer agents rather
than mail user agents? Doesn't this make crazy headers less likely?

2) The "strict" regex we've been discussing shouldn't have that dollar
sign near the end. It conflicts with the \' in the tests I've done.

3) Looking again at RFC 5322, I'm dismayed to see that comments (set
off by parentheses) seem to be allowed in the message-id header field
both before and after the actual message ID. Therefore the following
are perfectly legal:

Message-ID: (a comment with a <) <87d33ob7bk.wl%dmaus@ictsoc.de> (a comment with a >)

Message-ID: (a comment with a <)
 <87d33ob7bk.wl%dmaus@ictsoc.de> (a
 comment with a >)

Both of the regexes we've been discussing fail on one or both of
these.

4) The std11 module of FLIM seems to provide some machinery for
handling this. For example:

(let ((s "(a comment with a <)
 <87d33ob7bk.wl%dmaus@ictsoc.de> (a
 comment with a >)"))
  (concat "<"
          (mapconcat 'cdr
                     (cdr (assq 'msg-id (std11-parse-msg-id-string s))) "")
          ">"))

Returns "<87d33ob7bk.wl%dmaus@ictsoc.de>". I haven't thoroughly
analyzed and tested this solution; for one thing, it should probably
check whether the parse function returned multiple msg-ids or just
one.  It's obviously more costly than what we've been discussing.
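
Points 2 and 3, and the "multiple msg-ids" check from point 4, can be
sketched roughly as below. This is only an illustration, not tested
against Wanderlust itself. All `my-` names are made up for the example.
The explanation of the "$" behaviour is my reading of the Emacs regexp
rules ("$" is only special at the end of a regexp or before \) or \|).
The shape of the parsed list handed to `my-single-msg-id` is an
assumption inferred from the snippet above, and the token types in the
hand-built sample are hypothetical.

```elisp
;; The regexps from the thread; the strict one with and without "$".
(defconst my-strict-with-dollar "\\`[ \t\n]*\\(<[^>]+>\\)[ \t\n]*$\\'")
(defconst my-strict "\\`[ \t\n]*\\(<[^>]+>\\)[ \t\n]*\\'")
(defconst my-loose "<.+>")

(defconst my-plain "<87d33ob7bk.wl%dmaus@ictsoc.de>")
(defconst my-commented
  "(a comment with a <) <87d33ob7bk.wl%dmaus@ictsoc.de> (a comment with a >)")

(defun my-match (regexp string &optional group)
  "Return the text of GROUP (default 0) if REGEXP matches STRING, else nil."
  (save-match-data
    (when (string-match regexp string)
      (match-string (or group 0) string))))

;; Point 2: the "$" is followed by \\', so it is not in a position
;; where Emacs treats it as an anchor; it has to match a literal "$".
(my-match my-strict-with-dollar my-plain 1) ; => nil
(my-match my-strict my-plain 1) ; => "<87d33ob7bk.wl%dmaus@ictsoc.de>"

;; Point 3: a leading comment defeats the anchored regexp, and the
;; greedy loose regexp runs from the "<" in the first comment to the
;; ">" in the last one.
(my-match my-strict my-commented 1) ; => nil
(my-match my-loose my-commented)
;; => "<) <87d33ob7bk.wl%dmaus@ictsoc.de> (a comment with a >"

;; Point 4: accept a parse result only if it holds exactly one msg-id.
;; PARSED is assumed to look like ((msg-id (TYPE . STRING) ...) ...).
(defun my-single-msg-id (parsed)
  "Return the only message ID in PARSED as \"<...>\", or nil."
  (let ((ids (delq nil
                   (mapcar (lambda (elt)
                             (and (eq (car-safe elt) 'msg-id) elt))
                           parsed))))
    (when (= (length ids) 1)
      (concat "<" (mapconcat #'cdr (cdr (car ids)) "") ">"))))

;; Hand-built sample in the assumed shape (token types are made up).
(my-single-msg-id
 '((msg-id (atom . "87d33ob7bk.wl%dmaus") (specials . "@")
           (atom . "ictsoc.de"))))
;; => "<87d33ob7bk.wl%dmaus@ictsoc.de>"
```

With FLIM loaded, the intended use would be something like
(my-single-msg-id (std11-parse-msg-id-string s)), which returns nil
when the header holds zero or several message IDs.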

-Don


At Sun, 22 Jul 2012 02:09:35 -0500,
David Maus wrote:
>
> [1  <text/plain; UTF-8 (quoted-printable)>]
> At Sat, 21 Jul 2012 17:15:19 +0900,
> Kazuhiro Ito wrote:
> >
> > > > Sorry, regexp should be exactly "\\`[ \t\n]*\\(<[^>]+>\\)[ \t\n]*$\\'".
> > > > I don't know how strict that function should be, I feel that function
> > > > could be loose as below;
> > > >
> > > >  (defun elmo-msgdb-get-message-id-from-buffer ()
> > > >   (let ((msgid (elmo-field-body "message-id")))
> > > >     (if msgid
> > > >       (if (string-match "<.+>" msgid)
> > > >           (match-string 0 msgid)
> > >
> > > I agree, but maybe that simpler regex could be "<[^>\n]+>" to guard
> > > against things like "<foo> <bar>" by only taking the first one.
> >
> > I think that extracting "<foo>" from "<foo> <bar>" wouldn't be better.
> > Because there would be a problem when regexp extracts existing
> > Message-ID from invalid header.  If "<foo>" is existing Message-ID,
> > wl-summary-jump-to-msg-by-message-id may pick the wrong message.
>
> Good point.
>
> >
> > I think "\\`[ \t\n]*\\(<.+>\\)[ \t\n]*$\\'" is the most safe to avoid
> > the above problem.  But I doubt whether the above problem actually
> > occurs, because I couldn't imagine recent programs create such insane
> > Message-ID: header.  So, I feel the most loose "<.+>" would be
> > adequate.
>
> I would take the opposite perspective: We should expect the weirdest
> and craziest mail headers of past, current, and future mail
> clients. There is no reason for me to take standard compliance for
> granted or to trust mail client providers that they will do the right
> thing™ and/or don't abuse mail headers for power struggles or
> whatever. Be as strict as possible and only relax strictness if there
> is an important reason to do so.
>
> Best,
>   -- David
> --
> OpenPGP... 0x99ADB83B5A4478E6
> Jabber.... dmjena@jabber.org
> Email..... dmaus@ictsoc.de
> [2 OpenPGP Digital Signature <application/pgp-signature (7bit)>]
>
