
Re: elmo-msgdb-get-message-id-from-buffer's performance issue



I like the idea of defaulting to strict parsing with the lexical
analyzer, while letting the user override that in favor of the
semi-strict regex to get better performance on tasks like re-syncing
large folders.
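
Something along these lines is what I have in mind.  This is only a
rough sketch: the my-elmo-* names are made up for illustration and are
not existing ELMO symbols, and the strict path is skipped when FLIM's
std11 functions are not loaded.

```elisp
;; Hypothetical option and function names; not part of ELMO.
(defcustom my-elmo-msgid-use-lexical-analyzer t
  "Non-nil means parse Message-ID with the std11 lexical analyzer.
Nil means use only the semi-strict regexp, which is faster."
  :type 'boolean
  :group 'elmo)

(defun my-elmo-msgid-from-string (msgid)
  "Return the message-id found in MSGID, always wrapped in <>."
  (or (and my-elmo-msgid-use-lexical-analyzer
           ;; std11 comes from FLIM; skip this path when it is not loaded.
           (fboundp 'std11-parse-msg-ids-string)
           (let* ((tokens (std11-parse-msg-ids-string msgid))
                  (id (assq 'msg-id tokens)))
             ;; Accept only a field that holds exactly one msg-id.
             (when (and id (not (assq 'msg-id (remq id tokens))))
               (let ((addr (std11-addr-to-string (cdr id))))
                 ;; Treat an empty result as a parse failure.
                 (when (> (length addr) 0)
                   (concat "<" addr ">"))))))
      ;; Semi-strict fallback (variant C in the quoted message).
      (if (string-match "\\`[ \n\t]*\\(<.+>\\)[ \n\t]*\\'" msgid)
          (match-string 1 msgid)
        (concat "<" msgid ">"))))
```

Setting the option to nil before a large re-sync would then skip the
analyzer entirely and use only the regexp.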

I think that using unfolding is unnecessary when using the lexical
analyzer because std11-parse-msg-ids-string seems to skip over spaces
and newlines properly.
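
For reference, unfolding amounts to dropping the line break in front of
continuation whitespace.  A minimal sketch with a hypothetical helper
name (this is not how elmo-unfold-field-body is implemented):

```elisp
;; Hypothetical helper, for illustration only.
(defun my-unfold-header-value (value)
  "Return VALUE with header folding removed.
A fold is a line break immediately followed by a space or tab;
only the line break is dropped, the whitespace is kept."
  (replace-regexp-in-string "\r?\n\\([ \t]\\)" "\\1" value))
```

If the analyzer already treats the leftover whitespace and newlines as
separators, this step does no useful work before it.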

I noticed that your example E does not return a result surrounded by "<"
and ">", but A-D apparently do.

Thanks for the help.

-Don

At Tue, 24 Jul 2012 09:27:40 -0500,
Kazuhiro Ito wrote:
>
> I tested the performance of elmo-msgdb-get-message-id-from-buffer by
> re-syncing a large localdir folder (about 31000 messages) with
> threading.
>
> I redefined elmo-msgdb-get-message-id-from-buffer as shown below and
> measured the time to re-sync.  Please note that the measurement is very
> rough and the function was not byte-compiled.
>
> Environment: ThinkPad X201s (Core i7-640LM, 2.13GHz), Windows7 (x64)
> Emacs 24.1.50 (locally built)
>
>
> A. as loose as possible
> (defun elmo-msgdb-get-message-id-from-buffer ()
>   (let ((msgid (elmo-field-body "message-id")))
>     (if msgid
>       (if (string-match "<.+>" msgid)
>           (match-string 0 msgid)
>         (concat "<" msgid ">"))       ; Invalid message-id.
>       ;; no message-id, so put dummy msgid.
>       (concat "<"
>             (if (elmo-unfold-field-body "date")
>                 (timezone-make-date-sortable (elmo-unfold-field-body "date"))
>               (md5 (string-as-unibyte (buffer-string))))
>             (nth 1 (eword-extract-address-components
>                     (or (elmo-field-body "from") "nobody"))) ">"))))
>
> B. A without assuming narrowed
> (defun elmo-msgdb-get-message-id-from-buffer ()
>   (let ((msgid (std11-field-body "message-id")))
>     (if msgid
>       (if (string-match "<.+>" msgid)
>           (match-string 0 msgid)
>         (concat "<" msgid ">"))       ; Invalid message-id.
> ...
>
> C. A with more strict regexp
> (defun elmo-msgdb-get-message-id-from-buffer ()
>   (let ((msgid (elmo-field-body "message-id")))
>     (if msgid
>       (if (string-match "\\`[ \n\t]*\\(<.+>\\)[ \n\t]*\\'" msgid)
>           (match-string 1 msgid)
>         (concat "<" msgid ">"))       ; Invalid message-id.
> ...
>
> D. C with elmo-unfold-field-body
> (defun elmo-msgdb-get-message-id-from-buffer ()
>   (let ((msgid (elmo-unfold-field-body "message-id")))
>     (if msgid
>       (if (string-match "\\`[ \t]*\\(<.+>\\)[ \t]*\\'" msgid)
>           (match-string 1 msgid)
>         (concat "<" msgid ">"))       ; Invalid message-id.
> ...
>
> E. Using lexical analyzer
> (defun elmo-msgdb-get-message-id-from-buffer ()
>   (let ((msgid (elmo-unfold-field-body "message-id")))
>     (if msgid
>       (or (let* ((tokens (std11-parse-msg-ids-string msgid))
>                  (id (assq 'msg-id tokens)))
>             (setq id
>                   (unless (assq 'msg-id (delq id tokens))
>                     (std11-addr-to-string (cdr id))))
>             ;; Return nil when result is "".
>             (when (> (length id) 0) id))
>           (concat "<" msgid ">"))     ; Invalid message-id.
> ...
>
> F. combination of E and C
> (defun elmo-msgdb-get-message-id-from-buffer ()
>   (let ((msgid (elmo-unfold-field-body "message-id")))
>     (if msgid
>       (or (let* ((tokens (std11-parse-msg-ids-string msgid))
>                  (id (assq 'msg-id tokens)))
>             (setq id
>                   (unless (assq 'msg-id (delq id tokens))
>                     (std11-addr-to-string (cdr id))))
>             ;; Return nil when result is "".
>             (when (> (length id) 0) id))
>           (if (string-match "\\`[ \n\t]*\\(<.+>\\)[ \n\t]*\\'" msgid)
>               (match-string 1 msgid)
>             (concat "<" msgid ">")))  ; Invalid message-id.
> ...
>
>
> Result:
> A 190sec (as loose as possible)
> B 189sec (A without assuming narrowed)
> C 188sec (A with more strict regexp)
> D 189sec (C with elmo-unfold-field-body)
> E 256sec (Using lexical analyzer)
> F 254sec (combination of E and C)
>
> I think the differences among A, B, C and D are within the margin of
> error.  At least in my environment (results may differ on older
> Emacsen or other systems):
>
> 1. It would be better for elmo-msgdb-get-message-id-from-buffer not
> to assume that the buffer is narrowed to the header, for robustness
> and maintainability.
>
> 2. In elmo-msgdb-get-message-id-from-buffer, the choice of
> header-extraction function and of matching regexp has little effect
> on performance.
>
> 3. Using the lexical analyzer does affect performance.  If we
> introduce the lexical analyzer to extract Message-ID, I want a
> customizable option to disable it.
>
> BTW, in my localdir folders, I found only one Message-ID: header with
> a comment, and that message was spam.
>
> --
> Kazuhiro Ito
>
>
