Re: elmo-msgdb-get-message-id-from-buffer's performance issue
I like the idea of defaulting to strict parsing with the lexical
analyzer, while letting the user override that in favor of the
semi-strict regexp to get better performance on tasks like re-syncing
large folders.
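As a rough sketch of that override (not existing ELMO code): the option and function names below, prefixed with "my-", are made up for illustration. The strict branch follows variant E from the quoted message and assumes FLIM's std11 is loaded; when it is not, or when the option is nil, only the semi-strict regexp from variant C is used.

```elisp
;; Hypothetical option; not an existing ELMO variable.
(defcustom my-elmo-msgdb-strict-message-id t
  "Non-nil means try the lexical analyzer first when extracting Message-ID.
nil means use only the faster semi-strict regexp."
  :type 'boolean
  :group 'elmo)

(defun my-message-id-by-regexp (msgid)
  "Return the <...> part of MSGID using the regexp from variant C, or nil."
  (when (string-match "\\`[ \n\t]*\\(<.+>\\)[ \n\t]*\\'" msgid)
    (match-string 1 msgid)))

(defun my-message-id-by-lexer (msgid)
  "Return the single msg-id in MSGID as in variant E, or nil.
Assumes FLIM's std11 functions are available."
  (let* ((tokens (std11-parse-msg-ids-string msgid))
         (id (assq 'msg-id tokens)))
    ;; Accept the field only when it holds exactly one msg-id.
    (when (and id (not (assq 'msg-id (delq id tokens))))
      (let ((str (std11-addr-to-string (cdr id))))
        ;; Return nil when the result is "".
        (when (> (length str) 0) str)))))

(defun my-elmo-extract-message-id (msgid)
  "Return the message-id found in field body MSGID, or nil.
Tries the lexer only when the option is non-nil and std11 is loaded."
  (or (and my-elmo-msgdb-strict-message-id
           (fboundp 'std11-parse-msg-ids-string)
           (my-message-id-by-lexer msgid))
      (my-message-id-by-regexp msgid)))
```

With the option non-nil this behaves like variant F (lexer first, regexp as fallback); with it nil it behaves like variant C. A caller would still wrap a nil result with (concat "<" msgid ">") as the existing function does.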
I think unfolding is unnecessary with the lexical analyzer, because
std11-parse-msg-ids-string seems to skip over spaces and newlines
properly.
I noticed that your example E does not return a result surrounded by "<"
and ">", but A-D apparently do.
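If E's result does come back without the angle brackets, a small wrapper would keep it consistent with A-D. This is only a sketch; the function name is hypothetical and not part of ELMO or FLIM.

```elisp
(defun my-ensure-angle-brackets (id)
  "Return ID wrapped in \"<\" and \">\" unless it is already wrapped."
  (if (string-match "\\`<.+>\\'" id)
      id                        ; already of the form <...>
    (concat "<" id ">")))
```

Applying it to the return value of std11-addr-to-string in E would make all six variants store ids in the same form.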
Thanks for the help.
-Don
At Tue, 24 Jul 2012 09:27:40 -0500,
Kazuhiro Ito wrote:
>
> I tried testing elmo-msgdb-get-message-id-from-buffer's performance by
> re-syncing a large localdir folder (which contains about 31000 messages)
> with threading.
>
> I redefined elmo-msgdb-get-message-id-from-buffer as below and
> measured the time to re-sync. Please note that the measurement is very
> rough and the functions are not byte-compiled.
>
> Environment: ThinkPad X201s (Core i7-640LM, 2.13GHz), Windows7 (x64)
> Emacs 24.1.50 (locally built)
>
>
> A. as loose as possible
> (defun elmo-msgdb-get-message-id-from-buffer ()
>   (let ((msgid (elmo-field-body "message-id")))
>     (if msgid
>         (if (string-match "<.+>" msgid)
>             (match-string 0 msgid)
>           (concat "<" msgid ">")) ; Invalid message-id.
>       ;; no message-id, so put dummy msgid.
>       (concat "<"
>               (if (elmo-unfold-field-body "date")
>                   (timezone-make-date-sortable (elmo-unfold-field-body "date"))
>                 (md5 (string-as-unibyte (buffer-string))))
>               (nth 1 (eword-extract-address-components
>                       (or (elmo-field-body "from") "nobody"))) ">"))))
>
> B. A without assuming narrowed
> (defun elmo-msgdb-get-message-id-from-buffer ()
>   (let ((msgid (std11-field-body "message-id")))
>     (if msgid
>         (if (string-match "<.+>" msgid)
>             (match-string 0 msgid)
>           (concat "<" msgid ">")) ; Invalid message-id.
>       ...
>
> C. A with more strict regexp
> (defun elmo-msgdb-get-message-id-from-buffer ()
>   (let ((msgid (elmo-field-body "message-id")))
>     (if msgid
>         (if (string-match "\\`[ \n\t]*\\(<.+>\\)[ \n\t]*\\'" msgid)
>             (match-string 1 msgid)
>           (concat "<" msgid ">")) ; Invalid message-id.
>       ...
>
> D. C with elmo-unfold-field-body
> (defun elmo-msgdb-get-message-id-from-buffer ()
>   (let ((msgid (elmo-unfold-field-body "message-id")))
>     (if msgid
>         (if (string-match "\\`[ \t]*\\(<.+>\\)[ \t]*\\'" msgid)
>             (match-string 1 msgid)
>           (concat "<" msgid ">")) ; Invalid message-id.
>       ...
>
> E. Using lexical analyzer
> (defun elmo-msgdb-get-message-id-from-buffer ()
>   (let ((msgid (elmo-unfold-field-body "message-id")))
>     (if msgid
>         (or (let* ((tokens (std11-parse-msg-ids-string msgid))
>                    (id (assq 'msg-id tokens)))
>               (setq id
>                     (unless (assq 'msg-id (delq id tokens))
>                       (std11-addr-to-string (cdr id))))
>               ;; Return nil when result is "".
>               (when (> (length id) 0) id))
>             (concat "<" msgid ">")) ; Invalid message-id.
>       ...
>
> F. combination of E and C
> (defun elmo-msgdb-get-message-id-from-buffer ()
>   (let ((msgid (elmo-unfold-field-body "message-id")))
>     (if msgid
>         (or (let* ((tokens (std11-parse-msg-ids-string msgid))
>                    (id (assq 'msg-id tokens)))
>               (setq id
>                     (unless (assq 'msg-id (delq id tokens))
>                       (std11-addr-to-string (cdr id))))
>               ;; Return nil when result is "".
>               (when (> (length id) 0) id))
>             (if (string-match "\\`[ \n\t]*\\(<.+>\\)[ \n\t]*\\'" msgid)
>                 (match-string 1 msgid)
>               (concat "<" msgid ">"))) ; Invalid message-id.
>       ...
>
>
> Result (time to re-sync):
> A  190 sec  (as loose as possible)
> B  189 sec  (A without assuming narrowed)
> C  188 sec  (A with more strict regexp)
> D  189 sec  (C with elmo-unfold-field-body)
> E  256 sec  (Using lexical analyzer)
> F  254 sec  (combination of E and C)
>
> I think the differences among A, B, C and D are within the margin of
> error. At least in my environment (results may differ on older Emacsen
> or other systems):
>
> 1. It would be better for elmo-msgdb-get-message-id-from-buffer not
> to assume that the buffer is narrowed to the header, for robustness
> and maintainability.
>
> 2. In elmo-msgdb-get-message-id-from-buffer, the choice of
> header-extraction function and matching regexp has little effect on
> performance.
>
> 3. Using the lexical analyzer does affect performance. If we introduce
> the lexical analyzer to extract Message-ID, I want a customizable
> option to disable it.
>
> BTW, in my localdir folders, I found only one Message-ID: header with
> a comment, and that message was spam.
>
> --
> Kazuhiro Ito
>
>