elmo-msgdb-get-message-id-from-buffer's performance issue
I tested the performance of elmo-msgdb-get-message-id-from-buffer by
re-syncing a large localdir folder (about 31000 messages) with
threading enabled.
I redefined elmo-msgdb-get-message-id-from-buffer as shown below and
measured the time to re-sync. Please note that the measurement is very
rough and that the function is not byte-compiled.
Environment: ThinkPad X201s (Core i7-640LM, 2.13GHz), Windows7 (x64)
Emacs 24.1.50 (locally built)
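
For anyone who wants to repeat this kind of rough measurement, here is
a minimal sketch of a timing helper. It only uses benchmark-run from
the benchmark library bundled with Emacs. The name my-time-thunk is
hypothetical and is not part of ELMO or Wanderlust, and this is not
necessarily how the numbers below were taken.

```elisp
;; Sketch only: `my-time-thunk' is a hypothetical helper, not part of ELMO.
(require 'benchmark)

(defun my-time-thunk (thunk)
  "Call THUNK once and return the elapsed wall-clock time in seconds.
The time includes garbage collection, so treat it as a rough figure."
  ;; `benchmark-run' returns (ELAPSED GC-COUNT GC-ELAPSED); keep ELAPSED.
  (car (benchmark-run 1 (funcall thunk))))

;; Example: time a cheap loop.  To time a real re-sync, put the re-sync
;; call of your choice inside the lambda instead.
(my-time-thunk
 (lambda ()
   (dotimes (_ 100000)
     (string-match "<.+>" "<a@example.com>"))))
```

Running each variant more than once and comparing the spread gives a
feeling for how large the measurement error is.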
A. as loose as possible
(defun elmo-msgdb-get-message-id-from-buffer ()
  (let ((msgid (elmo-field-body "message-id")))
    (if msgid
        (if (string-match "<.+>" msgid)
            (match-string 0 msgid)
          (concat "<" msgid ">")) ; Invalid message-id.
      ;; No message-id, so build a dummy msgid.
      (concat "<"
              (if (elmo-unfold-field-body "date")
                  (timezone-make-date-sortable (elmo-unfold-field-body "date"))
                (md5 (string-as-unibyte (buffer-string))))
              (nth 1 (eword-extract-address-components
                      (or (elmo-field-body "from") "nobody"))) ">"))))
B. A without assuming narrowed
(defun elmo-msgdb-get-message-id-from-buffer ()
  (let ((msgid (std11-field-body "message-id")))
    (if msgid
        (if (string-match "<.+>" msgid)
            (match-string 0 msgid)
          (concat "<" msgid ">")) ; Invalid message-id.
      ...
C. A with more strict regexp
(defun elmo-msgdb-get-message-id-from-buffer ()
  (let ((msgid (elmo-field-body "message-id")))
    (if msgid
        (if (string-match "\\`[ \n\t]*\\(<.+>\\)[ \n\t]*\\'" msgid)
            (match-string 1 msgid)
          (concat "<" msgid ">")) ; Invalid message-id.
      ...
D. C with elmo-unfold-field-body
(defun elmo-msgdb-get-message-id-from-buffer ()
  (let ((msgid (elmo-unfold-field-body "message-id")))
    (if msgid
        (if (string-match "\\`[ \t]*\\(<.+>\\)[ \t]*\\'" msgid)
            (match-string 1 msgid)
          (concat "<" msgid ">")) ; Invalid message-id.
      ...
E. Using lexical analyzer
(defun elmo-msgdb-get-message-id-from-buffer ()
  (let ((msgid (elmo-unfold-field-body "message-id")))
    (if msgid
        (or (let* ((tokens (std11-parse-msg-ids-string msgid))
                   (id (assq 'msg-id tokens)))
              (setq id
                    (unless (assq 'msg-id (delq id tokens))
                      (std11-addr-to-string (cdr id))))
              ;; Return nil when result is "".
              (when (> (length id) 0) id))
            (concat "<" msgid ">")) ; Invalid message-id.
      ...
F. combination of E and C
(defun elmo-msgdb-get-message-id-from-buffer ()
  (let ((msgid (elmo-unfold-field-body "message-id")))
    (if msgid
        (or (let* ((tokens (std11-parse-msg-ids-string msgid))
                   (id (assq 'msg-id tokens)))
              (setq id
                    (unless (assq 'msg-id (delq id tokens))
                      (std11-addr-to-string (cdr id))))
              ;; Return nil when result is "".
              (when (> (length id) 0) id))
            (if (string-match "\\`[ \n\t]*\\(<.+>\\)[ \n\t]*\\'" msgid)
                (match-string 1 msgid)
              (concat "<" msgid ">"))) ; Invalid message-id.
      ...
Result:
A  190 sec  (as loose as possible)
B  189 sec  (A without assuming narrowed)
C  188 sec  (A with more strict regexp)
D  189 sec  (C with elmo-unfold-field-body)
E  256 sec  (Using lexical analyzer)
F  254 sec  (combination of E and C)
I think the differences among A, B, C and D are within the margin of
error. At least in my environment (results may differ on older Emacsen
or other systems):
1. It would be better for elmo-msgdb-get-message-id-from-buffer not to
assume that the buffer is narrowed to the header, for robustness and
maintainability.
2. In elmo-msgdb-get-message-id-from-buffer, the choice of
header-extracting function and of regexp has little effect on
performance.
3. Using the lexical analyzer does affect performance (E and F took
roughly a third longer than A through D). If we introduce the lexical
analyzer to extract the Message-ID, I would like a customizable option
to disable it.
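
The option asked for in point 3 could look roughly like the sketch
below. It is only an illustration: the names
my-elmo-msgid-use-lexical-analyzer and my-elmo-msgid-normalize are
hypothetical, the function takes the unfolded field body as a string
instead of reading the buffer, and the analyzer branch is copied from
variant F and is only tried when FLIM's std11-parse-msg-ids-string is
actually loaded.

```elisp
;; Sketch only: option and function names are hypothetical.
(defcustom my-elmo-msgid-use-lexical-analyzer nil
  "Non-nil means try the lexical analyzer before the regexp fallback.
Leaving this nil keeps the cheaper regexp-only behavior of variant C."
  :type 'boolean
  :group 'mail)

(defun my-elmo-msgid-normalize (msgid)
  "Return a bracketed Message-ID extracted from the string MSGID."
  (or
   ;; Analyzer branch, copied from variant F.  It is skipped when the
   ;; option is nil or when FLIM's parser is not available.
   (and my-elmo-msgid-use-lexical-analyzer
        (fboundp 'std11-parse-msg-ids-string)
        (let* ((tokens (std11-parse-msg-ids-string msgid))
               (id (assq 'msg-id tokens)))
          (setq id
                (unless (assq 'msg-id (delq id tokens))
                  (std11-addr-to-string (cdr id))))
          ;; Return nil when result is "".
          (when (> (length id) 0) id)))
   ;; Regexp fallback, same as variant C.
   (if (string-match "\\`[ \n\t]*\\(<.+>\\)[ \n\t]*\\'" msgid)
       (match-string 1 msgid)
     (concat "<" msgid ">"))))
```

With the option left at nil, the function never calls the analyzer, so
users who care about re-sync time pay nothing extra for it.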
BTW, in my localdir folders I found only one Message-ID: header that
contained a comment, and that message was spam.
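
To make the comment case concrete, here is a small sketch comparing
the loose regexp of variant A with the strict regexp of variant C on
plain strings. The helper names my-msgid-loose and my-msgid-strict and
the sample addresses are made up for illustration; only the two
regexps come from the variants above.

```elisp
;; Sketch only: helper names and sample strings are hypothetical.
(defun my-msgid-loose (msgid)
  "Extract a Message-ID from MSGID with the loose regexp of variant A."
  (if (string-match "<.+>" msgid)
      (match-string 0 msgid)
    (concat "<" msgid ">")))

(defun my-msgid-strict (msgid)
  "Extract a Message-ID from MSGID with the strict regexp of variant C."
  (if (string-match "\\`[ \n\t]*\\(<.+>\\)[ \n\t]*\\'" msgid)
      (match-string 1 msgid)
    (concat "<" msgid ">")))

;; A well-formed field body: both variants agree.
(my-msgid-loose " <foo@example.com>\n")   ; => "<foo@example.com>"
(my-msgid-strict " <foo@example.com>\n")  ; => "<foo@example.com>"

;; A field body with a trailing comment: the strict regexp does not
;; match, so the whole string is wrapped in brackets as invalid.
(my-msgid-loose "<foo@example.com> (comment)")
;; => "<foo@example.com>"
(my-msgid-strict "<foo@example.com> (comment)")
;; => "<<foo@example.com> (comment)>"
```

This is the kind of header where a lexical analyzer would make a
difference, which is why its extra cost has to be weighed against how
rarely such headers appear.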
--
Kazuhiro Ito