elmo-msgdb-get-message-id-from-buffer's performance issue

To: wl-en@ml.gentei.org
Subject: elmo-msgdb-get-message-id-from-buffer's performance issue
From: Kazuhiro Ito <kzhr@d1.dion.ne.jp>
Date: Tue, 24 Jul 2012 23:27:40 +0900
List-help: <mailto:wl-en-ctl@ml.gentei.org?body=help>
List-id: wl-en.ml.gentei.org
List-owner: <mailto:wl-en-admin@ml.gentei.org>
List-post: <mailto:wl-en@ml.gentei.org>
List-software: fml [fml 4.0 STABLE (20040215/4.0.4_BETA)]
List-unsubscribe: <mailto:wl-en-ctl@ml.gentei.org?body=unsubscribe>
Reply-to: wl-en@ml.gentei.org
User-agent: Wanderlust/2.15.9 (Almost Unreal) SEMI-EPG/1.14.7 (Harue) FLIM/1.14.9 (Gojō) APEL/10.8 EasyPG/1.0.0 Emacs/24.1.50 (i386-mingw-nt6.1.7601) MULE/6.0 (HANACHIRUSATO)

I tested the performance of elmo-msgdb-get-message-id-from-buffer by
re-syncing a large localdir folder (about 31000 messages) with
threading enabled.

I redefined elmo-msgdb-get-message-id-from-buffer as shown below and
measured the time each re-sync took.  Please note that the measurements
are very rough and that the functions were not byte-compiled.

Environment: ThinkPad X201s (Core i7-640LM, 2.13GHz), Windows7 (x64)
Emacs 24.1.50 (locally built)
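
A rough timing of this kind can be taken with the macros in Emacs's
benchmark.el.  The sketch below is only an illustration: the helper
name my-time-calls is hypothetical, and it times repeated calls of an
arbitrary function rather than a real Wanderlust re-sync.

```elisp
(require 'benchmark)

;; Hypothetical helper (not part of Wanderlust or ELMO).
(defun my-time-calls (fn arg n)
  "Call FN on ARG N times and return the elapsed wall-clock seconds."
  (benchmark-elapse
    (dotimes (_ n)
      (funcall fn arg))))

;; Example: 31000 calls, one per message in a folder of that size.
;; (my-time-calls (lambda (s) (string-match "<.+>" s))
;;                " <id@example.org>" 31000)
```

Passing a function first through `byte-compile' would show how much of
the measured time comes from running it interpreted.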


A. as loose as possible
(defun elmo-msgdb-get-message-id-from-buffer ()
  (let ((msgid (elmo-field-body "message-id")))
    (if msgid
	(if (string-match "<.+>" msgid)
	    (match-string 0 msgid)
	  (concat "<" msgid ">"))	; Invalid message-id.
      ;; no message-id, so put dummy msgid.
      (concat "<"
	      (if (elmo-unfold-field-body "date")
		  (timezone-make-date-sortable (elmo-unfold-field-body "date"))
		(md5 (string-as-unibyte (buffer-string))))
	      (nth 1 (eword-extract-address-components
		      (or (elmo-field-body "from") "nobody"))) ">"))))

B. A without assuming the buffer is narrowed
(defun elmo-msgdb-get-message-id-from-buffer ()
  (let ((msgid (std11-field-body "message-id")))
    (if msgid
	(if (string-match "<.+>" msgid)
	    (match-string 0 msgid)
	  (concat "<" msgid ">"))	; Invalid message-id.
...
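
To illustrate what "not assuming narrowed" means: the lookup itself
restricts the search to the header, so it also works when the buffer
holds a whole message.  The function below is a hypothetical sketch of
that idea using only built-in primitives.  It is not FLIM's
implementation of std11-field-body.

```elisp
;; Hypothetical helper, for illustration only.
(defun my-header-field-body (name)
  "Return the body of header field NAME in the current buffer, or nil.
The buffer may hold a whole message; only the header is searched."
  (save-excursion
    (save-restriction
      (goto-char (point-min))
      ;; The header ends at the first empty line (or at end of buffer).
      (narrow-to-region (point-min)
                        (if (re-search-forward "^$" nil t)
                            (match-beginning 0)
                          (point-max)))
      (goto-char (point-min))
      (let ((case-fold-search t))
        (when (re-search-forward
               (concat "^" (regexp-quote name) ":[ \t]*") nil t)
          (let ((start (point)))
            ;; Skip over folded continuation lines (SPC or TAB first).
            (while (progn (forward-line 1)
                          (looking-at "[ \t]")))
            ;; Drop the trailing newline if there is one.
            (buffer-substring-no-properties
             start (if (bolp) (1- (point)) (point)))))))))
```

With this approach a "Message-ID:" line that appears only in the body
is ignored, even if the caller forgot to narrow.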

C. A with more strict regexp
(defun elmo-msgdb-get-message-id-from-buffer ()
  (let ((msgid (elmo-field-body "message-id")))
    (if msgid
	(if (string-match "\\`[ \n\t]*\\(<.+>\\)[ \n\t]*\\'" msgid)
	    (match-string 1 msgid)
	  (concat "<" msgid ">"))	; Invalid message-id.
...
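
The practical difference between the loose regexp of A and the strict
regexp of C shows up when something follows the bracketed id.  The two
helpers below are hypothetical and take the field body as a string
argument, but they use the same regexps and the same fallback as the
definitions above.

```elisp
;; Hypothetical helpers, for illustration only.
(defun my-msgid-loose (body)
  "Extract a Message-ID from BODY with the loose regexp of variant A."
  (if (string-match "<.+>" body)
      (match-string 0 body)
    (concat "<" body ">")))

(defun my-msgid-strict (body)
  "Extract a Message-ID from BODY with the strict regexp of variant C."
  (if (string-match "\\`[ \n\t]*\\(<.+>\\)[ \n\t]*\\'" body)
      (match-string 1 body)
    (concat "<" body ">")))

;; Both agree on a plain id surrounded by whitespace:
;;   (my-msgid-loose  " <a@example.org> ")  => "<a@example.org>"
;;   (my-msgid-strict " <a@example.org> ")  => "<a@example.org>"
;; With a trailing comment the strict regexp does not match, so the
;; whole body is wrapped as an invalid id:
;;   (my-msgid-loose  "<a@example.org> (comment)")
;;     => "<a@example.org>"
;;   (my-msgid-strict "<a@example.org> (comment)")
;;     => "<<a@example.org> (comment)>"
```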

D. C with elmo-unfold-field-body
(defun elmo-msgdb-get-message-id-from-buffer ()
  (let ((msgid (elmo-unfold-field-body "message-id")))
    (if msgid
	(if (string-match "\\`[ \t]*\\(<.+>\\)[ \t]*\\'" msgid)
	    (match-string 1 msgid)
	  (concat "<" msgid ">"))	; Invalid message-id.
...

E. Using lexical analyzer
(defun elmo-msgdb-get-message-id-from-buffer ()
  (let ((msgid (elmo-unfold-field-body "message-id")))
    (if msgid
	(or (let* ((tokens (std11-parse-msg-ids-string msgid))
		   (id (assq 'msg-id tokens)))
	      (setq id
		    (unless (assq 'msg-id (delq id tokens))
		      (std11-addr-to-string (cdr id))))
	      ;; Return nil when result is "".
	      (when (> (length id) 0) id))
	    (concat "<" msgid ">"))	; Invalid message-id.
...

F. combination of E and C
(defun elmo-msgdb-get-message-id-from-buffer ()
  (let ((msgid (elmo-unfold-field-body "message-id")))
    (if msgid
	(or (let* ((tokens (std11-parse-msg-ids-string msgid))
		   (id (assq 'msg-id tokens)))
	      (setq id
		    (unless (assq 'msg-id (delq id tokens))
		      (std11-addr-to-string (cdr id))))
	      ;; Return nil when result is "".
	      (when (> (length id) 0) id))
	    (if (string-match "\\`[ \n\t]*\\(<.+>\\)[ \n\t]*\\'" msgid)
		(match-string 1 msgid)
	      (concat "<" msgid ">")))	; Invalid message-id.
...


Result:
A 190 sec (as loose as possible)
B 189 sec (A without assuming the buffer is narrowed)
C 188 sec (A with more strict regexp)
D 189 sec (C with elmo-unfold-field-body)
E 256 sec (Using lexical analyzer)
F 254 sec (combination of E and C)

I think the differences among A, B, C and D are within the margin of
error.  At least in my environment (results may differ on older
Emacsen or other systems):

1. It would be better to make elmo-msgdb-get-message-id-from-buffer
not assume that the buffer is narrowed to the header, for robustness
and maintainability.

2. In elmo-msgdb-get-message-id-from-buffer, the choice of
header-extraction function and of matching regexp has little effect on
performance.

3. Using the lexical analyzer does affect performance (E and F took
about 65 seconds longer than A to D).  If we introduce the lexical
analyzer to extract Message-ID, I want a customizable option to
disable it.

By the way, in my localdir folders I found only one Message-ID: header
containing a comment, and that message was spam.

-- 
Kazuhiro Ito

Follow-Ups:
- Re: elmo-msgdb-get-message-id-from-buffer's performance issue
  - From: "Bashford, Donald" <Don.Bashford@stjude.org>
