comparison docs/mixfmt.txt @ 0:ada5e610ab86

imap-2007e
author yuuji@gentei.org
date Mon, 14 Sep 2009 15:17:45 +0900
1 /* ========================================================================
2 * Copyright 1988-2006 University of Washington
3 *
4 * Licensed under the Apache License, Version 2.0 (the "License");
5 * you may not use this file except in compliance with the License.
6 * You may obtain a copy of the License at
7 *
8 * http://www.apache.org/licenses/LICENSE-2.0
9 *
10 *
11 * ========================================================================
12 */
13
14 Last update: 18 December 2006
15
16 INTRODUCTION
17
18 This file is the descendant of a design document used to specify the
19 mix format. An attempt is being made to keep this document more or
20 less current with the way the mix format actually works.
21
22
23 1. Mix mailbox naming
24
25 Mailbox names correspond to directory names; thus mix format mailboxes
26 are "dual-use" (lack both \NoInferiors and \NoSelect). This will
27 satisfy some long-standing requests.
28
29
30 2. Mailbox files
31
32 A mix format mailbox is a directory with regular files with filenames
33 of:
34   .mixmeta        mailbox metadata file
35   .mixindex       message index file (message static data)
36   .mixstatus      message status file (message dynamic data)
37   .mix########    (where ######## is a <hex8>) secondary message
38                   data files.
39   .mix            primary message data file (used in experimental
40                   versions, supported for compatibility only)
41
42 2.1 Metadata, index, and status files
43
44 The mailbox metadata, index, and status files contain a sequence of
45 CRLF-terminated lines. These files have an update sequence, which is
46 a strictly-ascending sequence value. Any time the file is changed,
47 the update sequence is increased; this allows easy detection of
48 whether the file has been changed by another process. For now, this
49 update sequence is a modseq (see below).
50
51 2.1.1 Metadata file
52
53 The mailbox metadata file is called ".mixmeta". It contains a series
54 of CRLF-terminated lines. The first character of the line is a key that
55 identifies the payload of the line, and the remainder of the line is the
56 payload.
57 Key Payload
58 --- -------
59 S <hex8> ;; update sequence
60 V <hex8> ;; UIDVALIDITY
61 L <hex8> ;; UIDLAST
62 N <hex8> ;; current new message file
63 K [atom 0*(SP atom)] ;; keyword list
64
65 All other keys are reserved for future assignment and must be ignored
66 (and may be discarded) by software which does not recognize them. The
67 mailbox metadata file is rewritten as part of new mail delivery (so
68 APPENDUID/COPYUID can work) and when new keywords are added.
69
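As an illustration of the key/payload layout described above, here is a minimal sketch of a metadata reader. The function name, the dictionary field names, and the sample values are all hypothetical, and the sketch assumes the payload follows the key character directly (any surrounding whitespace is stripped to be tolerant).

```python
def parse_mixmeta(text):
    """Parse .mixmeta text into a dict; unrecognized keys are skipped."""
    hex_keys = {"S": "sequence", "V": "uidvalidity",
                "L": "uidlast", "N": "newfile"}
    meta = {"keywords": []}
    for line in text.split("\r\n"):
        if not line:
            continue
        key, payload = line[0], line[1:].strip()
        if key in hex_keys:
            meta[hex_keys[key]] = int(payload, 16)   # <hex8> payload
        elif key == "K":
            meta["keywords"] = payload.split()       # atoms separated by SP
        # Any other key is reserved for future use and is ignored here.
    return meta


# Hypothetical sample contents, including one unknown key ("X").
sample = ("S4aadd4d9\r\nV4aadd4d0\r\nL00000003\r\nN4aadd4d0\r\n"
          "Kwork urgent\r\nXfuture\r\n")
meta = parse_mixmeta(sample)
```

The unknown "X" line is skipped silently, which matches the rule that unrecognized keys must be ignored.
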
70 2.1.2 Message static index file
71
72 The mailbox message static index file is called ".mixindex". It contains
73 a series of CRLF-terminated lines. The first character of the line is a
74 key that identifies the payload of the line, and the remainder of the line
75 is the payload.
76 Key Payload
77 --- -------
78 S <hex8> ;; update sequence
79 : <uid>:<date>:<size>:<file>:<pos>:<isiz>:<hsiz>
80 ;; per-message record
81
82 The per-message records contain the following data:
83 <uid> = <hex8> ;; message UID
84 <date> = <yyyymmddhhmmss+zzzz> ;; internal date
85 <size> = <hex8> ;; rfc822.size
86 <file> = <hex8> ;; message data file (0 = .mix file)
87 <pos> = <hex8> ;; message position in file
88 <isiz> = <hex8> ;; message internal data size
89 <hsiz> = <hex8> ;; header size (offset to body)
90
91 All other keys, and subsequent fields in per-message records, are
92 reserved for future assignment and must be ignored (and may be
93 discarded) by software which does not recognize them. The mailbox
94 index file is appended by new mail delivery and rewritten by
95 expunge "burping", and otherwise is not altered.
96
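The per-message index record can be read along the following lines. This is only a sketch: the names IndexRecord and parse_index_line are hypothetical, and the sample line is made up. Fields beyond the seven documented ones are dropped, in keeping with the rule that subsequent fields are reserved.

```python
from collections import namedtuple

IndexRecord = namedtuple("IndexRecord", "uid date size file pos isiz hsiz")


def parse_index_line(line):
    """Return an IndexRecord for a ':' line, or None for any other key."""
    if not line.startswith(":"):
        return None
    fields = line[1:].rstrip("\r\n").split(":")
    if len(fields) < 7:
        raise ValueError("short per-message index record")
    uid, date, size, file_no, pos, isiz, hsiz = fields[:7]  # ignore the rest
    return IndexRecord(int(uid, 16), date, int(size, 16), int(file_no, 16),
                       int(pos, 16), int(isiz, 16), int(hsiz, 16))


# Hypothetical record with one extra (reserved) trailing field.
rec = parse_index_line(
    ":00000001:20090914151745+0900:00000400:4aadd4d0:"
    "00000000:00000030:00000200:extra\r\n")
```

The date is kept as a string because it contains no colon and can be split out safely; converting it to a time value is left to the caller.
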
97 2.1.3 Message dynamic status file
98
99 The mailbox message dynamic status file is called ".mixstatus". It contains
100 a series of CRLF-terminated lines. The first character of the line is a
101 key that identifies the payload of the line, and the remainder of the line
102 is the payload.
103 Key Payload
104 --- -------
105 S <hex8> ;; update sequence
106 : <uid>:<keys>:<flag>:<mod>: ;; per-message record
107
108 The per-message records contain the following data:
109 <uid> = <hex8> ;; message UID
110 <keys> = <hex8> ;; keyword flags
111 <flag> = <hex4> ;; system flags
112 <mod> = <hex8> ;; date/time last modified (modseq)
113
114 All other keys, and subsequent fields in per-message records, are
115 reserved for future assignment and must be ignored (and may be
116 discarded) by software which does not recognize them. The mailbox
117 dynamic status file is rewritten by flag changes (or any future change
118 that alters dynamic data) and is re-read when a session sees that the
119 mtime has changed (atime and ctime are not used).
120
121 The modseq is an unsigned 32-bit date/time, along with a guarantee
122 that this value can not go backwards. It currently corresponds to the
123 time from time(); however, since it is unsigned, it won't run out until
124 the year 2106. In the future, this may be used as a basis for implementing
125 the IMAP CONDSTORE extension.
126
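The "never goes backwards" guarantee and the year 2106 figure can both be checked with a short sketch. The helper name next_modseq is hypothetical, and the rule of bumping the previous value by one when the clock has not advanced is an assumption about how the guarantee could be met, not a statement of the actual implementation.

```python
import time
from datetime import datetime, timedelta, timezone


def next_modseq(last, now=None):
    """Return a modseq strictly greater than `last` (hypothetical helper)."""
    if now is None:
        now = int(time.time())
    # If the clock stalled or stepped back, move past the previous value.
    return max(now, last + 1)


# Largest value an unsigned 32-bit count of seconds since 1970 can hold.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
last_moment = EPOCH + timedelta(seconds=2**32 - 1)
```

Here last_moment works out to 7 February 2106, 06:28:15 UTC, which is where the year 2106 limit comes from.
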
127 2.2 Message data files
128
129 A mix message file is a regular file with filename starting with
130 ".mix" followed by a <hex8> suffix which indicates the file number. It
131 contains a series of CRLF-terminated lines. By special dispensation, the
132 filename ".mix" is used for file number 0, which was used in experimental
133 versions of mix as a "primary" file (this concept no longer exists).
134
135 A file number is set to the current modseq when it is created. If a copy
136 or append causes the file to exceed the compiled-in file size limit, a new
137 file is started and the metadata is updated accordingly.
138
139 Preceeding each message is per-message record with the following format:
140 Key Payload
141 --- -------
142 ;; per-message record
143 : :<code>:<uid>:<date>:<size>:
144
145 The per-message records contain the following data:
146 <code> = "msg" ;; fixed code
147 <uid> = <hex8> ;; message UID
148 <date> = <yyyymmddhhmmss+zzzz> ;; internal date
149 <size> = <hex8> ;; rfc822.size
150 The message data begins on the next line.
151
152 Subsequent fields are reserved for future assignment and must be ignored.
153
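A small sketch of the data file naming rule and of reading the per-message record that precedes each message. Both function names are hypothetical. The sketch assumes lowercase hexadecimal in file names, and because the exact number of leading colons in the record is not pinned down above, it tolerates any number of them.

```python
def data_file_name(file_no):
    """Map a file number to its data file name (lowercase hex is assumed)."""
    if file_no == 0:
        return ".mix"              # legacy "primary" file
    return ".mix%08x" % file_no


def parse_msg_record(line):
    """Return uid/date/size from a per-message record, or None if invalid."""
    if not line.startswith(":"):
        return None
    fields = line.rstrip("\r\n").split(":")
    while fields and fields[0] == "":
        fields.pop(0)              # drop empties produced by leading colons
    if len(fields) < 4 or fields[0] != "msg":
        return None
    try:
        return {"uid": int(fields[1], 16), "date": fields[2],
                "size": int(fields[3], 16)}
    except ValueError:
        return None                # non-hex data where <hex8> was expected


# Hypothetical sample values.
name = data_file_name(0x4AADD4D0)
rec = parse_msg_record(":msg:00000001:20090914151745+0900:00000400:\r\n")
```
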
154
155 3. New mail delivery
156
157 To deliver a new message, it is necessary to share lock the destination
158 metadata file, then get an exclusive lock on the destination index and
159 status files. Once this is done, the new message data is appended to the
160 new message file. The metadata (UIDLAST value), index, and status
161 files are all updated to add the new message.
162
163 Then all the destination mailbox files are closed.
164
165
166 4. Mailbox pinging
167
168 The index and status files are share locked. Initially, sequences are
169 remembered as zero, so at open time they are always "altered".
170
171 The sequence from the index file is checked; if it is altered the index
172 file is read and processed as follows:
173 . If expunge is permitted, then any messages that are not in the index
174 are reported as having been expunged via mm_expunged().
175 . new messages are announced via mm_exists()/mm_recent().
176
177 Next, the sequence from the status file is checked. If it is altered,
178 the status file is read and the status updated for any message which is
179 new or has an altered modseq in the status file. Altered modseq messages
180 are announced via mm_flags().
181
182 Then the index and status files are closed.
183
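The index half of a ping amounts to comparing the UIDs a session already knows against the UIDs in the re-read index. The sketch below shows that comparison only; the function name is hypothetical, and it assumes UIDs are held in ascending order so that anything above the last known UID is new.

```python
def diff_index(known_uids, index_uids, expunge_permitted=True):
    """Return (expunged, new) UID lists after re-reading the index."""
    present = set(index_uids)
    expunged = []
    if expunge_permitted:
        # Known messages that vanished from the index were expunged.
        expunged = [uid for uid in known_uids if uid not in present]
    last_known = known_uids[-1] if known_uids else 0
    # UIDs only ever increase, so anything past the last known UID is new.
    new = [uid for uid in index_uids if uid > last_known]
    return expunged, new


expunged, new = diff_index([1, 2, 3], [1, 3, 4, 5])
```

In this example UID 2 would be reported through mm_expunged(), and UIDs 4 and 5 through mm_exists()/mm_recent().
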
184
185 4. Flag alteration
186
187 The status file is exclusive locked.
188
189 The sequence from the status file is checked. If it is altered, the
190 status file is read and the status updated for any message which is
191 new or has an altered modseq in the status file. Altered modseq
192 messages are announced via mm_flags().
193
194 The alterations are then applied for all requested messages, updating
195 the modseq for each requested message which changes flags as a result
196 of the alteration (alterations which do not result in a change do not
197 alter the modseq). Then the status file is rewritten with a new
198 sequence, but only if the flags of at least one message were changed.
199
200 Then the status file is closed.
201
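The rule that only real changes touch the modseq, and that the status file is rewritten only if something changed, can be sketched as follows. The in-memory layout (a dict of per-UID records) and the flag bit values are hypothetical.

```python
def alter_flags(status, uids, set_mask, clear_mask, modseq):
    """Apply flag changes; return True if the status file needs rewriting."""
    changed = False
    for uid in uids:
        rec = status[uid]
        new_flag = (rec["flag"] | set_mask) & ~clear_mask
        if new_flag != rec["flag"]:
            rec["flag"] = new_flag
            rec["mod"] = modseq    # only messages that changed get a new modseq
            changed = True
    return changed


# Hypothetical state: UID 2 already has bit 0x1 set, UID 1 does not.
status = {1: {"flag": 0x0, "mod": 10}, 2: {"flag": 0x1, "mod": 10}}
first = alter_flags(status, [1, 2], set_mask=0x1, clear_mask=0x0, modseq=20)
second = alter_flags(status, [1, 2], set_mask=0x1, clear_mask=0x0, modseq=30)
```

The first call changes UID 1 only, so only its modseq moves; the second call changes nothing, so no rewrite is needed.
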
202
203 5. Checkpoint and expunge
204
205 Checkpoint is identical to expunge, except that it skips the step of
206 expunging deleted messages.
207
208 The index and status files are locked exclusive. If expunging, all
209 deleted messages are expunged from the index and announced via
210 mm_expunged(). The message data is not removed at this time.
211
212 If a checkpoint was requested, or if any messages were expunged, or if
213 it is remembered that a "burp" was needed, then:
214 . the metadata file is locked exclusive. If this fails, remember that
215 a burp is needed. Otherwise perform a burp:
216 . calculate the file byte ranges occupied by expunged messages
217 . for each file needing "burping", open and slide down subsequent file
218 data on top of the expunged messages
219 . update the index and status files
220
221 Then the index and status files are closed.
222
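The "slide down" step of a burp can be sketched on an in-memory copy of one data file. This is an illustration only: the function name is hypothetical, the surviving messages are given as (position, length) pairs in ascending order, and each length is assumed to cover the per-message record together with the message data.

```python
def burp(data, live):
    """Compact `data` (a bytearray) in place; return the new positions."""
    write = 0
    new_positions = []
    for pos, length in live:
        if pos != write:
            # Slide this message down over the space left by expunged ones.
            data[write:write + length] = data[pos:pos + length]
        new_positions.append(write)
        write += length
    del data[write:]               # truncate the now-unused tail
    return new_positions


# Three 4-byte "messages"; the middle one has been expunged.
data = bytearray(b"AAAABBBBCCCC")
positions = burp(data, [(0, 4), (8, 4)])
```

The returned positions are what would be written back into the index file, since the pointers there must follow the moved data.
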
223 5.1 More details on expunging and "burping"
224
225 Shared expunge presents a problem due to the requirements of the IMAP
226 protocol. You can't "burp" away a message until you are certain that
227 no sharers have a pointer to it any longer. Consequently, for the nonce
228 "burping" out expunged data will be deferred to an exclusive expunge as in
229 mbx format.
230
231 If shared burping is ever implemented, then care will be needed not to
232 burp data that a session still relies upon. It's easy enough to burp
233 the index files; just create new index files, deleting the old, and
234 require that you look for a new one appearing at mailbox ping time
235 (when it's safe). The data files are a problem, since we
236 intentionally don't want to keep them open and do want to avoid quota
237 problems by overwriting in place. Also, when you burp you have to
238 change the pointers in the index file.
239
240 Bottom line: shared burping is too hairy right now, so the first
241 version will do exclusive-only burping and not worry about it. If
242 shared burping is really needed, then that routine will need to be
243 rewritten.
244
245 Shared burping has been a problem for every other IMAP server. Most
246 get it wrong, and cause terrible confusion to clients (including
247 client crashes).
248
249
250 6. Message data file roll out strategy
251
252 The current new message file is finalized, and a new one started, when
253 an append or copy is done that would cause the file to grow to larger
254 than a preconfigured size (MIXDATAROLL). A multi-message copy or
255 append is written in its entirety to a single new message file. In
256 the case of multi-copy, the new message file is switched when the sum
257 of the sizes of all messages to be copied would cause the current new
258 message file to exceed MIXDATAROLL. In the case of multi-append, only
259 the first message is considered; this is due to technical limitations.
260
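The roll out decision described above can be sketched as a single predicate. The function name and the limit value are hypothetical; the real MIXDATAROLL is a compiled-in constant.

```python
MIXDATAROLL = 1024 * 1024   # hypothetical value for illustration only


def needs_new_file(current_size, sizes, multi_append=False, limit=MIXDATAROLL):
    """Decide whether to switch to a new message file before writing."""
    if not sizes:
        return False
    # Multi-copy considers the sum of all messages; multi-append can only
    # consider the first message.
    incoming = sizes[0] if multi_append else sum(sizes)
    return current_size + incoming > limit


copy_rolls = needs_new_file(900, [50, 100], limit=1000)
append_rolls = needs_new_file(900, [50, 100], multi_append=True, limit=1000)
```

With the same inputs the copy switches files (900 + 150 exceeds 1000) while the append does not (900 + 50 does not), so a multi-append can leave a file larger than the limit.
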
261 7. Error detection
262
263 Mix detects bad data in the metadata, index, and status files, and
264 declares the stream dead. It does not unilaterally reassign
265 UIDVALIDITY the way that the flat file formats do.
266
267 When mix reads a header from the message file, it also reads the
268 per-message record and verifies that there is a per-message record there.
269 This is a simple test for message file corruption. It doesn't declare
270 the stream dead; it simply issues an error message and returns a
271 zero-length string for the message header. This makes it possible for
272 the user to fix the mailbox simply by deleting and expunging any messages
273 that are in this state.
274
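The header check described above might look like the following sketch. The function name is hypothetical, and the sketch assumes that the per-message record occupies the <isiz> bytes at <pos>, with the header following immediately for <hsiz> bytes.

```python
import io
import sys


def read_header(data_file, pos, isiz, hsiz):
    """Return the message header, or b"" if the record is not where expected."""
    data_file.seek(pos)
    record = data_file.read(isiz)
    if (len(record) != isiz or not record.startswith(b":")
            or b"msg:" not in record):
        # Report the problem but keep the stream alive.
        print("mix: per-message record missing at offset %d" % pos,
              file=sys.stderr)
        return b""
    return data_file.read(hsiz)


# Hypothetical file contents: one intact message and one damaged copy.
record = b":msg:00000001:20090914151745+0900:00000016:\r\n"
header = b"Subject: hi\r\n\r\n"
body = b"hello\r\n"
good = io.BytesIO(record + header + body)
bad = io.BytesIO(b"X" * len(record) + header + body)
ok_header = read_header(good, 0, len(record), len(header))
bad_header = read_header(bad, 0, len(record), len(header))
```

The damaged copy yields an empty header rather than an exception, so the message can still be deleted and expunged.
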
275
276 8. Reconstruct tool
277
278 [None of this is implemented yet.]
279
280 The layout of these files is designed to make the reconstruct tool be
281 as simple as possible. Much of the need for the reconstruct tool is
282 eliminated since the mix format has a much more limited scope of
283 writing than the flat file formats; thus there is "less collateral
284 damage."
285
286 If the metadata file is lost or corrupted, then all keywords are lost;
287 if the mailbox has any keywords used in the .mixstatus file, it'll be
288 necessary to create some placeholder names. Otherwise, a new
289 UIDVALIDITY can be assigned, and a good UIDLAST value calculated by
290 the reconstruct tool. Since this file is very small, it's not likely
291 to be damaged.
292
293 If the index file is lost or corrupted, it is possible to reconstruct
294 it with no loss by reading all the data files. However, this could
295 cause expunged but not yet burped messages to reappear.
296
297 If the status file is lost or corrupted, then flags are lost and
298 will revert to a default state of no flags set. Just deleting the
299 corrupted file is good enough.
300
301 The reconstruct tool can use the per-message record in the message
302 file to locate messages if the recorded sizes and/or messages are
303 corrupt. If that happens, it will need to rebuild the index file
304 (with associated changes to the metadata file to change the
305 UIDVALIDITY). That should probably be a manual operation and not be
306 part of the default operation or auto-reconstruct.
307
308
309 9. Locking strategy
310
311 The mix format does not use the traditional c-client /tmp file locking.
312
313 The metadata file is open and locked whenever the mailbox is open.
314 Normally this is a shared lock, but it will be upgraded to exclusive
315 if the mailbox is expunged. As a guard (since there is no true
316 lock-upgrade/downgrade on UNIX), the index exclusive lock must be
317 acquired first before upgrading to exclusive.
318
319 The index file is shared locked when reading the index, and exclusive
320 locked (and read) when appending new messages to the index or when
321 expunging (note that expunging also requires an exclusive lock on
322 metadata). Normally, the index file is not open or locked.
323
324 The status file is shared locked when reading status, and exclusive
325 locked (and read) when updating status. Normally, the status file is
326 not open or locked.
327
328 It isn't necessary to lock any of the data files as long as we only
329 have exclusive burping.
330
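The lock ordering for expunge (index exclusive first, then the metadata upgrade) can be sketched with flock() on a UNIX system. The function name and the temporary mailbox directory are hypothetical. Because a shared-to-exclusive conversion is not atomic and a failed attempt may drop the existing lock, the sketch asks for the shared lock again on failure.

```python
import fcntl
import os
import tempfile


def try_exclusive_metadata(meta_fd, index_fd):
    """Lock the index as a guard, then try to upgrade the metadata lock."""
    fcntl.flock(index_fd, fcntl.LOCK_EX)   # guard: index exclusive comes first
    try:
        fcntl.flock(meta_fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return True
    except BlockingIOError:
        # Another session still has the mailbox open; go back to shared.
        fcntl.flock(meta_fd, fcntl.LOCK_SH)
        return False


# Two sessions (two independent opens) on one hypothetical mailbox.
with tempfile.TemporaryDirectory() as box:
    meta_path = os.path.join(box, ".mixmeta")
    index_path = os.path.join(box, ".mixindex")
    for path in (meta_path, index_path):
        open(path, "w").close()

    ours = os.open(meta_path, os.O_RDWR)
    theirs = os.open(meta_path, os.O_RDWR)
    index = os.open(index_path, os.O_RDWR)
    fcntl.flock(ours, fcntl.LOCK_SH)
    fcntl.flock(theirs, fcntl.LOCK_SH)

    blocked = try_exclusive_metadata(ours, index)   # other session still open
    os.close(theirs)                                # other session goes away
    granted = try_exclusive_metadata(ours, index)
    os.close(ours)
    os.close(index)
```

A failed upgrade corresponds to the "remember that a burp is needed" case in the checkpoint and expunge section.
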
331
332 10. Memory usage
333
334 The mix format returns a file stringstruct, which is the modern
335 c-client behavior. This prevents imapd from growing to enormous sizes
336 due to a godzillagram (how it affects other programs depends upon what
337 they do with the returned stringstruct).
338
339
340 11. Future extensions
341
342 Cached ENVELOPE, BODYSTRUCTURE. Cyrus does, and this will eliminate
343 most of the reason to access the data files. Possibly cached overviews,
344 ala NNTP, instead?
345
346
347 Support for ANNOTATION.
348
349
350 12. RENAME issues
351
352 Mix currently makes no attempt to address the IMAP RENAME problem.
353 This occurs when a mailbox is deleted and another mailbox is then
354 renamed to take its place: no attempt is made to reassign UIDVALIDITY
355 for this mailbox and all the inferior mailboxes. This potentially can
356 cause problems for a disconnected-use client that has cached status
357 for the old mailbox which had that name.
358
359 The RENAME problem is a well known flaw in the IMAP protocol. Few
360 servers correctly handle it (among other things, not only do all the
361 UIDVALIDITY values have to be changed but this has to be done
362 atomically!). It was a mistake to add RENAME into IMAP, but it's much
363 too late to remove it now.
