docs/mixfmt.txt @ 0:ada5e610ab86 (imap-2007e)
author: yuuji@gentei.org
date: Mon, 14 Sep 2009 15:17:45 +0900

/* ========================================================================
 * Copyright 1988-2006 University of Washington
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 *
 * ========================================================================
 */

Last update: 18 December 2006

INTRODUCTION

This file is the descendant of a design document used to specify the
mix format. An attempt is being made to keep this document more or
less current with the way the mix format actually works.


1. Mix mailbox naming

Mailbox names correspond to directory names; thus mix format mailboxes
are "dual-use" (lack both \NoInferiors and \NoSelect). This will
satisfy some long-standing requests.


2. Mailbox files

A mix format mailbox is a directory containing regular files with the
following names:
        .mixmeta      mailbox metadata file
        .mixindex     message index file (message static data)
        .mixstatus    message status file (message dynamic data)
        .mix########  (where ######## is a <hex8>) secondary message
                      data files
        .mix          primary message data file (used in experimental
                      versions, supported for compatibility only)
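
The naming rules above can be sketched as a small classifier. This is only an illustration. classify_mix_file() and the mixfile_kind values are hypothetical names, not c-client functions, and accepting upper-case hex digits in the suffix is an assumption.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Kinds of regular file found in a mix mailbox directory (section 2). */
enum mixfile_kind {
  MIXFILE_OTHER,    /* not part of the mix format */
  MIXFILE_META,     /* .mixmeta */
  MIXFILE_INDEX,    /* .mixindex */
  MIXFILE_STATUS,   /* .mixstatus */
  MIXFILE_DATA      /* .mix or .mix<hex8> */
};

/* Hypothetical helper: classifies NAME and, for message data files,
   stores the file number in *FILENUM (0 for the legacy ".mix" file). */
enum mixfile_kind classify_mix_file (const char *name, unsigned long *filenum)
{
  assert (name != NULL && filenum != NULL);
  if (!strcmp (name, ".mixmeta")) return MIXFILE_META;
  if (!strcmp (name, ".mixindex")) return MIXFILE_INDEX;
  if (!strcmp (name, ".mixstatus")) return MIXFILE_STATUS;
  if (!strcmp (name, ".mix")) {
    *filenum = 0;
    return MIXFILE_DATA;
  }
  if (!strncmp (name, ".mix", 4) && strlen (name) == 4 + 8) {
    /* the suffix must be exactly eight hex digits */
    for (int i = 4; i < 12; i++)
      if (!strchr ("0123456789abcdefABCDEF", name[i])) return MIXFILE_OTHER;
    *filenum = strtoul (name + 4, NULL, 16);
    return MIXFILE_DATA;
  }
  return MIXFILE_OTHER;
}
```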

2.1 Metadata, index, and status files

The mailbox metadata, index, and status files contain a sequence of
CRLF-terminated lines. These files have an update sequence, which is
a strictly-ascending sequence value. Any time the file is changed,
the update sequence is increased; this allows easy detection of
whether the file has been changed by another process. For now, this
update sequence is a modseq (see below).

2.1.1 Metadata file

The mailbox metadata file is called ".mixmeta". It contains a series
of CRLF-terminated lines. The first character of the line is a key that
identifies the payload of the line, and the remainder of the line is the
payload.
        Key     Payload
        ---     -------
        S       <hex8>                  ;; update sequence
        V       <hex8>                  ;; UIDVALIDITY
        L       <hex8>                  ;; UIDLAST
        N       <hex8>                  ;; current new message file
        K       [atom 0*(SP atom)]      ;; keyword list

All other keys are reserved for future assignment and must be ignored
(and may be discarded) by software which does not recognize them. The
mailbox metadata file is rewritten as part of new mail delivery (so
APPENDUID/COPYUID can work) and when new keywords are added.
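
Below is a minimal reader for one metadata line. It assumes the file has already been split on CRLF. parse_meta_line(), parse_hex8() and struct mix_meta are hypothetical names. The point of the sketch is that unrecognized keys fall through and are ignored, as the paragraph above requires.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical in-memory form of the .mixmeta fields (section 2.1.1). */
struct mix_meta {
  unsigned long seq;          /* S: update sequence */
  unsigned long uidvalidity;  /* V: UIDVALIDITY */
  unsigned long uidlast;      /* L: UIDLAST */
  unsigned long newfile;      /* N: current new message file */
  char keywords[256];         /* K: atoms separated by SP, kept verbatim */
};

/* Parses exactly eight hex digits at S; returns -1 if any is missing. */
static int parse_hex8 (const char *s, unsigned long *out)
{
  unsigned long v = 0;
  for (int i = 0; i < 8; i++) {
    unsigned char c = (unsigned char) s[i];
    if (!isxdigit (c)) return -1;   /* also stops at the terminating NUL */
    v = v * 16 + (unsigned long) (isdigit (c) ? c - '0' : tolower (c) - 'a' + 10);
  }
  *out = v;
  return 0;
}

/* Applies one line (CRLF already stripped) to META. Returns -1 if a
   known key has a malformed payload; unknown keys are ignored. */
int parse_meta_line (struct mix_meta *meta, const char *line)
{
  unsigned long *field = NULL;
  assert (meta != NULL && line != NULL);
  switch (line[0]) {
  case 'S': field = &meta->seq; break;
  case 'V': field = &meta->uidvalidity; break;
  case 'L': field = &meta->uidlast; break;
  case 'N': field = &meta->newfile; break;
  case 'K':
    strncpy (meta->keywords, line + 1, sizeof meta->keywords - 1);
    meta->keywords[sizeof meta->keywords - 1] = '\0';
    return 0;
  default:
    return 0;   /* reserved key: must be ignored */
  }
  return parse_hex8 (line + 1, field);
}
```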

2.1.2 Message static index file

The mailbox message static index file is called ".mixindex". It contains
a series of CRLF-terminated lines. The first character of the line is a
key that identifies the payload of the line, and the remainder of the line
is the payload.
        Key     Payload
        ---     -------
        S       <hex8>                  ;; update sequence
        :       <uid>:<date>:<size>:<file>:<pos>:<isiz>:<hsiz>
                                        ;; per-message record

The per-message records contain the following data:
        <uid>  = <hex8>                 ;; message UID
        <date> = <yyyymmddhhmmss+zzzz>  ;; internal date
        <size> = <hex8>                 ;; rfc822.size
        <file> = <hex8>                 ;; message data file (0 = .mix file)
        <pos>  = <hex8>                 ;; message position in file
        <isiz> = <hex8>                 ;; message internal data size
        <hsiz> = <hex8>                 ;; header size (offset to body)

All other keys, and subsequent fields in per-message records, are
reserved for future assignment and must be ignored (and may be
discarded) by software which does not recognize them. The mailbox
index file is appended to by new mail delivery and rewritten by
expunge "burping", and otherwise is not altered.
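
Here is a sketch of parsing one index record with sscanf(). It assumes the CRLF has already been removed. struct index_rec and parse_index_record() are hypothetical names. Because sscanf() stops after the last conversion, any later fields reserved for future use are skipped without extra code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical struct for one .mixindex per-message record. */
struct index_rec {
  unsigned long uid;    /* message UID */
  char date[20];        /* yyyymmddhhmmss+zzzz, NUL-terminated */
  unsigned long size;   /* rfc822.size */
  unsigned long file;   /* data file number (0 = .mix) */
  unsigned long pos;    /* offset of the message in that file */
  unsigned long isiz;   /* internal data size */
  unsigned long hsiz;   /* header size, i.e. offset to the body */
};

/* Parses a ':' record line; returns 0 on success, -1 otherwise. */
int parse_index_record (const char *line, struct index_rec *r)
{
  int n;
  assert (line != NULL && r != NULL);
  n = sscanf (line, ":%8lx:%19[0123456789+-]:%8lx:%8lx:%8lx:%8lx:%8lx",
              &r->uid, r->date, &r->size, &r->file, &r->pos,
              &r->isiz, &r->hsiz);
  if (n != 7 || strlen (r->date) != 19) return -1;
  return 0;
}
```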

2.1.3 Message dynamic status file

The mailbox message dynamic status file is called ".mixstatus". It contains
a series of CRLF-terminated lines. The first character of the line is a
key that identifies the payload of the line, and the remainder of the line
is the payload.
        Key     Payload
        ---     -------
        S       <hex8>                  ;; update sequence
        :       <uid>:<keys>:<flag>:<mod>:      ;; per-message record

The per-message records contain the following data:
        <uid>  = <hex8>                 ;; message UID
        <keys> = <hex8>                 ;; keyword flags
        <flag> = <hex4>                 ;; system flags
        <mod>  = <hex8>                 ;; date/time last modified (modseq)

All other keys, and subsequent fields in per-message records, are
reserved for future assignment and must be ignored (and may be
discarded) by software which does not recognize them. The mailbox
dynamic status file is rewritten by flag changes (or any future change
that alters dynamic data) and is re-read when a session sees that the
mtime has changed (atime and ctime are not used).
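
Status records have fixed-width fields, so writing and reading one is simple. The sketch below assumes the record layout shown above and writes lower-case hex. format_status_record() and parse_status_record() are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical struct for one .mixstatus per-message record. */
struct status_rec {
  unsigned long uid;    /* <uid>:  message UID */
  unsigned long keys;   /* <keys>: keyword flags */
  unsigned int flag;    /* <flag>: system flags (16 bits) */
  unsigned long mod;    /* <mod>:  modseq of the last change */
};

/* Writes one CRLF-terminated record into BUF; returns its length, or -1
   if BUF is too small. */
int format_status_record (char *buf, size_t len, const struct status_rec *r)
{
  int n;
  assert (buf != NULL && r != NULL);
  n = snprintf (buf, len, ":%08lx:%08lx:%04x:%08lx:\r\n",
                r->uid, r->keys, r->flag & 0xffffu, r->mod);
  return (n < 0 || (size_t) n >= len) ? -1 : n;
}

/* Reads the four known fields; anything after "<mod>:" is ignored. */
int parse_status_record (const char *line, struct status_rec *r)
{
  assert (line != NULL && r != NULL);
  return sscanf (line, ":%8lx:%8lx:%4x:%8lx:", &r->uid, &r->keys,
                 &r->flag, &r->mod) == 4 ? 0 : -1;
}
```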

The modseq is an unsigned 32-bit date/time, along with a guarantee
that this value cannot go backwards. It currently corresponds to the
time from time(); however, since it is unsigned, it won't run out until
the year 2106. In the future, this may be used as a basis for implementing
the IMAP CONDSTORE extension.
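
The "cannot go backwards" guarantee can be sketched as follows. The sketch assumes the caller knows the previous modseq, and next_modseq() is a hypothetical name. When the clock has not advanced, it falls back to PREV + 1, which also keeps the update sequence strictly ascending as section 2.1 requires.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical helper: returns the current time() truncated to 32 bits,
   unless that would not exceed PREV, in which case PREV + 1 is used so
   the value never goes backwards. */
unsigned long next_modseq (unsigned long prev)
{
  unsigned long now = (unsigned long) time (NULL) & 0xffffffffUL;
  return now > prev ? now : prev + 1;
}
```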

2.2 Message data files

A mix message data file is a regular file with a filename starting with
".mix" followed by a <hex8> suffix which indicates the file number. It
contains a series of CRLF-terminated lines. By special dispensation, the
filename ".mix" is used for file number 0, which was used in experimental
versions of mix as a "primary" file (this concept no longer exists).

A file number is set to the current modseq when the file is created. If a
copy or append causes the file to exceed the compiled-in file size limit,
a new file is started and the metadata is updated accordingly.

Preceding each message is a per-message record with the following format:
        Key     Payload
        ---     -------
        :       <code>:<uid>:<date>:<size>:     ;; per-message record

The per-message records contain the following data:
        <code> = "msg"                  ;; fixed code
        <uid>  = <hex8>                 ;; message UID
        <date> = <yyyymmddhhmmss+zzzz>  ;; internal date
        <size> = <hex8>                 ;; rfc822.size
The message data begins on the next line.

Subsequent fields are reserved for future assignment and must be ignored.
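
The following sketch builds a data file name and the record that precedes each message. It makes three assumptions:

- The record is written as the ':' key followed immediately by <code>, so it begins ":msg:".
- The caller supplies the internal date as a struct tm.
- The caller also supplies the UTC offset in minutes.

data_file_name() and format_msg_record() are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: ".mix" for file number 0, else ".mix<hex8>". */
void data_file_name (char out[13], unsigned long filenum)
{
  if (filenum == 0)
    strcpy (out, ".mix");
  else
    snprintf (out, 13, ".mix%08lx", filenum & 0xffffffffUL);
}

/* Hypothetical helper: writes ":msg:<uid>:<date>:<size>:" plus CRLF.
   TM is the internal date as local time in a zone whose offset from
   UTC is ZONE_MIN minutes. Returns the length, or -1 if BUF is short. */
int format_msg_record (char *buf, size_t len, unsigned long uid,
                       const struct tm *tm, int zone_min, unsigned long size)
{
  int z = zone_min < 0 ? -zone_min : zone_min;
  int n = snprintf (buf, len,
                    ":msg:%08lx:%04d%02d%02d%02d%02d%02d%c%02d%02d:%08lx:\r\n",
                    uid, tm->tm_year + 1900, tm->tm_mon + 1, tm->tm_mday,
                    tm->tm_hour, tm->tm_min, tm->tm_sec,
                    zone_min < 0 ? '-' : '+', z / 60, z % 60, size);
  assert (tm != NULL);
  return (n < 0 || (size_t) n >= len) ? -1 : n;
}
```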


3. New mail delivery

To deliver a new message, it is necessary to share-lock the destination
metadata file, then get an exclusive lock on the destination index and
status files. Once this is done, the new message data is appended to the
new message file. The metadata (UIDLAST value), index, and status
files are all updated to add the new message.

Then all the destination mailbox files are closed.
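
The lock ordering for delivery might look like the sketch below. Using BSD flock(2) as the locking primitive is an assumption, because this document does not name the system call. lock_for_delivery() and unlock_after_delivery() are hypothetical names, and the append and file updates are left to the caller.

```c
#include <assert.h>
#include <sys/file.h>

/* Hypothetical helper: shared lock on the metadata file, then exclusive
   locks on the index and status files. Returns 0 on success; on
   failure, releases whatever it acquired and returns -1. */
int lock_for_delivery (int meta_fd, int index_fd, int status_fd)
{
  if (flock (meta_fd, LOCK_SH) < 0) return -1;
  if (flock (index_fd, LOCK_EX) < 0) goto fail_meta;
  if (flock (status_fd, LOCK_EX) < 0) goto fail_index;
  return 0;
 fail_index:
  flock (index_fd, LOCK_UN);
 fail_meta:
  flock (meta_fd, LOCK_UN);
  return -1;
}

/* Releases the delivery locks in reverse order of acquisition. */
void unlock_after_delivery (int meta_fd, int index_fd, int status_fd)
{
  flock (status_fd, LOCK_UN);
  flock (index_fd, LOCK_UN);
  flock (meta_fd, LOCK_UN);
}
```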


4. Mailbox pinging

The index and status files are share-locked. Initially, sequences are
remembered as zero, so at open time they are always "altered".

The sequence from the index file is checked; if it is altered, the index
file is read and processed as follows:
 . If expunge is permitted, then any messages that are not in the index
   are reported as having been expunged via mm_expunged().
 . New messages are announced via mm_exists()/mm_recent().

Next, the sequence from the status file is checked. If it is altered,
the status file is read and the status updated for any message which is
new or has an altered modseq in the status file. Altered modseq messages
are announced via mm_flags().

Then the index and status files are closed.


4. Flag alteration

The status file is exclusive-locked.

The sequence from the status file is checked. If it is altered, the
status file is read and the status updated for any message which is
new or has an altered modseq in the status file. Altered modseq
messages are announced via mm_flags().

The alterations are then applied for all requested messages, updating
the modseq for each requested message whose flags change as a result
of the alteration (alterations which do not result in a change do not
alter the modseq). Then the status file is rewritten with a new
sequence, but only if the flags of at least one message were changed.

Then the status file is closed.
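
Here is a sketch of the flag-alteration rule. The names are hypothetical, and the flag bit used in any example is arbitrary. The modseq is bumped only for messages whose flags actually change, and the return value tells the caller whether the status file needs rewriting at all.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-message dynamic data, mirroring a .mixstatus record. */
struct msg_status {
  unsigned long uid;    /* message UID */
  unsigned int flags;   /* system flags */
  unsigned long mod;    /* modseq of the last change */
};

/* Sets (SET != 0) or clears FLAGS on the N requested messages. Only
   messages whose flags really change get MODSEQ; the return value is the
   number of such messages, and the caller rewrites .mixstatus with a new
   update sequence only if it is nonzero. */
size_t alter_flags (struct msg_status *msgs, size_t n, unsigned int flags,
                    int set, unsigned long modseq)
{
  size_t changed = 0;
  assert (n == 0 || msgs != NULL);
  for (size_t i = 0; i < n; i++) {
    unsigned int old = msgs[i].flags;
    msgs[i].flags = set ? (old | flags) : (old & ~flags);
    if (msgs[i].flags != old) {
      msgs[i].mod = modseq;
      changed++;
    }
  }
  return changed;
}
```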


5. Checkpoint and expunge

Checkpoint is identical to expunge, except that it skips the step of
expunging deleted messages.

The index and status files are locked exclusive. If expunging, all
deleted messages are expunged from the index and announced via
mm_expunged(). The message data is not removed at this time.

If a checkpoint was requested, or if any messages were expunged, or if
it was remembered that a "burp" was needed, then:
 . the metadata file is locked exclusive. If this fails, remember that
   a burp is needed. Otherwise perform a burp:
    . calculate the file byte ranges occupied by expunged messages
    . for each file needing "burping", open it and slide subsequent file
      data down on top of the expunged messages
    . update the index and status files

Then the index and status files are closed.
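
The byte-range arithmetic of a burp can be illustrated on an in-memory copy of one data file. This is a simplification. The real operation works on the file in place and presumably truncates it afterwards. struct extent and burp() are hypothetical names. Each surviving message's new offset is written back, which corresponds to the index update in the last step.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical: one message's extent within a single data file. */
struct extent {
  size_t pos;     /* offset of the per-message record */
  size_t len;     /* record plus message data */
  int expunged;   /* nonzero if expunged from the index */
};

/* Slides surviving data down over expunged messages in BUF. EXT must be
   sorted by pos and non-overlapping. Survivors get their new offsets;
   the new data length is returned. */
size_t burp (char *buf, size_t buflen, struct extent *ext, size_t n)
{
  size_t dst = 0;   /* where the next kept byte goes */
  size_t src = 0;   /* first byte not yet examined */
  for (size_t i = 0; i < n; i++) {
    size_t start = ext[i].pos;
    assert (start >= src && start + ext[i].len <= buflen);
    /* keep any bytes between the previous extent and this one */
    memmove (buf + dst, buf + src, start - src);
    dst += start - src;
    src = start + ext[i].len;
    if (!ext[i].expunged) {
      memmove (buf + dst, buf + start, ext[i].len);
      ext[i].pos = dst;   /* new offset to record in the index */
      dst += ext[i].len;
    }
  }
  /* keep whatever follows the last extent */
  memmove (buf + dst, buf + src, buflen - src);
  return dst + (buflen - src);
}
```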

5.1 More details on expunging and "burping"

Shared expunge presents a problem due to the requirements of the IMAP
protocol. You can't "burp" away a message until you are certain that
no sharer still has a pointer to it. Consequently, for the nonce,
"burping" out expunged data is deferred to an exclusive expunge, as in
mbx format.

If shared burping is ever implemented, then care will be needed not to
burp data that a session still relies upon. It's easy enough to burp
the index files; just create new index files, deleting the old, and
require that you look for a new one appearing at mailbox ping time
(when it's safe). The data files are a problem, since we
intentionally don't want to keep them open and do want to avoid quota
problems by overwriting in place. Also, when you burp you have to
change the pointers in the index file.

Bottom line: shared burping is too hairy right now, so the first
version will do exclusive-only burping and not worry about it. If
shared burping is really needed, then that routine will need to be
rewritten.

Shared burping has been a problem for every other IMAP server. Most
get it wrong and cause terrible confusion to clients (including
client crashes).


6. Message data file roll-out strategy

The current new message file is finalized, and a new one started, when
an append or copy is done that would cause the file to grow larger
than a preconfigured size (MIXDATAROLL). A multi-message copy or
append is written in its entirety to a single new message file. In
the case of multi-copy, the new message file is switched when the sum
of the sizes of all messages to be copied would cause the current new
message file to exceed MIXDATAROLL. In the case of multi-append, only
the first message is considered; this is due to technical limitations.
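
The roll-over decision reduces to a size comparison. In this sketch need_new_data_file() is a hypothetical name, and LIMIT stands in for MIXDATAROLL. For a multi-append only the first message's size is counted, matching the text above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical check made before a copy or append. CUR is the current
   size of the new message file, SIZES the sizes of the N messages to be
   written, LIMIT plays the role of MIXDATAROLL. A copy counts every
   message; a multi-append counts only the first. */
int need_new_data_file (unsigned long cur, const unsigned long *sizes,
                        size_t n, int is_append, unsigned long limit)
{
  unsigned long total = 0;
  size_t count = (is_append && n > 1) ? 1 : n;
  assert (n == 0 || sizes != NULL);
  for (size_t i = 0; i < count; i++)
    total += sizes[i];
  return cur + total > limit;
}
```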

7. Error detection

Mix detects bad data in the metadata, index, and status files and
declares the stream dead. It does not unilaterally reassign
UIDVALIDITY the way that the flat file formats do.

When mix reads a header from the message file, it also reads the
per-message record and verifies that there is a per-message record there.
This is a simple test for message file corruption. It doesn't declare
the stream dead; it simply issues an error message and returns a
zero-length string for the message header. This makes it possible for
the user to fix the mailbox simply by deleting and expunging any messages
that are in this state.
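
Here is a sketch of the corruption check. It assumes the record begins with ":msg:", that is, the ':' key plus the fixed code "msg". Comparing the UID against the index goes beyond "verifies that there is a per-message record there" and is an assumption. check_msg_record() is a hypothetical name.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical check. BUF holds the NUL-terminated bytes read at the
   message's <pos>. Returns 0 if they start with a per-message record
   for UID, -1 if the data file looks corrupt at this position. */
int check_msg_record (const char *buf, unsigned long uid)
{
  unsigned long found;
  char sep;
  assert (buf != NULL);
  if (strncmp (buf, ":msg:", 5) != 0) return -1;
  if (sscanf (buf + 5, "%8lx%c", &found, &sep) != 2 || sep != ':') return -1;
  return found == uid ? 0 : -1;
}
```

On failure, a caller following this section would report an error and return an empty header rather than declaring the stream dead.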


8. Reconstruct tool

[None of this is implemented yet.]

The layout of these files is designed to make the reconstruct tool
as simple as possible. Much of the need for the reconstruct tool is
eliminated since the mix format has a much more limited scope of
writing than the flat file formats; thus there is "less collateral
damage."

If the metadata file is lost or corrupted, then all keywords are lost;
if the mailbox has any keywords used in the .mixstatus file, it'll be
necessary to create some placeholder names. Otherwise, a new
UIDVALIDITY can be assigned, and a good UIDLAST value calculated by
the reconstruct tool. Since this file is very small, it's not likely
to be damaged.

If the index file is lost or corrupted, it is possible to reconstruct
it with no loss by reading all the data files. However, this could
cause expunged but not yet burped messages to reappear.

If the status file is lost or corrupted, then flags are lost and
will revert to a default state of no flags set. Just deleting the
corrupted file is good enough.

The reconstruct tool can use the per-message record in the message
file to locate messages if the recorded sizes and/or messages are
corrupt. If that happens, it will need to rebuild the index file
(with associated changes to the metadata file to change the
UIDVALIDITY). That should probably be a manual operation and not be
part of the default operation or auto-reconstruct.
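
The tool is not implemented, so the sketch below only illustrates locating messages by their per-message records. It assumes each record starts a line with ":msg:". scan_msg_records() is a hypothetical name. A real tool would also need to cross-check the recorded sizes, since message text could itself contain a line beginning with ":msg:".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical scan of a data file image: stores in OUT (up to MAX
   entries) the offsets of lines that begin with ":msg:", and returns
   how many were found. */
size_t scan_msg_records (const char *buf, size_t len, size_t *out, size_t max)
{
  size_t found = 0;
  assert (buf != NULL && (max == 0 || out != NULL));
  for (size_t i = 0; i + 5 <= len && found < max; i++) {
    int line_start = (i == 0) || (buf[i - 1] == '\n');
    if (line_start && memcmp (buf + i, ":msg:", 5) == 0)
      out[found++] = i;
  }
  return found;
}
```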


9. Locking strategy

The mix format does not use the traditional c-client /tmp file locking.

The metadata file is open and locked whenever the mailbox is open.
Normally this is a shared lock, but it will be upgraded to exclusive
if the mailbox is expunged. As a guard (since there is no true
lock upgrade/downgrade on UNIX), the exclusive index lock must be
acquired before the metadata lock is upgraded to exclusive.

The index file is shared locked when reading the index, and exclusive
locked (and read) when appending new messages to the index or when
expunging (note that expunging also requires an exclusive lock on
metadata). Normally, the index file is not open or locked.

The status file is shared locked when reading status, and exclusive
locked (and read) when updating status. Normally, the status file is
not open or locked.

It isn't necessary to lock any of the data files as long as we only
have exclusive burping.
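
The guard described above might be sketched as follows. This again assumes flock(2), and lock_for_expunge() is a hypothetical name. flock(2) converts a lock by releasing it and then acquiring the new one, which is why the exclusive index lock comes first. For the same reason, this sketch re-takes the shared lock if a non-blocking upgrade fails.

```c
#include <assert.h>
#include <sys/file.h>

/* Hypothetical helper. META_FD is assumed to be share-locked already,
   as it is whenever the mailbox is open. Returns 1 if the metadata lock
   was upgraded (a burp may run), 0 if the caller should expunge without
   burping and remember that a burp is needed, -1 on error. */
int lock_for_expunge (int meta_fd, int index_fd, int status_fd)
{
  if (flock (index_fd, LOCK_EX) < 0) return -1;  /* the guard comes first */
  if (flock (status_fd, LOCK_EX) < 0) {
    flock (index_fd, LOCK_UN);
    return -1;
  }
  if (flock (meta_fd, LOCK_EX | LOCK_NB) == 0) return 1;
  /* flock(2) conversion is not atomic: a failed upgrade may have
     released the shared lock, so take it back before continuing. */
  flock (meta_fd, LOCK_SH);
  return 0;
}
```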


10. Memory usage

The mix format returns a file stringstruct, which is the modern
c-client behavior. This prevents imapd from growing to enormous sizes
due to a godzillagram (how it affects other programs depends upon what
they do with the returned stringstruct).


11. Future extensions

Cached ENVELOPE and BODYSTRUCTURE, as Cyrus does; this would eliminate
most of the reason to access the data files. Possibly cached overviews,
a la NNTP, instead?


Support for ANNOTATION.


12. RENAME issues

Mix currently makes no attempt to address the IMAP RENAME problem.
This problem occurs when a mailbox is deleted and another mailbox is
renamed into its place: no attempt is made to reassign UIDVALIDITY
for the renamed mailbox and all of its inferior mailboxes. This can
cause problems for a disconnected-use client that has cached status
for the old mailbox which had that name.

The RENAME problem is a well-known flaw in the IMAP protocol. Few
servers correctly handle it (among other things, not only do all the
UIDVALIDITY values have to be changed, but this has to be done
atomically!). It was a mistake to add RENAME into IMAP, but it's much
too late to remove it now.