% Architecture of distix, a ticketing system
% Lars Wirzenius

Introduction
============

Distix is a distributed ticketing system. It is currently a
work-in-progress. This document explains the general architecture of
distix, particularly the way it stores data.

The big picture
===============

The general approach is to store data as files in git. There are two
kinds of files: metadata about a ticket, and the discussion about each
ticket. Messages in the discussion are stored as e-mails in a Maildir.
Metadata is stored as a YAML file and is structured as key/value
pairs. There can be multiple values for one key, but each value must
be unique.

I chose git as the way to store data, since it is already good at
synchronising distributed data stores, and merging changes from
multiple sources. The distix data is carefully structured to make it
easy for git to merge changes: e-mails never change, and key/value
lists are easy to merge.

On discussions about tickets
============================

I believe, very firmly, that e-mail is a good way to discuss problems
such as those that are relevant to a ticketing system. There are
other good ways, of course, but e-mail is the common base that
everyone has, and that works quite well. distix thus stores
discussions as e-mails, and it stores the actual e-mails rather than
trying to extract what I think right now is relevant from each e-mail.

Ideally, I want to have a mailing list for supporting users of some
piece of free software, and everyone e-mails the list when they need
help. Distix will be subscribed to the list, and nobody else needs to
know about it, unless they want to. Distix won't send any e-mails.
Everyone cc's the mailing list on all mails, and distix will ingest
them and automatically open new tickets for new discussion threads,
and assign each incoming e-mail to existing tickets if appropriate.

The support staff then uses distix to keep track of a large number of
concurrent discussions. They'll mark tickets as closed when the issue
seems to have been dealt with, or they mark them as describing actual
bugs that need fixing, or whatever is appropriate.


distix architecture
===================

distix stores data as files in git. Each git repository holds data
for one ticketing system instance only. There is some metadata about
the repository (ticketing system instance), and then a number of
tickets. Each ticket consists of ticket metadata and a Maildir with
all the discussion about the ticket.

Maildir is a good way to store mails. Each mail gets a unique id, and
this should make them easily mergeable with git.

The metadata is key/value pairs. The key will be a string, and the
value will also be a string. Keys may occur multiple times: in effect,
the value is then a non-empty list of unique strings. This
multiplicity is important so that users may, for example, do things
like having a key for which supported branches a bug is affected by:

    found-in-branch=supported/1.1
    found-in-branch=supported/1.2
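
As a rough illustration, the multi-valued key/value model could be
sketched in Python as below. The class name `Metadata` and its
methods `add` and `get_all` are made up for this example and are not
meant to describe the actual distix code.

```python
from collections import defaultdict


class Metadata:
    """Multi-valued key/value store: each key maps to a list of unique strings."""

    def __init__(self):
        self._values = defaultdict(list)

    def add(self, key, value):
        # Adding a value that is already present is a no-op, so the
        # values for a key stay unique.
        if value not in self._values[key]:
            self._values[key].append(value)

    def get_all(self, key):
        # Return a copy so callers cannot break the uniqueness invariant.
        return list(self._values.get(key, []))


meta = Metadata()
meta.add("found-in-branch", "supported/1.1")
meta.add("found-in-branch", "supported/1.2")
meta.add("found-in-branch", "supported/1.1")  # duplicate, ignored
print(meta.get_all("found-in-branch"))
# prints ['supported/1.1', 'supported/1.2']
```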

The key/value pairs are stored in YAML files. JSON would also be a
nice choice, except that it is awkward for humans to edit, and it
deals especially badly with string values that contain many lines of
text. While I don't expect distix users to edit the metadata files
directly in the long term, I will have to, and so I care about the
file format.

I don't want to invent a new format myself. That's silly, and I've
done it a few dozen times already, so I'm bored by it.

Thus a metadata file will look like this:

    key:
    - value
    - value 2
    key2:
    - value
    - value 3

I'll worry about namespacing keys later. I don't need to have the
perfect format from start, and I'm not yet sure namespacing will be
worth it. Right now, I care about having something simple that works.
I'll worry about perfect later.

The repository has a layout like this:

    repo.yaml               -- repository metadata
    tickets/
        TICKETID/
            ticket.yaml     -- ticket metadata
            Maildir/
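
To make the layout concrete, here is a minimal sketch of creating one
ticket directory with Python's standard library. The function names
and the `ticket-id` key are assumptions made for the example; the
actual key name used by distix is not given here. The sketch also
assumes the ticket id is a simple string that is safe to write
unquoted into YAML.

```python
import mailbox
from pathlib import Path


def ticket_dir(repo, ticket_id):
    """Directory that holds one ticket, following the layout above."""
    return Path(repo) / "tickets" / ticket_id


def create_ticket(repo, ticket_id):
    tdir = ticket_dir(repo, ticket_id)
    # Fail loudly rather than overwrite an existing ticket.
    tdir.mkdir(parents=True, exist_ok=False)
    # "ticket-id" is a hypothetical key name, used only for this sketch.
    (tdir / "ticket.yaml").write_text(
        f"ticket-id:\n- {ticket_id}\n", encoding="utf-8")
    # mailbox.Maildir creates the cur/, new/ and tmp/ subdirectories.
    mailbox.Maildir(str(tdir / "Maildir"), create=True)
    return tdir
```

Note that git does not track empty directories, so an empty Maildir
would not survive a clone unless something is done about that.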

I may later decide to have the TICKETID level be multiple levels,
based on the ticket id, to avoid having too many tickets in one
directory. But that's for later; this is fine for now.
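
If the multi-level idea is ever needed, one possible scheme is to
name the intermediate directories after prefixes of the ticket id.
The sketch below is only one way it could be done; the function name
and the choice of two levels of two characters each are assumptions.

```python
from pathlib import PurePosixPath


def sharded_ticket_path(ticket_id, levels=2, width=2):
    """Spread tickets over nested directories named after id prefixes."""
    if len(ticket_id) <= levels * width:
        raise ValueError("ticket id too short to shard")
    # Take consecutive slices of the id as intermediate directory names.
    parts = [ticket_id[i * width:(i + 1) * width] for i in range(levels)]
    return PurePosixPath("tickets", *parts, ticket_id)


print(sharded_ticket_path("3f2a9c0d"))
# prints tickets/3f/2a/3f2a9c0d
```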

I store the ticket id in the ticket's metadata. This allows us to
rename directories and also simplifies the code that needs to know a
ticket's id.

If I decide to implement an "archive" feature, where tickets are
moved after they are well and truly done, and nobody cares about them
anymore, and nobody calls, and nobody visits them, but they shouldn't
be deleted either, then I'll add a top level "archive" directory. But
we'll see, that's an idea for the future.

I may add more metadata files, too, such as one for stored searches,
or such, but, again, we'll see.

I'll let git worry about merging. This opens up the possibility of
merge conflicts. For now, I'll punt on that, and let the user deal
with the conflicts manually. Later I may consider either a merge tool
that git can call to resolve conflicts automatically, or change the
metadata storage so that conflicts are very unlikely. But again,
later, right now the important thing is to get something working so I
can get experience with the concepts and feedback on what is actually
working OK and what needs fixing.
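
To give an idea of what an automatic merge tool might do, here is a
sketch of a three-way merge of the metadata, treating each key's
values as an ordered set. Nothing like this exists yet; the function
names are made up, and the rule that a removal on either side wins
over the other side keeping the value is an assumption, not a
decision.

```python
def merge_values(base, ours, theirs):
    """Three-way merge of one key's value list, treated as an ordered set."""
    # A value counts as removed if it was in the base and either side
    # dropped it.
    removed = {v for v in base if v not in ours or v not in theirs}
    merged = []
    for value in ours + theirs:
        if value not in removed and value not in merged:
            merged.append(value)
    return merged


def merge_metadata(base, ours, theirs):
    """Merge whole metadata dicts (key -> list of strings) key by key."""
    merged = {}
    for key in sorted(set(base) | set(ours) | set(theirs)):
        values = merge_values(
            base.get(key, []), ours.get(key, []), theirs.get(key, []))
        if values:  # a key with no values left is dropped
            merged[key] = values
    return merged
```

With this rule, one side adding `supported/1.2` to a key while the
other side leaves the key alone merges cleanly, and two sides adding
the same value produce it only once.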

I'll make it so distix will make changes and also commit them to git.
This avoids having the user have to do that manually. I'll also make
it so that if there are uncommitted changes, distix will error out.
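
The check-then-commit behaviour could look roughly like the sketch
below, which shells out to the `git` command. The function and
exception names are hypothetical, and the real code may well talk to
git in a different way.

```python
import subprocess


class UncommittedChangesError(Exception):
    """Raised when the repository has changes that are not committed."""


def is_clean(porcelain_output):
    # `git status --porcelain` prints one line per changed or untracked
    # file, so empty output means a clean working tree.
    return porcelain_output.strip() == ""


def require_clean(repo):
    result = subprocess.run(
        ["git", "status", "--porcelain"],
        cwd=repo, check=True, capture_output=True, text=True,
    )
    if not is_clean(result.stdout):
        raise UncommittedChangesError(f"uncommitted changes in {repo}")


def commit_all(repo, message):
    # Stage everything, including new files, then commit. `git commit`
    # exits non-zero when there is nothing to commit, which surfaces
    # here as subprocess.CalledProcessError.
    subprocess.run(["git", "add", "--all"], cwd=repo, check=True)
    subprocess.run(
        ["git", "commit", "--quiet", "-m", message], cwd=repo, check=True)
```

The intended order would be to call `require_clean` before touching
any files and `commit_all` once the change has been made.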


Templating
==========

Output from distix, to the user, will be produced via templates.
Initially, the templates will be with the distix code, but later I
will add ways for the distix instance admins to use templates from
elsewhere, for extra flexibility and theming.

The use of templates is important. Not hardcoding what is output and
how it's laid out is good for flexibility, and doing it from the
beginning avoids accidentally hardcoding things. The Jinja2
templating library is used.


Ingesting e-mails
=================

Distix is meant to import e-mails and either open new tickets, or put
the e-mails in the right existing tickets. There are two use cases:

1. Ingest an incoming mail directly from the MTA.
2. Ingest a folder of mails that have already been received.

The first form is to be used when distix is subscribed to a mailing
list. The second form is for seeding a new distix instance from
existing emails. There is not much architectural difference between
the two cases.

When ingesting, distix will read in the e-mail, and extract some
metadata from it. With the extracted metadata, distix will find an
existing ticket to which the mail belongs, if one exists. If no
existing ticket is suitable, a new ticket is opened.
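
Which metadata is extracted is not spelled out above. As a sketch,
assume it is the threading headers (`In-Reply-To` and `References`),
and assume there is an index mapping the message ids of already
stored mails to ticket ids. Both the index and the function names
are hypothetical.

```python
import email


def referenced_ids(msg):
    """Message ids this mail points back to, via the threading headers."""
    ids = []
    for header in ("In-Reply-To", "References"):
        for value in msg.get_all(header, []):
            # A header may list several ids separated by whitespace.
            ids.extend(str(value).split())
    return ids


def find_ticket(raw_mail, ticket_by_message_id):
    """Return the id of the ticket the mail belongs to, or None."""
    msg = email.message_from_bytes(raw_mail)
    for message_id in referenced_ids(msg):
        ticket_id = ticket_by_message_id.get(message_id)
        if ticket_id is not None:
            return ticket_id
    return None  # no match: the caller opens a new ticket
```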

Given a ticket id, distix will put the e-mail in the ticket's Maildir.
It will use a SHA1 of the full text of the message (including the
headers, as received) as the filename for the message. If a message
with that filename already exists, distix assumes the new message is a
duplicate and skips it. This allows the user to import the same mail
folder multiple times, adding only new tickets.
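
The storing and duplicate detection step can be sketched as follows.
The function name is made up, and placing the file in the Maildir's
`new` subdirectory is an assumption; the text above only says the
mail goes into the ticket's Maildir.

```python
import hashlib
from pathlib import Path


def store_mail(maildir, raw_mail):
    """Store raw_mail under its SHA-1 name; return False for a duplicate."""
    # Hash the full message, headers included, exactly as received.
    name = hashlib.sha1(raw_mail).hexdigest()
    target = Path(maildir) / "new" / name  # "new" is an assumption
    if target.exists():
        return False
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(raw_mail)
    return True
```

Since the hash covers the headers as received, two copies of the same
mail that arrived by different routes will probably differ in their
headers and so would not be recognised as duplicates by this check.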