% Architecture of distix, a ticketing system
% Lars Wirzenius

Introduction
============

Distix is a distributed ticketing system. It is currently a
work-in-progress. This document explains the general architecture of
distix, particularly the way it stores data.

The big picture
===============

The general approach is to store data as files in git. There are two
kinds of files: metadata about a ticket, and the discussion about each
ticket. Messages in the discussion are stored as e-mails in a Maildir.
Metadata is stored as a YAML file and is structured as key/value
pairs. There can be multiple values for one key, but each value must
be unique.

I chose git as the way to store data, since it is already good at
synchronising distributed data stores and at merging changes from
multiple sources. The distix data is carefully structured to make it
easy for git to merge changes: e-mails never change, and key/value
lists are easy to merge.

On discussions about tickets
============================

I believe, very firmly, that e-mail is a good way to discuss problems
such as those that are relevant to a ticketing system. There are other
good ways, of course, but e-mail is the common base that everyone has,
and it works quite well. Distix thus stores discussions as e-mails,
and it stores the actual e-mails rather than trying to extract what I
think right now is relevant from each e-mail.

Ideally, I want to have a mailing list for supporting users of some
piece of free software, and everyone e-mails the list when they need
help. Distix will be subscribed to the list, and nobody else needs to
know about it, unless they want to. Distix won't send any e-mails.
Everyone cc's the mailing list on all mails, and distix will ingest
them, automatically open new tickets for new discussion threads, and
assign each incoming e-mail to an existing ticket if appropriate.

The support staff then uses distix to keep track of a large number of
concurrent discussions.
They'll mark tickets as closed when the issue seems to have been
dealt with, or they'll mark them as describing actual bugs that need
fixing, or whatever is appropriate.

distix architecture
===================

distix stores data as files in git. Each git repository holds data for
one ticketing system instance only. There is some metadata about the
repository (ticketing system instance), and then a number of tickets.
Each ticket consists of ticket metadata and a Maildir with all the
discussion about the ticket.

Maildir is a good way to store mails. Each mail gets a unique id, and
this should make them easily mergeable with git.

The metadata is key/value pairs. The key will be a string, and the
value will also be a string. Keys may occur multiple times: in effect,
the value is then a non-empty list of unique strings. This
multiplicity is important so that users may, for example, have a key
that records which supported branches are affected by a bug:

    found-in-branch=supported/1.1
    found-in-branch=supported/1.2

The key/value pairs are stored in YAML files. JSON would also be a
nice choice, except it sucks for human editability, and it deals
especially badly, from a usability point of view, with string values
that contain many lines of text. While I don't expect distix users to
edit the metadata files directly in the long term, I will have to, and
so I care about the file format. I don't want to invent a new format
myself. That's silly, and I've done it a few dozen times already, so
I'm bored by it.

Thus a metadata file will look like this:

    key:
      - value
      - value 2
    key2:
      - value
      - value 3

I'll worry about namespacing keys later. I don't need to have the
perfect format from the start, and I'm not yet sure namespacing will
be worth it. Right now, I care about having something simple that
works. I'll worry about perfect later.
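
The key/value model above can be illustrated with a short Python
sketch. This is not distix's actual code: the names `add_value` and
`merge_metadata` are hypothetical, and the union-style merge is only
an assumption about what "easy to merge" could mean for lists of
unique strings. Reading and writing the YAML file itself is left out.

```python
def add_value(metadata, key, value):
    """Add value under key, keeping the values for each key unique."""
    values = metadata.setdefault(key, [])
    if value not in values:
        values.append(value)


def merge_metadata(ours, theirs):
    """Return the union of two metadata dicts without modifying either.

    Assumption: merging two versions of a ticket's metadata means
    keeping every value that appears in either version.
    """
    merged = {key: list(values) for key, values in ours.items()}
    for key, values in theirs.items():
        for value in values:
            add_value(merged, key, value)
    return merged


ticket = {}
add_value(ticket, "found-in-branch", "supported/1.1")
add_value(ticket, "found-in-branch", "supported/1.2")
add_value(ticket, "found-in-branch", "supported/1.1")  # duplicate, ignored
```

Under these assumptions, `ticket` ends up with a single
`found-in-branch` key holding two values, which corresponds to the
list form shown in the YAML example.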

The repository has a layout like this:

    repo.yaml -- repository metadata
    tickets/
      TICKETID/
        ticket.yaml -- ticket metadata
        Maildir/

I may later decide to split the TICKETID level into multiple levels,
based on the ticket id, to avoid having too many tickets in one
directory. But that's for later; this is fine for now.

I store the ticket id in the ticket's metadata. This allows us to
rename directories and also simplifies the code that needs to know a
ticket's id.

If I decide to implement an "archive" feature, where tickets are moved
once they are well and truly done, and nobody cares about them
anymore, and nobody calls, and nobody visits them, but they shouldn't
be deleted either, then I'll add a top level "archive" directory. But
we'll see, that's an idea for the future.

I may add more metadata files, too, such as one for stored searches,
but, again, we'll see.

I'll let git worry about merging. This opens up the possibility of
merge conflicts. For now, I'll punt on that, and let the user deal
with the conflicts manually. Later I may consider either a merge tool
that git can call to resolve conflicts automatically, or a change to
the metadata storage so that conflicts are very unlikely. But again,
that's for later. Right now the important thing is to get something
working so I can get experience with the concepts and feedback on what
is actually working OK and what needs fixing.

I'll make it so distix will make changes and also commit them to git.
This saves the user from having to do that manually. I'll also make it
so that if there are uncommitted changes, distix will error out.

Templating
==========

Output from distix, to the user, will be produced via templates.
Initially, the templates will be shipped with the distix code, but
later I will add ways for the distix instance admins to use templates
from elsewhere, for extra flexibility and theming.

The use of templates is important.
Not hardcoding what is output and how it's laid out is good for
flexibility, and using templates from the beginning avoids
accidentally hardcoding things. The Jinja2 templating library is used.

Ingesting e-mails
=================

Distix is meant to import e-mails and either open new tickets, or put
the e-mails in the right existing tickets. There are two use cases:

1. Ingest an incoming mail directly from the MTA.
2. Ingest a folder of mails that have already been received.

The first form is to be used when distix is subscribed to a mailing
list. The second form is for seeding a new distix instance from
existing e-mails. There is not much architectural difference between
the two cases.

When ingesting, distix will read in the e-mail, and extract some
metadata from it. With the extracted metadata, distix will find an
existing ticket to which the mail belongs, if one exists. If no
existing ticket is suitable, a new ticket is opened.

Given a ticket id, distix will put the e-mail in the ticket's Maildir.
It will use a SHA1 of the full text of the message (including the
headers, as received) as the filename for the message. If a message
with that filename already exists, distix assumes the new message is a
duplicate and skips it. This allows the user to import the same mail
folder multiple times, adding only mails that distix has not seen
before.
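
The SHA1-based naming and duplicate check can be sketched in Python
roughly as follows. This is a minimal illustration under stated
assumptions, not the actual distix implementation: the function name
`store_message` is hypothetical, the message is assumed to be
available as raw bytes exactly as received, and writing directly into
the Maildir's `new/` subdirectory is a simplification (the source does
not say which Maildir subdirectory is used).

```python
import hashlib
import os


def store_message(maildir, raw_message):
    """Store raw_message (bytes) in maildir, named by its SHA1 digest.

    Returns the path of the stored file, or None if a file with the
    same name already exists, in which case the message is treated as
    a duplicate and skipped.
    """
    # Assumption: messages go into the "new" subdirectory.
    new_dir = os.path.join(maildir, "new")
    os.makedirs(new_dir, exist_ok=True)

    # The digest covers the full message text, headers included.
    filename = hashlib.sha1(raw_message).hexdigest()
    path = os.path.join(new_dir, filename)

    if os.path.exists(path):
        return None  # same content seen before: skip it

    with open(path, "wb") as f:
        f.write(raw_message)
    return path
```

Because the filename depends only on the message bytes, calling
`store_message` twice with the same message stores it once, which is
what makes repeated imports of the same folder harmless. Two copies of
a mail that differ in any header, for example because they arrived by
different routes, hash differently and would both be kept.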