[[!meta title="Muck - a JSON store with an HTTP API and access control"]]

**FIXME: This is not really an architecture document yet. Also, there is
only a proof of concept in Python available for now, which is not
meant to be performant, but a vehicle for exploring what the optimal
API and feature set should look like. Feedback on Muck via normal Ick
channels is welcome.**

Muck is a JSON store with an access-controlled RESTful HTTP API. Data
stored in Muck is persistent, but kept in memory for simplicity. Data
is stored as flat JSON objects, which means:

* an object may have any number of fields
* each field has a value that is `null`, a UTF-8 string, or a list of
  UTF-8 strings
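
The flat-object rules above can be sketched as a small validation
function. This is only an illustration; the name `is_flat_object` is
made up for this example and is not part of Muck.

```python
def is_flat_object(obj):
    """Return True if obj follows the flat-object rules described above."""
    if not isinstance(obj, dict):
        return False
    for name, value in obj.items():
        if not isinstance(name, str):
            return False
        # A field value may be null or a string...
        if value is None or isinstance(value, str):
            continue
        # ...or a list whose items are all strings.
        if isinstance(value, list) and all(isinstance(v, str) for v in value):
            continue
        # Anything else (numbers, nested objects, mixed lists) is not flat.
        return False
    return True
```

For example, `{"name": "James", "tags": ["a", "b"], "note": null}` is
flat, but an object with a nested object or a numeric value is not.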

Access is granted based on signed JWT bearer tokens. An OpenID Connect
or OAuth2 identity provider (see [[Yuck]]) is expected to give such
tokens to authorized users. The provider signs the tokens with its
private key, and the matching public key, which Muck uses to check the
signatures, is an essential Muck configuration item. (FIXME:
Muck should probably accept any number of keys, for key rotation and
de-centralisation.)

Access control is currently very simplistic, but will be improved
later. Currently each resource is assigned an owner upon creation, and
each user (subject) can access (see, update, delete) only their own
resources. The goal is to allow access to be specified per user, per
resource, and per operation (Tomjon can allow Verence to see a
specific resource owned by Tomjon, but not update or delete). This
will require the OpenID provider to support groups.
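
The current owner-only rule could look roughly like the sketch below.
The class and method names are hypothetical, and it assumes the
subject is taken from the `sub` claim of an already verified token;
the per-user, per-operation rules described as a goal are not covered.

```python
class AccessDenied(Exception):
    """Raised when a subject tries to use a resource it does not own."""


class OwnerRegistry:
    """Remembers which subject owns which resource (hypothetical helper)."""

    def __init__(self):
        self._owners = {}

    def set_owner(self, res_id, subject):
        # Called once, when the resource is created.
        self._owners[res_id] = subject

    def check(self, res_id, subject):
        # See, update, and delete are all allowed only for the owner.
        if subject is None or self._owners.get(res_id) != subject:
            raise AccessDenied(res_id)
```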

Muck is currently a single-threaded Python program using the Bottle.py
framework and its built-in HTTP server. The production version of Muck
will probably be written in Rust for performance. The current Python
version can do in the order of 900 requests per second on a Thinkpad
X220 laptop (plain HTTP over localhost). The goal is to have the Rust
version be able to do at least 50 thousand such requests per second.

Architecture
-----------------------------------------------------------------------------

Muck is in essence a dict in memory, indexed by resource id, and an
HTTP layer to allow it to be accessed. Any changes are logged to an
append-only `changelog` file. At startup, the `changelog` is read and
the changes are made to the dict. To back up and restore a Muck
instance, or to move it to another host, the `changelog` is enough.
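
As a rough illustration of the dict-plus-changelog idea, here is a
minimal sketch. The on-disk format (one JSON object per line with
`op`, `id`, and `data` fields) and the class name `MemoryStore` are
assumptions made for this example; the proof of concept may store
changes differently.

```python
import json


class MemoryStore:
    """In-memory dict of resources, rebuilt from an append-only changelog."""

    def __init__(self, changelog_path):
        self._path = changelog_path
        self._resources = {}
        self._replay()

    def _replay(self):
        # At startup, apply every logged change in order.
        try:
            with open(self._path, encoding="utf-8") as f:
                for line in f:
                    if line.strip():
                        self._apply(json.loads(line))
        except FileNotFoundError:
            pass  # a fresh store has no changelog yet

    def _apply(self, change):
        if change["op"] == "delete":
            self._resources.pop(change["id"], None)
        else:  # "put" covers both create and update in this sketch
            self._resources[change["id"]] = change["data"]

    def _log(self, change):
        # Write the change to disk first, then update the dict.
        with open(self._path, "a", encoding="utf-8") as f:
            f.write(json.dumps(change) + "\n")
        self._apply(change)

    def put(self, res_id, data):
        self._log({"op": "put", "id": res_id, "data": data})

    def delete(self, res_id):
        self._log({"op": "delete", "id": res_id})

    def get(self, res_id):
        return self._resources.get(res_id)
```

Opening a second `MemoryStore` on the same file reproduces the same
dict, which is why copying the `changelog` is enough for backup or
migration.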

Muck currently does not provide replication, sharding, or scalability
to multiple nodes, or resiliency against its one node having problems
or disappearing. These are valid concerns, which may be addressed
later.

There are currently no index data structures, so searches are slow.

FIXME: Startup can be slow when `changelog` is long. Eventually this
will be fixed by having occasional snapshots of the dict, and only
reading change log entries made after the snapshot.

Configuration and starting and stopping
-----------------------------------------------------------------------------

Create a JSON configuration file:

    {
        "log": "muck.log",
        "pid": "muck.pid",
        "store": "muck.store",
        "signing-key-filename": "trusted-key.pub"
    }

Create the directory given as the store. Put the token-signing public
key in the named file. Start Muck with the following command:

    ./muck_poc config.json

Muck will listen on port 12765 on localhost. If you want to expose
Muck to the external network, you should run a TLS-enabled reverse
proxy (like haproxy or nginx) in front of it.

Muck writes its PID into the named PID file. To stop it, send SIGTERM
or SIGKILL to the process.


HTTP API
-----------------------------------------------------------------------------

The HTTP API requires all requests to have an `Authorization: Bearer
TOKEN` header, where `TOKEN` is a valid JWT access token whose
signature can be checked using the public key Muck is configured to
trust. The token should have a `scope` claim with space-delimited
words to allow specific operations.

The API has two endpoints: `/res` for resources, `/search` for search.
Resources are managed as follows:

* `POST /res` &mdash; create a new resource (need `create` in scope)
* `PUT /res` &mdash; update an existing resource (need `update` in scope)
* `GET /res` &mdash; retrieve a specific resource (need `show` in scope)
* `DELETE /res` &mdash; delete a specific resource (need `delete` in scope)
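
The scope check for the `/res` operations listed above might be
sketched like this. It assumes the token signature has already been
verified and the claims decoded into a dict; `REQUIRED_SCOPE` and
`is_allowed` are names invented for the example.

```python
# Scope word needed for each HTTP method on /res, as listed above.
REQUIRED_SCOPE = {
    "POST": "create",
    "PUT": "update",
    "GET": "show",
    "DELETE": "delete",
}


def is_allowed(method, claims):
    """Return True if the claims' scope contains the word the method needs."""
    needed = REQUIRED_SCOPE.get(method)
    if needed is None:
        return False
    scope = claims.get("scope", "")
    # The scope claim is a string of space-delimited words.
    return isinstance(scope, str) and needed in scope.split()
```

Splitting on whitespace rather than testing for a substring avoids
treating a scope word such as `showing` as if it granted `show`.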

In all requests and responses that transport a resource, it is in the
body, represented as JSON, using the `application/json` content type.

Resource metadata is always given using HTTP headers of the request
and response:

* `Muck-Id` &mdash; the resource id
* `Muck-Revision` &mdash; the resource revision

The request should have these headers, if the operation requires
them. Responses always have them, if a resource is returned.
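
From a client's point of view, putting the metadata in headers could
look like the following sketch, which uses only the Python standard
library. The helper name `build_request` is made up, and the base URL
assumes Muck is reached directly on its default localhost port rather
than through a reverse proxy.

```python
import json
import urllib.request

BASE_URL = "http://localhost:12765"  # assumed: direct access, no proxy


def build_request(method, token, res_id=None, revision=None, body=None):
    """Build (but do not send) a request for the /res endpoint."""
    headers = {"Authorization": "Bearer " + token}
    # Resource metadata travels in headers, not in the URL or body.
    if res_id is not None:
        headers["Muck-Id"] = res_id
    if revision is not None:
        headers["Muck-Revision"] = revision
    data = None
    if body is not None:
        headers["Content-Type"] = "application/json"
        data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + "/res", data=data, headers=headers, method=method
    )
```

The returned object can be passed to `urllib.request.urlopen()`; the
`Muck-Id` and `Muck-Revision` values of the result are then available
from the response headers.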

FIXME: Since two pieces of metadata accompany each resource, Muck puts
them both in HTTP headers, even though the custom for RESTful
interfaces is to put the identifier in the URL path. This may need to
be discussed. If experience shows the approach chosen by Muck to be
awkward, it will be changed.

Searches are done by using a GET request to the `/search` endpoint,
with a JSON body like this:

    {
        "cond": [
            {
                "where": "meta",
                "field": "id",
                "pattern": "ID123",
                "op": "=="
            }
        ]
    }

The search condition is a list of simple conditions, which must all
match. A simple condition consists of four parts:

* `where` &mdash; should be `meta` to match metadata, or `data` to
  match the actual resource
* `field` &mdash; the name of the field to compare
* `pattern` &mdash; the value to compare the field to
* `op` &mdash; the comparison operation: `==`, `>=`, or `<=`

The response is a JSON object listing all the ids of resources that
match all the simple conditions.

Searches require the `show` scope.
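
To make the matching rule concrete, here is a minimal sketch of how a
search condition could be evaluated by scanning every resource. It
assumes that only string values are compared and that `>=` and `<=`
use plain string ordering, neither of which is specified above; the
function names are invented for the example.

```python
import operator

# The three comparison operations named above.
OPS = {"==": operator.eq, ">=": operator.ge, "<=": operator.le}


def matches(cond, meta, data):
    """Return True if every simple condition holds for one resource."""
    for simple in cond:
        # "where" selects the metadata or the resource itself.
        fields = {"meta": meta, "data": data}[simple["where"]]
        value = fields.get(simple["field"])
        compare = OPS[simple["op"]]
        # Assumption: missing, null, and list values never match.
        if not isinstance(value, str) or not compare(value, simple["pattern"]):
            return False
    return True


def search(cond, resources):
    """resources maps id -> (meta, data); return the ids of all matches."""
    ids = [
        res_id
        for res_id, (meta, data) in resources.items()
        if matches(cond, meta, data)
    ]
    return {"resources": ids}
```

Without index data structures, each search is a linear scan like this
one, which is why searches are slow for now.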


API examples
-----------------------------------------------------------------------------

All these examples assume you've already retrieved an access token.

To create a resource:

    POST /res HTTP/1.1
    Authorization: Bearer TOKEN
    Content-Type: application/json

    {"foo": "bar"}

Response is:

    201 Created
    Content-Type: application/json
    Muck-Id: ID
    Muck-Revision: REV1
    
    {"foo": "bar"}

Note that in the future Muck might decide to modify the resource by
filling in missing fields. The canonical representation of the
resource is in the response.

To update a resource:

    PUT /res HTTP/1.1
    Authorization: Bearer TOKEN
    Content-Type: application/json
    Muck-Id: ID
    Muck-Revision: REV1

    {"foo": "yo"}

The response:

    200 OK
    Content-Type: application/json
    Muck-Id: ID
    Muck-Revision: REV2
    
    {"foo": "yo"}

To retrieve a resource:

    GET /res HTTP/1.1
    Authorization: Bearer TOKEN
    Muck-Id: ID

The response:

    200 OK
    Content-Type: application/json
    Muck-Id: ID
    Muck-Revision: REV2
    
    {"foo": "yo"}

To delete a resource:

    DELETE /res HTTP/1.1
    Authorization: Bearer TOKEN
    Muck-Id: ID

The response:

    200 OK

To search:

    GET /search HTTP/1.1
    Authorization: Bearer TOKEN
    Content-Type: application/json

    {"cond": [
        {"where":"data", "field":"name", "pattern":"James", "op":">="}
    ]}

The response:

    200 OK
    Content-Type: application/json
    
    {"resources": ["ID"]}