muck-poc - a JSON store with an HTTP API and access control
=============================================================================

> This is a proof of concept. It's not meant to be performant. It's a
> vehicle for exploring what the optimal API and feature set should
> look like.

Muck is a JSON store, with an access controlled RESTful HTTP API. Data
stored in Muck is persistent, but kept in memory for simplicity. Data
is stored as flat JSON objects, which means:

* an object may have any number of fields
* each field has a value that is `null`, a UTF-8 string, or a list of
  UTF-8 strings
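
As an illustration of the two rules above, here is a minimal sketch of a check for "flat" objects. The function name `is_flat_object` is made up for this example and is not part of Muck.

```python
def is_flat_object(obj):
    """Return True if obj is a flat JSON object: a dict whose values
    are each None, a string, or a list of strings."""
    if not isinstance(obj, dict):
        return False
    for name, value in obj.items():
        if not isinstance(name, str):
            return False
        if value is None or isinstance(value, str):
            continue
        if isinstance(value, list) and all(isinstance(x, str) for x in value):
            continue
        # Numbers, booleans, nested objects, and mixed lists are not flat.
        return False
    return True
```

With this sketch, `{"name": "James", "tags": ["a", "b"], "note": null}` would be accepted, while `{"age": 42}` or an object containing a nested object would be rejected.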

Access is granted based on signed JWT bearer tokens. An OpenID Connect
or OAuth2 identity provider is expected to give such tokens to
authorized users. The provider signs the tokens with its private key,
and Muck checks the signature with the corresponding public key, which
is set in the Muck configuration. I use Qvisqve as my OpenID provider,
but any provider should work.

Access control is currently very simplistic, but will be improved
later. Currently each resource is assigned an owner upon creation, and
each user (subject) can access (see, update, delete) only their own
resources. The goal is to allow access to be specified per user, per
resource, and per operation (Tomjon can allow Verence to see a
specific resource, but not update or delete). This will require the
OpenID provider to support groups.

Muck is currently a single-threaded Python program using the Bottle.py
framework and its built-in HTTP server. The production version of Muck
will probably be written in Rust for performance. The current Python
version can do on the order of 900 requests per second on a ThinkPad
X220 laptop (plain HTTP over localhost). The goal is to have the Rust
version be able to do at least 50 thousand requests per second.

Comments? Feedback? Bug reports? Patches?
-----------------------------------------------------------------------------

If you have any comments or other feedback, please send them to
liw@liw.fi.

Muck will eventually become part of the ick project
(https://ick.liw.fi/), and when it does, the usual ick communication
channels will be appropriate. Ick will use Muck to add some protection
against concurrent changes, and to prevent users from seeing each
other's data. It seems better to do these in Muck than in the Ick
controller directly.

I hope Muck will come to be useful for others as well. If you think it
might be useful for you, but have reservations or use cases, please
get in touch.

Architecture
-----------------------------------------------------------------------------

Muck is in essence a dict in memory, indexed by resource id, and an
HTTP layer to allow it to be accessed. Any changes are logged to an
append-only `changelog` file. At startup, the `changelog` is read and
the changes are made to the dict. To back up and restore a Muck
instance, or to move it to another host, the `changelog` is enough.
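
The idea can be sketched in a few lines of Python. This is only an illustration: the on-disk format (one JSON object per line with `op`, `id`, and `body` fields) and the function names are assumptions made for the example, not the format Muck actually uses.

```python
import json


def append_change(path, change):
    """Append one change to the changelog (hypothetical format:
    one JSON object per line)."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(change) + "\n")


def replay(path):
    """Rebuild the in-memory dict by applying every logged change in order."""
    store = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            change = json.loads(line)
            if change["op"] == "delete":
                store.pop(change["id"], None)
            else:
                # "create" and "update" both set the resource body.
                store[change["id"]] = change["body"]
    return store
```

Because the log is only ever appended to, copying the file is enough to reproduce the dict on another host by replaying it there.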

Muck currently does not provide replication, sharding, or scalability
to multiple nodes, or resiliency against its one node having problems
or disappearing. These are valid concerns, which may be addressed
later.

There are currently no index data structures, so searches are very
slow.

Startup can be slow if `changelog` is long. Eventually this will be
fixed by having occasional snapshots of the dict, and only reading
change log entries made after the snapshot.

Hacking
-----------------------------------------------------------------------------

Run `./check` to run the full test suite: unit tests, and integration
tests. You'll need various build dependencies. I'm too lazy to list
them here.

Run `./benchmark` and `./benchmark-http` to run some simplistic
benchmarking.

The tests and benchmarks create access tokens using pre-generated test
keys. If you use those keys for anything else, I will laugh at you.

Configuration and starting and stopping
-----------------------------------------------------------------------------

Create a JSON configuration file:

    {
        "log": "muck.log",
        "pid": "muck.pid",
        "store": "muck.store",
        "signing-key-filename": "trusted-key.pub"
    }

Create the directory given as the store. Put the token-signing public
key in the named file. Start Muck with the following command:

    ./muck_poc config.json

Muck will listen on port 12765 on localhost. If you want to expose
Muck to the external network, you should run a TLS-enabled reverse
proxy (like haproxy or nginx) in front of it.

Muck writes its PID into the named PID file. To stop it, send SIGTERM
or SIGKILL to the process.
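
Stopping Muck from a script could look roughly like this. It is a sketch that assumes the PID file contains just the process id as decimal text; the function names are invented for the example.

```python
import os
import signal


def read_pid(pid_filename):
    """Read the process id from the PID file (assumed to be decimal text)."""
    with open(pid_filename) as f:
        return int(f.read().strip())


def stop_muck(pid_filename):
    """Ask the running process to terminate by sending it SIGTERM."""
    os.kill(read_pid(pid_filename), signal.SIGTERM)
```

For the configuration above, that would be `stop_muck("muck.pid")`.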


HTTP API
-----------------------------------------------------------------------------

The HTTP API requires all requests to have an `Authorization: Bearer
TOKEN` header, where `TOKEN` is a valid JWT access token whose
signature can be checked using the public key Muck is configured to
trust. The token should have a `scope` claim with space-delimited
words to allow specific operations.

The API has two endpoints: `/res` for resources, and `/search` for
search. Resources are managed as follows:

* `POST /res` &mdash; create a new resource (need `create` in scope)
* `PUT /res` &mdash; update an existing resource (need `update` in scope)
* `GET /res` &mdash; retrieve a specific resource (need `show` in scope)
* `DELETE /res` &mdash; delete a specific resource (need `delete` in scope)
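
The scope requirements can be summarised as a lookup table. The sketch below assumes the token has already been verified and decoded into a dict of claims; the names `REQUIRED_SCOPE` and `is_allowed` are made up for this example and do not come from the Muck source.

```python
# Scope word needed for each operation. Searching also needs "show".
REQUIRED_SCOPE = {
    ("POST", "/res"): "create",
    ("PUT", "/res"): "update",
    ("GET", "/res"): "show",
    ("DELETE", "/res"): "delete",
    ("GET", "/search"): "show",
}


def is_allowed(claims, method, path):
    """Return True if the space-delimited scope claim contains the
    word required for this method and path."""
    needed = REQUIRED_SCOPE.get((method, path))
    if needed is None:
        return False
    return needed in claims.get("scope", "").split()
```

A token with `"scope": "create show"` would thus be able to create, retrieve, and search, but not update or delete.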

In all requests and responses that transport a resource, it is in the
body, represented as JSON, using the `application/json` content type.

Resource metadata is always given using HTTP headers of the request
and response:

* `Muck-Id` &mdash; the resource id
* `Muck-Revision` &mdash; the resource revision

The request should have these headers, if the operation requires
them. Responses always have them, if a resource is returned.

Searches are done by using a GET request to the `/search` endpoint,
with a JSON body like this:

    {
        "cond": [
            {
                "where": "meta",
                "field": "id",
                "pattern": "ID123",
                "op": "=="
            }
        ]
    }

The search condition is a list of simple conditions, which must all
match. A simple condition consists of four parts:

* `where` &mdash; should be `meta` to match metadata, or `data` to
  match the actual resource
* `field` &mdash; the name of the field to compare
* `pattern` &mdash; the value to compare the field to
* `op` &mdash; the comparison operation: `==`, `>=`, or `<=`

The response is a JSON object listing all the ids of resources that
match all the simple conditions.

Searches require the `show` scope.
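
To make the matching rules concrete, here is a sketch of how such a search could be evaluated over an in-memory dict. Several things here are assumptions for the sake of the example rather than documented Muck behaviour: resources are stored as `{"meta": {...}, "data": {...}}`, comparisons are plain string comparisons, a list-valued field matches if any of its items matches, and a missing or `null` field never matches.

```python
import operator

OPS = {"==": operator.eq, ">=": operator.ge, "<=": operator.le}


def matches(resource, cond):
    """Check one simple condition against one resource."""
    part = resource[cond["where"]]  # "meta" or "data"
    value = part.get(cond["field"])
    if value is None:
        return False
    values = value if isinstance(value, list) else [value]
    compare = OPS[cond["op"]]
    return any(compare(v, cond["pattern"]) for v in values)


def search(resources, query):
    """Return the ids of resources for which all simple conditions match."""
    ids = [
        rid
        for rid, res in resources.items()
        if all(matches(res, c) for c in query["cond"])
    ]
    return {"resources": ids}
```

Since there are no indexes, a search like this has to look at every resource, which is why searches are slow.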


API examples
-----------------------------------------------------------------------------

All these examples assume you've already retrieved an access token.

To create a resource:

    POST /res HTTP/1.1
    Authorization: Bearer TOKEN
    Content-Type: application/json

    {"foo": "bar"}

Response is:

    201 Created
    Content-Type: application/json
    Muck-Id: ID
    Muck-Revision: REV1
    
    {"foo": "bar"}

Note that in the future Muck might decide to modify the resource by
filling in missing fields. The canonical representation of the
resource is in the response.

To update a resource:

    PUT /res HTTP/1.1
    Authorization: Bearer TOKEN
    Content-Type: application/json
    Muck-Id: ID
    Muck-Revision: REV1

    {"foo": "yo"}

The response:

    200 OK
    Content-Type: application/json
    Muck-Id: ID
    Muck-Revision: REV2
    
    {"foo": "yo"}

To retrieve a resource:

    GET /res HTTP/1.1
    Authorization: Bearer TOKEN
    Muck-Id: ID

The response:

    200 OK
    Content-Type: application/json
    Muck-Id: ID
    Muck-Revision: REV2
    
    {"foo": "yo"}

To delete a resource:

    DELETE /res HTTP/1.1
    Authorization: Bearer TOKEN
    Muck-Id: ID

The response:

    200 OK

To search:

    GET /search HTTP/1.1
    Authorization: Bearer TOKEN
    Content-Type: application/json

    {"cond": [
        {"where":"data", "field":"name", "pattern":"James", "op":">="}
    ]}

The response:

    200 OK
    Content-Type: application/json
    
    {"resources": ["ID"]}

Legalese
-----------------------------------------------------------------------------

Muck is licensed under the AGPL3+ license, a copy of which is included
as `COPYING` in the source code of this program. This license does NOT
apply to clients of the HTTP API it provides.

Copyright 2018  Lars Wirzenius

This program is free software: you can redistribute it and/or modify
it under the terms of the GNU Affero General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU Affero General Public License for more details.

You should have received a copy of the GNU Affero General Public License
along with this program.  If not, see <http://www.gnu.org/licenses/>.