From b3d6dd442ef550f717499653f78c80ad521df48a Mon Sep 17 00:00:00 2001
From: Lars Wirzenius
Date: Sat, 26 Jan 2019 09:45:03 +0200
Subject: Add: a page for Muck

---
 muck.mdwn | 234 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 234 insertions(+)
 create mode 100644 muck.mdwn

diff --git a/muck.mdwn b/muck.mdwn
new file mode 100644
index 0000000..dee8da9
--- /dev/null
+++ b/muck.mdwn
@@ -0,0 +1,234 @@
+[[!meta title="Muck - a JSON store with an HTTP API and access control"
+=============================================================================
+
+**FIXME: This is not really an architecture document yet. Also, there is
+only a proof of concept in Python available for now, which is not
+meant to be performant, but a vehicle for exploring what the optimal
+API and feature set should look like. Feedback on Muck via normal Ick
+channels is welcome.**
+
+Muck is a JSON store, with an access-controlled RESTful HTTP API. Data
+stored in Muck is persistent, but kept in memory for simplicity. Data
+is stored as flat JSON objects, which means:
+
+* an object may have any number of fields
+* each field has a value that is `null`, a UTF-8 string, or a list of
+  UTF-8 strings
+
+Access is granted based on signed JWT bearer tokens. An OpenID Connect
+or OAuth2 identity provider (see [[Yuck]]) is expected to give such
+tokens to authorized users. The provider signs the tokens with its
+private key; the matching public key, used to check the signature, is
+a key Muck configuration item. (FIXME: Muck should probably accept any
+number of keys, for key rotation and de-centralisation.)
+
+Access control is currently very simplistic, but will be improved
+later. Currently each resource is assigned an owner upon creation, and
+each user (subject) can access (see, update, delete) only their own
+resources. The goal is to allow access to be specified per user, per
+resource, and per operation (Tomjon can allow Verence to see a
+specific resource owned by Tomjon, but not update or delete).
+This will require the OpenID provider to support groups.
+
+Muck is currently a single-threaded Python program using the Bottle.py
+framework and its built-in HTTP server. The production version of Muck
+will probably be written in Rust for performance. The current Python
+version can do in the order of 900 requests per second on a Thinkpad
+X220 laptop (plain HTTP over localhost). The goal is to have the Rust
+version be able to do at least 50 thousand such requests per second.
+
+Architecture
+-----------------------------------------------------------------------------
+
+Muck is in essence a dict in memory, indexed by resource id, and an
+HTTP layer to allow it to be accessed. Any changes are logged to an
+append-only `changelog` file. At startup, the `changelog` is read and
+the changes are made to the dict. To back up and restore a Muck
+instance, or to move it to another host, the `changelog` is enough.
+
+Muck currently does not provide replication, sharding, or scalability
+to multiple nodes, or resiliency against its one node having problems
+or disappearing. These are valid concerns, which may be addressed
+later.
+
+There are currently no index data structures, so searches are slow.
+
+FIXME: Startup can be slow when `changelog` is long. Eventually this
+will be fixed by having occasional snapshots of the dict, and only
+reading change log entries made after the snapshot.
+
+Configuration and starting and stopping
+-----------------------------------------------------------------------------
+
+Create a JSON configuration file:
+
+    {
+        "log": "muck.log",
+        "pid": "muck.pid",
+        "store": "muck.store",
+        "signing-key-filename": "trusted-key.pub"
+    }
+
+Create the directory given as the store. Put the token-signing public
+key in the named file. Start Muck with the following command:
+
+    ./muck_poc config.json
+
+Muck will listen on port 12765 on localhost.
+If you want to expose Muck to the external network, you should run a
+TLS-enabled reverse proxy (like haproxy or nginx) in front of it.
+
+Muck writes its PID into the named PID file. To stop it, send SIGTERM
+or SIGKILL to the process.
+
+
+HTTP API
+-----------------------------------------------------------------------------
+
+The HTTP API requires all requests to have an `Authorization: Bearer
+TOKEN` header, where `TOKEN` is a valid JWT access token whose
+signature can be checked using the public key Muck is configured to
+trust. The token should have a `scope` claim with space-delimited
+words to allow specific operations.
+
+The API has two endpoints: `/res` for resources, `/search` for search.
+Resources are managed as follows:
+
+* `POST /res` — create a new resource (need `create` in scope)
+* `PUT /res` — update an existing resource (need `update` in scope)
+* `GET /res` — retrieve a specific resource (need `show` in scope)
+* `DELETE /res` — delete a specific resource (need `delete` in scope)
+
+In all requests and responses that transport a resource, it is in the
+body, represented as JSON, using the `application/json` content type.
+
+Resource metadata is always given using HTTP headers of the request
+and response:
+
+* `Muck-Id` — the resource id
+* `Muck-Revision` — the resource revision
+
+The request should have these headers, if the operation requires
+them. Responses always have them, if a resource is returned.
+
+FIXME: Since two pieces of metadata accompany each resource, Muck puts
+them both in HTTP headers, even though the custom for RESTful
+interfaces is to put the identifier in the URL path. This may need to
+be discussed. If experience shows the approach chosen by Muck to be
+awkward, it will be changed.
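
The scope rules listed for `/res` could be checked along the following lines. This is only a sketch and not taken from the Muck proof of concept: `REQUIRED_SCOPE` and `is_allowed` are hypothetical names, and the token is assumed to have been verified already and decoded into a dict of claims.

```python
# Hypothetical table: the scope word each /res operation needs,
# following the list of operations in the HTTP API section.
REQUIRED_SCOPE = {
    ("POST", "/res"): "create",
    ("PUT", "/res"): "update",
    ("GET", "/res"): "show",
    ("DELETE", "/res"): "delete",
}


def is_allowed(method, path, claims):
    """Return True if the token's `scope` claim permits the operation.

    `claims` is assumed to be the decoded payload of a JWT whose
    signature has already been checked elsewhere.
    """
    required = REQUIRED_SCOPE.get((method.upper(), path))
    if required is None:
        # Unknown operation: refuse rather than guess.
        return False
    # The scope claim is a string of space-delimited words.
    granted = claims.get("scope", "").split()
    return required in granted
```

A token with the claim `{"scope": "show create"}` would pass for `GET /res` and `POST /res` under this sketch, but not for `PUT /res` or `DELETE /res`.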
+
+Searches are done by using a GET request to the `/search` endpoint,
+with a JSON body like this:
+
+    {
+        "cond": [
+            {
+                "where": "meta",
+                "field": "id",
+                "pattern": "ID123",
+                "op": "=="
+            }
+        ]
+    }
+
+The search condition is a list of simple conditions, which must all
+match. A simple condition consists of four parts:
+
+* `where` — should be `meta` to match metadata, or `data` to
+  match the actual resource
+* `field` — the name of the field to compare
+* `pattern` — the value to compare the field to
+* `op` — the comparison operation: `==`, `>=`, or `<=`
+
+The response is a JSON object listing all the ids of resources that
+match all the simple conditions.
+
+Searches require the `show` scope.
+
+
+API examples
+-----------------------------------------------------------------------------
+
+All these examples assume you've already retrieved an access token.
+
+To create a resource:
+
+    POST /res HTTP/1.1
+    Authorization: Bearer TOKEN
+    Content-Type: application/json
+
+    {"foo": "bar"}
+
+Response is:
+
+    201 Created
+    Content-Type: application/json
+    Muck-Id: ID
+    Muck-Revision: REV1
+
+    {"foo": "bar"}
+
+Note that in the future Muck might decide to modify the resource by
+filling in missing fields. The canonical representation of the
+resource is in the response.
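
From a client's point of view, the create request above might be put together as in the following sketch. It uses only the Python standard library and builds the request without sending it. The helper names `is_flat` and `build_create_request` are hypothetical, and the URL assumes Muck is reached directly over plain HTTP on localhost port 12765, without a reverse proxy in front.

```python
import json
import urllib.request

# Assumption: direct plain-HTTP access to Muck on its default port.
MUCK_URL = "http://localhost:12765/res"


def is_flat(obj):
    """Return True if obj is a flat object in the Muck sense: every
    field value is None, a string, or a list of strings."""
    if not isinstance(obj, dict):
        return False
    for value in obj.values():
        if value is None or isinstance(value, str):
            continue
        if isinstance(value, list) and all(isinstance(v, str) for v in value):
            continue
        return False
    return True


def build_create_request(token, resource):
    """Build, but do not send, the POST /res request for a new resource."""
    if not is_flat(resource):
        raise ValueError("resource must be a flat JSON object")
    return urllib.request.Request(
        MUCK_URL,
        data=json.dumps(resource).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": "Bearer " + token,
            "Content-Type": "application/json",
        },
    )
```

Passing the result to `urllib.request.urlopen` would send it. The `Muck-Id` and `Muck-Revision` values would then be read from the response headers, and the response body taken as the canonical form of the resource.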
+
+To update a resource:
+
+    PUT /res HTTP/1.1
+    Authorization: Bearer TOKEN
+    Content-Type: application/json
+    Muck-Id: ID
+    Muck-Revision: REV1
+
+    {"foo": "yo"}
+
+The response:
+
+    200 OK
+    Content-Type: application/json
+    Muck-Id: ID
+    Muck-Revision: REV2
+
+    {"foo": "yo"}
+
+To retrieve a resource:
+
+    GET /res HTTP/1.1
+    Authorization: Bearer TOKEN
+    Muck-Id: ID
+
+The response:
+
+    200 OK
+    Content-Type: application/json
+    Muck-Id: ID
+    Muck-Revision: REV2
+
+    {"foo": "yo"}
+
+To delete a resource:
+
+    DELETE /res HTTP/1.1
+    Authorization: Bearer TOKEN
+    Muck-Id: ID
+
+The response:
+
+    200 OK
+
+To search:
+
+    GET /search HTTP/1.1
+    Authorization: Bearer TOKEN
+    Content-Type: application/json
+
+    {"cond": [
+        {"where":"data", "field":"name", "pattern":"James", "op":">="}
+    ]}
+
+The response:
+
+    200 OK
+    Content-Type: application/json
+
+    {"resources": ["ID"]}
-- 
cgit v1.2.1