From b3d6dd442ef550f717499653f78c80ad521df48a Mon Sep 17 00:00:00 2001
From: Lars Wirzenius <liw@liw.fi>
Date: Sat, 26 Jan 2019 09:45:03 +0200
Subject: Add: a page for Muck

---
 muck.mdwn | 234 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 234 insertions(+)
 create mode 100644 muck.mdwn

diff --git a/muck.mdwn b/muck.mdwn
new file mode 100644
index 0000000..dee8da9
--- /dev/null
+++ b/muck.mdwn
@@ -0,0 +1,234 @@
+[[!meta title="Muck - a JSON store with an HTTP API and access control"]]
+
+
+**FIXME: This is not really an architecture document yet. Also, there is
+only a proof of concept in Python available for now, which is not
+meant to be performant, but to serve as a vehicle for exploring what the
+optimal API and feature set should look like. Feedback on Muck via normal
+Ick channels is welcome.**
+
+Muck is a JSON store, with an access controlled RESTful HTTP API. Data
+stored in Muck is persistent, but kept in memory for simplicity. Data
+is stored as flat JSON objects, which means:
+
+* an object may have any number of fields
+* each field has a value that is `null`, a UTF-8 string, or a list of
+  UTF-8 strings
+
+Access is granted based on signed JWT bearer tokens. An OpenID Connect
+or OAuth2 identity provider (see [[Yuck]]) is expected to give such
+tokens to authorized users. The provider signs the tokens with its
+private key, and the matching public key, which Muck uses to verify the
+signature, is a Muck configuration item. (FIXME: Muck should probably
+accept any number of keys, for key rotation and de-centralisation.)
+
+Access control is currently very simplistic, but will be improved
+later. Currently each resource is assigned an owner upon creation, and
+each user (subject) can access (see, update, delete) only their own
+resources. The goal is to allow access to be specified per user, per
+resource, and per operation (Tomjon can allow Verence to see a
+specific resource owned by Tomjon, but not update or delete). This
+will require the OpenID provider to support groups.
+
+Muck is currently a single-threaded Python program using the Bottle.py
+framework and its built-in HTTP server. The production version of Muck
+will probably be written in Rust for performance. The current Python
+version can do on the order of 900 requests per second on a Thinkpad
+X220 laptop (plain HTTP over localhost). The goal is to have the Rust
+version be able to do at least 50 thousand such requests per second.
+
+Architecture
+-----------------------------------------------------------------------------
+
+Muck is in essence a dict in memory, indexed by resource id, and an
+HTTP layer to allow it to be accessed. Any changes are logged to an
+append-only `changelog` file. At startup, the `changelog` is read and
+the changes are made to the dict. To back up and restore a Muck
+instance, or to move it to another host, the `changelog` is enough.
+
+Muck currently does not provide replication, sharding, or scalability
+to multiple nodes, or resiliency against its one node having problems
+or disappearing. These are valid concerns, which may be addressed
+later.
+
+There are currently no index data structures, so searches are slow.
+
+FIXME: Startup can be slow when `changelog` is long. Eventually this
+will be fixed by having occasional snapshots of the dict, and only
+reading change log entries made after the snapshot.
+
+Configuration and starting and stopping
+-----------------------------------------------------------------------------
+
+Create a JSON configuration file:
+
+    {
+        "log": "muck.log",
+        "pid": "muck.pid",
+        "store": "muck.store",
+        "signing-key-filename": "trusted-key.pub"
+    }
+
+Create the directory given as the store. Put the token-signing public
+key in the named file. Start Muck with the following command:
+
+    ./muck_poc config.json
+
+Muck will listen on port 12765 on localhost. If you want to expose
+Muck to the external network, you should run a TLS-enabled reverse
+proxy (like haproxy or nginx) in front of it.
+
+Muck writes its PID into the named PID file. To stop it, send SIGTERM
+or SIGKILL to the process.
+
+
+HTTP API
+-----------------------------------------------------------------------------
+
+The HTTP API requires all requests to have an `Authorization: Bearer
+TOKEN` header, where `TOKEN` is a valid JWT access token whose
+signature can be checked using the public key Muck is configured to
+trust. The token should have a `scope` claim with space-delimited
+words to allow specific operations.
+
+The API has two endpoints: `/res` for resources, `/search` for search.
+Resources are managed as follows:
+
+* `POST /res` &mdash; create a new resource (need `create` in scope)
+* `PUT /res` &mdash; update an existing resource (need `update` in scope)
+* `GET /res` &mdash; retrieve a specific resource (need `show` in scope)
+* `DELETE /res` &mdash; delete a specific resource (need `delete` in scope)
+
+In all requests and responses that transport a resource, it is in the
+body, represented as JSON, using the `application/json` content type.
+
+Resource metadata is always given using HTTP headers of the request
+and response:
+
+* `Muck-Id` &mdash; the resource id
+* `Muck-Revision` &mdash; the resource revision
+
+The request should have these headers, if the operation requires
+them. Responses always have them, if a resource is returned.
+
+FIXME: Since two pieces of metadata accompany each resource, Muck puts
+them both in HTTP headers, even though the custom for RESTful interfaces
+is to put the identifier in the URL path. This may need to be discussed.
+If experience shows the approach chosen by Muck to be awkward, it will be
+changed.
+
+Searches are done by using a GET request to the `/search` endpoint,
+with a JSON body like this:
+
+    {
+        "cond": [
+            {
+                "where": "meta",
+                "field": "id",
+                "pattern": "ID123",
+                "op": "=="
+            }
+        ]
+    }
+
+The search condition is a list of simple conditions, which must all
+match. A simple condition consists of four parts:
+
+* `where` &mdash; should be `meta` to match metadata, or `data` to
+  match the actual resource
+* `field` &mdash; the name of the field to compare
+* `pattern` &mdash; the value to compare the field to
+* `op` &mdash; the comparison operation: `==`, `>=`, or `<=`
+
+The response is a JSON object listing all the ids of resources that
+match all the simple conditions.
+
+Searches require the `show` scope.
+
+
+API examples
+-----------------------------------------------------------------------------
+
+All these examples assume you've already retrieved an access token.
+
+To create a resource:
+
+    POST /res HTTP/1.1
+    Authorization: Bearer TOKEN
+    Content-Type: application/json
+
+    {"foo": "bar"}
+
+Response is:
+
+    201 Created
+    Content-Type: application/json
+    Muck-Id: ID
+    Muck-Revision: REV1
+    
+    {"foo": "bar"}
+
+Note that in the future Muck might decide to modify the resource by
+filling in missing fields. The canonical representation of the
+resource is in the response.
+
+To update a resource:
+
+    PUT /res HTTP/1.1
+    Authorization: Bearer TOKEN
+    Content-Type: application/json
+    Muck-Id: ID
+    Muck-Revision: REV1
+
+    {"foo": "yo"}
+
+The response:
+
+    200 OK
+    Content-Type: application/json
+    Muck-Id: ID
+    Muck-Revision: REV2
+    
+    {"foo": "yo"}
+
+To retrieve a resource:
+
+    GET /res HTTP/1.1
+    Authorization: Bearer TOKEN
+    Muck-Id: ID
+
+The response:
+
+    200 OK
+    Content-Type: application/json
+    Muck-Id: ID
+    Muck-Revision: REV2
+    
+    {"foo": "yo"}
+
+To delete a resource:
+
+    DELETE /res HTTP/1.1
+    Authorization: Bearer TOKEN
+    Muck-Id: ID
+
+The response:
+
+    200 OK
+
+To search:
+
+    GET /search HTTP/1.1
+    Authorization: Bearer TOKEN
+    Content-Type: application/json
+
+    {"cond": [
+        {"where":"data", "field":"name", "pattern":"James", "op":">="}
+    ]}
+
+The response:
+
+    200 OK
+    Content-Type: application/json
+    
+    {"resources": ["ID"]}
-- 
cgit v1.2.1