muck-poc - a JSON store with an HTTP API and access control
=============================================================================

> This is a proof of concept. It's not meant to be performant. It's a
> vehicle for exploring what the optimal API and feature set should
> look like.

Muck is a JSON store, with an access-controlled RESTful HTTP API. Data
stored in Muck is persistent, but kept in memory for simplicity. Data
is stored as flat JSON objects, which means:

* an object may have any number of fields
* each field has a value that is `null`, a UTF-8 string, or a list of
  UTF-8 strings

Access is granted based on signed JWT bearer tokens. An OpenID Connect
or OAuth2 identity provider is expected to give such tokens to
authorized users. The tokens are signed with the provider's private
key, and the matching public key, which Muck uses to check the
signature, is a Muck configuration item. I use Qvisqve for my OpenID
provider, but any provider should work.

Access control is currently very simplistic, but will be improved
later. Currently each resource is assigned an owner upon creation, and
each user (subject) can access (see, update, delete) only their own
resources. The goal is to allow access to be specified per user, per
resource, and per operation (Tomjon can allow Verence to see a
specific resource, but not update or delete it). This will require the
OpenID provider to support groups.

Muck is currently a single-threaded Python program using the Bottle.py
framework and its built-in HTTP server. The production version of Muck
will probably be written in Rust for performance. The current Python
version can do on the order of 900 requests per second on a Thinkpad
X220 laptop (plain HTTP over localhost). The goal is to have the Rust
version be able to do at least 50 thousand requests per second.

Comments? Feedback? Bug reports? Patches?
-----------------------------------------------------------------------------

If you have any comments or other feedback, please send them to
liw@liw.fi.
Muck will eventually become part of the ick project
(https://ick.liw.fi/), and when it does, the usual ick communication
channels will be appropriate.

Ick will use Muck to add some protection against concurrent changes,
and to prevent users from seeing each other's data. It seems better to
do these in Muck than in the Ick controller directly.

I hope Muck will come to be useful for others as well. If you think it
might be useful for you, but have reservations or use cases, please
get in touch.

Architecture
-----------------------------------------------------------------------------

Muck is in essence a dict in memory, indexed by resource id, and an
HTTP layer to allow it to be accessed. Any changes are logged to an
append-only `changelog` file. At startup, the `changelog` is read and
the changes are applied to the dict.

To back up and restore a Muck instance, or to move it to another host,
the `changelog` is enough.

Muck currently does not provide replication, sharding, or scalability
to multiple nodes, or resiliency against its one node having problems
or disappearing. These are valid concerns, which may be addressed
later.

There are currently no index data structures, so searches are very
slow.

Startup can be slow if `changelog` is long. Eventually this will be
fixed by having occasional snapshots of the dict, and only reading
change log entries made after the snapshot.

Hacking
-----------------------------------------------------------------------------

Run `./check` to run the full test suite: unit tests and integration
tests. You'll need various build dependencies. I'm too lazy to list
them here.

Run `./benchmark` and `./benchmark-http` to run some simplistic
benchmarking.

The tests and benchmarks create access tokens using pre-generated test
keys. If you use those keys for anything else, I will laugh at you.

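
For anyone reading the code for the first time, the startup replay
described under Architecture can be sketched roughly as below. This is
only an illustration, not Muck's actual code: the on-disk format of
`changelog` is not documented here, so the one-JSON-object-per-line
layout and the `op`, `id`, `rev`, and `data` field names are all
assumptions made up for the example.

```python
import json


def replay(changelog_lines):
    """Rebuild the in-memory dict from an iterable of change log lines.

    Assumed (hypothetical) record format, one JSON object per line:
    {"op": "create" | "update" | "delete", "id": ..., "rev": ..., "data": ...}
    """
    store = {}
    for line in changelog_lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        change = json.loads(line)
        if change["op"] == "delete":
            store.pop(change["id"], None)
        else:
            # "create" and "update" both set the current state of the resource
            store[change["id"]] = {"rev": change["rev"], "data": change["data"]}
    return store


log = [
    '{"op": "create", "id": "ID1", "rev": "REV1", "data": {"foo": "bar"}}',
    '{"op": "update", "id": "ID1", "rev": "REV2", "data": {"foo": "yo"}}',
    '{"op": "create", "id": "ID2", "rev": "REV1", "data": {"x": null}}',
    '{"op": "delete", "id": "ID2"}',
]
store = replay(log)
```

Because every change is replayed in order, the cost of startup grows
with the length of the log, which is why snapshots are planned.
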

Configuration and starting and stopping
-----------------------------------------------------------------------------

Create a JSON configuration file:

    {
        "log": "muck.log",
        "pid": "muck.pid",
        "store": "muck.store",
        "signing-key-filename": "trusted-key.pub"
    }

Create the directory named in `store`. Put the public key used to
check token signatures in the file named in `signing-key-filename`.

Start Muck with the following command:

    ./muck_poc config.json

Muck will listen on port 12765 on localhost. If you want to expose
Muck to the external network, you should run a TLS-enabled reverse
proxy (like haproxy or nginx) in front of it.

Muck writes its PID into the named PID file. To stop it, send SIGTERM
or SIGKILL to the process.

HTTP API
-----------------------------------------------------------------------------

The HTTP API requires all requests to have an `Authorization: Bearer
TOKEN` header, where `TOKEN` is a valid JWT access token whose
signature can be checked using the public key Muck is configured to
trust. The token should have a `scope` claim with space-delimited
words to allow specific operations.

The API has two endpoints: `/res` for resources, `/search` for search.

Resources are managed as follows:

* `POST /res`: create a new resource (need `create` in scope)
* `PUT /res`: update an existing resource (need `update` in scope)
* `GET /res`: retrieve a specific resource (need `show` in scope)
* `DELETE /res`: delete a specific resource (need `delete` in scope)

In all requests and responses that transport a resource, it is in the
body, represented as JSON, using the `application/json` content type.
Resource metadata is always given using HTTP headers of the request
and response:

* `Muck-Id`: the resource id
* `Muck-Revision`: the resource revision

The request should have these headers, if the operation requires
them. Responses always have them, if a resource is returned.

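
To make the header conventions above concrete, here is a minimal
client-side sketch that builds (but does not send) a request for the
`/res` endpoint using only the Python standard library. The helper
name `build_request` and the `BASE_URL` constant are inventions for
this example, and the URL assumes Muck is listening on plain HTTP on
localhost, port 12765.

```python
import json
import urllib.request

# Assumption: Muck is running locally on its port, without a TLS proxy.
BASE_URL = "http://localhost:12765"


def build_request(method, token, resource=None, res_id=None, revision=None):
    """Build a request for /res; build_request is a hypothetical helper."""
    headers = {"Authorization": "Bearer " + token}
    body = None
    if resource is not None:
        # The resource itself travels in the body as JSON.
        body = json.dumps(resource).encode("utf-8")
        headers["Content-Type"] = "application/json"
    # Resource metadata travels in headers, never in the body.
    if res_id is not None:
        headers["Muck-Id"] = res_id
    if revision is not None:
        headers["Muck-Revision"] = revision
    return urllib.request.Request(
        BASE_URL + "/res", data=body, headers=headers, method=method
    )


# An update of resource ID, currently at revision REV1.
req = build_request("PUT", "TOKEN", {"foo": "yo"}, res_id="ID", revision="REV1")
```

Passing `req` to `urllib.request.urlopen` would send it. With a real
token in place of `TOKEN`, the response headers should then carry the
`Muck-Id` and the new `Muck-Revision`.
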

Searches are done by using a GET request to the `/search` endpoint,
with a JSON body like this:

    {
        "cond": [
            {
                "where": "meta",
                "field": "id",
                "pattern": "ID123",
                "op": "=="
            }
        ]
    }

The search condition is a list of simple conditions, which must all
match. A simple condition consists of four parts:

* `where`: should be `meta` to match metadata, or `data` to match the
  actual resource
* `field`: the name of the field to compare
* `pattern`: the value to compare the field to
* `op`: the comparison operation: `==`, `>=`, or `<=`

The response is a JSON object listing all the ids of resources that
match all the simple conditions.

Searches require the `show` scope.

API examples
-----------------------------------------------------------------------------

All these examples assume you've already retrieved an access token.

To create a resource:

    POST /res HTTP/1.1
    Authorization: Bearer TOKEN
    Content-Type: application/json

    {"foo": "bar"}

Response is:

    201 Created
    Content-Type: application/json
    Muck-Id: ID
    Muck-Revision: REV1

    {"foo": "bar"}

Note that in the future Muck might decide to modify the resource by
filling in missing fields. The canonical representation of the
resource is in the response.


To update a resource:

    PUT /res HTTP/1.1
    Authorization: Bearer TOKEN
    Content-Type: application/json
    Muck-Id: ID
    Muck-Revision: REV1

    {"foo": "yo"}

The response:

    200 OK
    Content-Type: application/json
    Muck-Id: ID
    Muck-Revision: REV2

    {"foo": "yo"}

To retrieve a resource:

    GET /res HTTP/1.1
    Authorization: Bearer TOKEN
    Muck-Id: ID

The response:

    200 OK
    Content-Type: application/json
    Muck-Id: ID
    Muck-Revision: REV2

    {"foo": "yo"}

To delete a resource:

    DELETE /res HTTP/1.1
    Authorization: Bearer TOKEN
    Muck-Id: ID

The response:

    200 OK

To search:

    GET /search HTTP/1.1
    Authorization: Bearer TOKEN
    Content-Type: application/json

    {"cond": [
        {"where":"data", "field":"name", "pattern":"James", "op":">="}
    ]}

The response:

    200 OK
    Content-Type: application/json

    {"resources": ["ID"]}

Legalese
-----------------------------------------------------------------------------

Muck is licensed under the AGPL3+ license, a copy of which is included
as `COPYING` in the source code of this program. This license does NOT
apply to clients of the HTTP API it provides.

Copyright 2018 Lars Wirzenius

This program is free software: you can redistribute it and/or modify
it under the terms of the GNU Affero General Public License as
published by the Free Software Foundation, either version 3 of the
License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
Affero General Public License for more details.

You should have received a copy of the GNU Affero General Public
License along with this program. If not, see .