Diffstat (limited to 'examples/muck/muck.md')
-rw-r--r-- examples/muck/muck.md 513
1 files changed, 513 insertions, 0 deletions
diff --git a/examples/muck/muck.md b/examples/muck/muck.md
new file mode 100644
index 0000000..d5470a0
--- /dev/null
+++ b/examples/muck/muck.md
@@ -0,0 +1,513 @@
+---
+title: Muck JSON storage server and API
+author: Lars Wirzenius
+date: work in progress
+bindings: muck.yaml
+functions: muck.py
+template: python
+...
+
+Introduction
+=============================================================================
+
+Muck is intended for storing relatively small pieces of data securely,
+and accessing them quickly. Intended use cases are:
+
+* storing user, client, application, and related data for an OpenID
+ Connect authentication server
+* storing personally identifiable information of data subjects (in the
+ GDPR sense) in a way that they can access and update, assuming
+ integration with a suitable authentication and authorization server
+* in general, storage for web applications of data that isn't large
+ and fits easily into RAM
+
+Muck is a JSON store, with an access controlled RESTful HTTP API. Data
+stored in Muck is persistent, but kept in memory for fast access. Data
+is represented as JSON objects.
+
+Access is granted based on signed JWT bearer tokens. An OpenID Connect
+or OAuth2 identity provider is expected to give such tokens to Muck
+clients. The tokens must be signed with a key whose public part Muck
+is configured to accept.
+
+Access control is simplistic. Each resource is assigned an owner
+upon creation, and each user can access (see, update, delete) only
+their own resources. A user with "super" powers can access, update, and
+delete resources they don't own, but can't create resources for others.
+This will be improved later.
+
+Architecture
+-----------------------------------------------------------------------------
+
+Muck stores data persistently in its local file system. It provides an
+HTTP API for clients. Muck itself does not communicate otherwise with
+external entities.
+
+```dot
+digraph "architecture" {
+muck [shape=box label="Muck"];
+storage [shape=tab label="Persistent \n storage"];
+client [shape=ellipse label="API client"];
+idp [shape=ellipse label="OAuth2/OIDC server"];
+
+storage -> muck [label="Read at \n startup"];
+muck -> storage [label="Write \n changes"];
+client -> muck [label="API read/write \n (HTTP)"];
+client -> idp [label="Get access token"];
+idp -> muck [label="Token signing key"];
+}
+```
+
+
+Authentication
+-----------------------------------------------------------------------------
+
+[OAuth2]: https://oauth.net/
+[OpenID Connect]: https://openid.net/connect/
+[JWT]: https://en.wikipedia.org/wiki/JSON_Web_Token
+
+Muck uses [OAuth2][] or [OpenID Connect][] bearer tokens as access
+tokens. The tokens are granted by some form of authentication service,
+are [JWT][] tokens, and signed using public-key cryptography. The
+authentication service is outside the scope of this document; any
+standard implementation should work.
+
+Muck will be configured with one public key for validating the tokens.
+For Muck to accept a token:
+
+* its signature must be valid according to the public key
+* it must be used while it's valid (after the validity starts, but
+ before it expires)
+* its audience must be the specific Muck instance
+* its scope claim must contain the scopes needed for the attempted
+ operation
+* it must specify an end-user (data subject)
+
+Every request to the Muck API must include a token, in the
+`Authorization` header as a bearer token. The request is denied if the
+token does not pass all the above checks.
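+
+The checks above, apart from signature verification, can be sketched
+in Python. This is a minimal illustration, not Muck's implementation.
+It assumes the token has already been verified and decoded into a
+dictionary, that the standard JWT claim names `nbf`, `exp`, `aud`,
+`scope`, and `sub` are used, and that `scope` is a space-separated
+string. The function name `token_is_acceptable` is hypothetical.
+
+```python
+import time
+
+
+def token_is_acceptable(claims, audience, needed_scope, now=None):
+    """Return True if already-verified claims pass the non-signature checks."""
+    now = time.time() if now is None else now
+
+    # Validity window: after the validity starts, but before it expires.
+    # Assumption: a missing nbf means valid from the start; exp is required.
+    if "exp" not in claims or not claims.get("nbf", 0) <= now < claims["exp"]:
+        return False
+
+    # Audience must be this specific Muck instance (string or list form).
+    aud = claims.get("aud", [])
+    if audience not in ([aud] if isinstance(aud, str) else aud):
+        return False
+
+    # The scope claim must contain the scope needed for the operation.
+    if needed_scope not in claims.get("scope", "").split():
+        return False
+
+    # The token must name an end-user (data subject).
+    return bool(claims.get("sub"))
+```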
+
+Requirements
+=============================================================================
+
+This chapter lists high level requirements for Muck.
+
+Each requirement here is given a unique mnemonic id for easier
+reference in discussions.
+
+**SimpleOps**
+
+: Muck must be simple to install and operate. Installation should
+ require only installing a .deb package, and configuration only
+ setting the authentication server's public key for verifying tokens.
+
+**Fast**
+
+: Muck must be fast. The speed requirement is that Muck must be able
+ to handle at least 100 concurrent clients, creating 1000 objects
+ each, and then retrieving each object, and then deleting each
+ object, and all of this must happen in no more than ten minutes
+ (600 seconds). Muck and the clients should run on different
+ virtual machines.
+
+**Secure**
+
+: Muck must allow access only by an authenticated client
+ representing a data subject, and must only allow that client to
+ access objects owned by the data subject, unless the client has
+ super privileges. The data subject specifies, via the access
+ token, what operations the client is allowed to do: whether it may
+ read, update, or delete objects.
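+
+Taken literally, the **Fast** requirement implies a minimum sustained
+request rate, which a short calculation makes concrete. The sketch
+assumes that each create, retrieve, and delete is exactly one HTTP
+request, which the requirement does not state explicitly.
+
+```python
+clients = 100
+objects_per_client = 1000
+operations_per_object = 3  # create, retrieve, delete (assumed one request each)
+time_limit_seconds = 600
+
+total_requests = clients * objects_per_client * operations_per_object
+required_rate = total_requests / time_limit_seconds
+
+print(total_requests)  # 300000
+print(required_rate)   # 500.0
+```
+
+Under that assumption, Muck needs to sustain about 500 requests per
+second across all clients.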
+
+
+HTTP API
+=============================================================================
+
+The Muck HTTP API has one endpoint – `/res` – that's used
+for all objects. The objects are called resources by Muck.
+
+The JSON objects Muck operates on must be valid, but their structure
+does not matter to Muck.
+
+Metadata
+-----------------------------------------------------------------------------
+
+Each JSON object stored in Muck is associated with metadata, which is
+represented as the following HTTP headers:
+
+* **Muck-Id** – the resource id
+* **Muck-Revision** – the resource revision
+
+The id is assigned by Muck at object creation time. The revision is
+assigned by Muck when the object is created or modified.
+
+
+API requests
+-----------------------------------------------------------------------------
+
+The RESTful API requests are POST, PUT, GET, and DELETE.
+
+* **POST /res** – create a new object
+* **PUT /res** – update an existing object
+* **GET /res** &ndash; retrieve an existing object
+* **DELETE /res** – delete an existing object
+
+Although it is usual for RESTful HTTP APIs to encode resource
+identifiers in the URL, Muck uses headers (Muck-Id, Muck-Revision) for
+consistency, and to provide for later expansion. Muck is not intended
+to be used manually, but by programmatic clients.
+
+Additionally, the "sub" claim in the token is used to assign and check
+ownership of the object. If the scope contains "super", the sub claim
+is ignored, except for creation.
+
+The examples in this chapter use HTTP/1.1, but should provide the
+necessary information for other versions of HTTP. Also, only the
+headers relevant to Muck are shown. For example, HTTP/1.1 also
+requires a Host header, but this is not shown in the examples.
+
+
+
+### Creating an object: POST /res
+
+Creating requires:
+
+* "create" in the scope claim
+* a non-empty "sub" claim, which will be stored by Muck as the owner
+ of the created object
+
+The creation request looks like this:
+
+~~~{.numberLines}
+POST /res HTTP/1.1
+Content-Type: application/
+Authorization: Bearer TOKEN
+
+{"foo": "bar"}
+~~~
+
+Note that the creation request does not include Muck-Id or
+Muck-Revision headers.
+
+A successful response looks like this:
+
+~~~{.numberLines}
+201 Created
+Content-Type: application/json
+Muck-Id: ID
+Muck-Revision: REV1
+~~~
+
+Note that the response does not contain a copy of the resource.
+
+
+
+### Updating an object: PUT /res
+
+Updating requires:
+
+* "update" in the scope claim
+* one of the following:
+ - "super" in the scope claim
+ - "sub" claim matches the owner of the object in Muck; a super user can update
+ any resource, but otherwise data subjects can only update their own
+ objects
+* Muck-Revision matches the current revision in Muck; this functions
+ as a simplistic guard against conflicting updates from different
+ clients.
+
+The update request looks like this:
+
+~~~{.numberLines}
+PUT /res HTTP/1.1
+Authorization: Bearer TOKEN
+Content-Type: application/json
+Muck-Id: ID
+Muck-Revision: REV1
+
+{"foo": "yo"}
+~~~
+
+In the request, ID identifies the object, and REV1 is its revision.
+
+The successful response:
+
+~~~{.numberLines}
+200 OK
+Content-Type: application/json
+Muck-Id: ID
+Muck-Revision: REV2
+~~~
+
+Note that the update response also doesn't contain the object. The
+client should remember the new revision, or retrieve the object to get
+the latest revision before the next update.
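+
+The ownership and revision checks for an update can be sketched as a
+toy in-memory store. This is an illustration only, not Muck's code:
+the class `Store`, its method signatures, and the use of UUIDs for
+ids and revisions are all assumptions. The status codes 404 and 409
+follow the acceptance criteria later in this document. The check for
+"update" in the scope claim is left out, because this document does
+not say which status code a missing scope produces.
+
+```python
+import uuid
+
+
+class Store:
+    """Toy in-memory store keyed by resource id (hypothetical)."""
+
+    def __init__(self):
+        self.resources = {}  # id -> {"owner", "revision", "body"}
+
+    def create(self, sub, body):
+        res_id, rev = str(uuid.uuid4()), str(uuid.uuid4())
+        self.resources[res_id] = {"owner": sub, "revision": rev, "body": body}
+        return 201, res_id, rev
+
+    def update(self, sub, scopes, res_id, revision, body):
+        res = self.resources.get(res_id)
+        # Missing and not-owned resources look the same to the client.
+        if res is None or ("super" not in scopes and res["owner"] != sub):
+            return 404, None
+        # A stale revision means another update happened first.
+        if res["revision"] != revision:
+            return 409, None
+        res["revision"], res["body"] = str(uuid.uuid4()), body
+        return 200, res["revision"]
+```
+
+A client that receives 409 would retrieve the object again to learn
+the current revision, and then retry the update.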
+
+
+### Retrieving an object: GET /res
+
+A request requires:
+
+* "show" in the scope claim
+* one of the following:
+ - "super" in the scope claim
+ - "sub" claim matches the owner of the object in Muck; a super user can retrieve
+ any resource, but otherwise data subjects can only retrieve their own
+ objects
+
+The request to retrieve an object:
+
+~~~{.numberLines}
+GET /res HTTP/1.1
+Authorization: Bearer TOKEN
+Muck-Id: ID
+~~~
+
+A successful response:
+
+~~~{.numberLines}
+200 OK
+Content-Type: application/json
+Muck-Id: ID
+Muck-Revision: REV2
+
+{"foo": "yo"}
+~~~
+
+Note that the response does NOT indicate the owner of the resource.
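+
+Because ids and revisions travel in headers rather than in the URL, a
+client builds every request against the same `/res` path. The
+following sketch uses Python's standard `urllib.request` module to
+build, but not send, the requests shown above. The helper name
+`muck_request` and the server address in `BASE_URL` are hypothetical.
+
+```python
+import json
+import urllib.request
+
+BASE_URL = "https://muck.example.com"  # hypothetical server address
+
+
+def muck_request(method, token, res_id=None, revision=None, body=None):
+    """Build (but do not send) a request to the /res endpoint."""
+    headers = {"Authorization": f"Bearer {token}"}
+    if res_id is not None:
+        headers["Muck-Id"] = res_id
+    if revision is not None:
+        headers["Muck-Revision"] = revision
+    data = None
+    if body is not None:
+        headers["Content-Type"] = "application/json"
+        data = json.dumps(body).encode("utf-8")
+    return urllib.request.Request(
+        f"{BASE_URL}/res", data=data, headers=headers, method=method
+    )
+
+
+create = muck_request("POST", "TOKEN", body={"foo": "bar"})
+update = muck_request("PUT", "TOKEN", "ID", "REV1", {"foo": "yo"})
+show = muck_request("GET", "TOKEN", "ID")
+```
+
+Passing one of these objects to `urllib.request.urlopen` would send
+it. The client would then read the Muck-Id and Muck-Revision headers
+from the response.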
+
+
+
+Acceptance criteria for Muck
+=============================================================================
+
+This chapter details the acceptance criteria for Muck, and how they're
+verified.
+
+
+Basic object handling
+-----------------------------------------------------------------------------
+
+First, we need a new Muck server. It will initially have no objects.
+We also need a test user, whom we'll call Tomjon.
+
+~~~scenario
+given a fresh Muck server
+given I am Tomjon
+~~~
+
+Tomjon can create an object.
+
+~~~scenario
+when I do POST /res with {"foo": "bar"}
+then response code is 201
+then header Muck-Id is ID
+then header Muck-Revision is REV1
+~~~
+
+Tomjon can then retrieve the object. It has the same revision and
+body.
+
+~~~scenario
+when I do GET /res with Muck-Id: {ID}
+then response code is 200
+then header Muck-Revision matches {REV1}
+then body matches {"foo": "bar"}
+~~~
+
+Tomjon can update the object, and the update has the same id, but a
+new revision and body.
+
+~~~scenario
+when I do PUT /res with Muck-Id: {ID}, Muck-Revision: {REV1}, and body {"foo":"yo"}
+then response code is 200
+then header Muck-Revision is {REV2}
+then revisions {REV1} and {REV2} are different
+~~~
+
+If Tomjon tries to update with the old revision, it fails.
+
+~~~scenario
+when I do PUT /res with Muck-Id: {ID}, Muck-Revision: {REV1}, and body {"foo":"yo"}
+then response code is 409
+~~~
+
+After the failed update, neither the object nor its revision has changed.
+
+~~~scenario
+when I do GET /res with Muck-Id: {ID}
+then response code is 200
+then header Muck-Revision matches {REV2}
+then body matches {"foo": "yo"}
+~~~
+
+We can delete the resource, and then it's gone.
+
+~~~scenario
+when I do DELETE /res with Muck-Id: {ID}
+then response code is 200
+when I do GET /res with Muck-Id: {ID}
+then response code is 404
+~~~
+
+
+Restarting Muck
+-----------------------------------------------------------------------------
+
+Muck should store data persistently. For this we need our test user to
+have the "super" capability.
+
+~~~scenario
+given a fresh Muck server
+given I am Tomjon, with super capability
+when I do POST /res with {"foo": "bar"}
+then header Muck-Id is ID
+then header Muck-Revision is REV1
+~~~
+
+So far, so good. Nothing new here. Now we restart Muck. The resource
+just created must still be there.
+
+~~~scenario
+when I restart Muck
+when I do GET /res with Muck-Id: {ID}
+then response code is 200
+then header Muck-Revision matches {REV1}
+then body matches {"foo": "bar"}
+~~~
+
+
+Super user access
+-----------------------------------------------------------------------------
+
+Check here that if we have super scope, we can retrieve, update, and
+delete someone else's resources, but if we create a resource, it's
+ours.
+
+Invalid requests
+-----------------------------------------------------------------------------
+
+There are a number of ways in which a request might be rejected. This
+section verifies all of them.
+
+### Accessing someone else's data
+
+~~~scenario
+given a fresh Muck server
+given I am Tomjon
+when I do POST /res with {"foo": "bar"}
+then header Muck-Id is ID
+then header Muck-Revision is REV1
+when I do GET /res with Muck-Id: {ID}
+then response code is 200
+then header Muck-Revision matches {REV1}
+then body matches {"foo": "bar"}
+~~~
+
+After this, we morph into another test user.
+
+~~~scenario
+given I am Verence
+when I do GET /res with Muck-Id: {ID}
+then response code is 404
+~~~
+
+Note that we get a "not found" error and not an "access denied" error
+so that Verence doesn't know if the resource exists or not.
+
+
+### Updating someone else's data
+
+This is similar to retrieving it, but we try to update instead.
+
+~~~scenario
+given a fresh Muck server
+given I am Tomjon
+when I do POST /res with {"foo": "bar"}
+then header Muck-Id is ID
+then header Muck-Revision is REV1
+given I am Verence
+when I do PUT /res with Muck-Id: {ID}, Muck-Revision: {REV1}, and body {"foo":"yo"}
+then response code is 404
+~~~
+
+
+### Deleting someone else's data
+
+This is similar to retrieving it, but we try to delete it instead.
+
+~~~scenario
+given a fresh Muck server
+given I am Tomjon
+when I do POST /res with {"foo": "bar"}
+then header Muck-Id is ID
+then header Muck-Revision is REV1
+given I am Verence
+when I do DELETE /res with Muck-Id: {ID}
+then response code is 404
+~~~
+
+### Bad signature
+
+### Not valid yet
+
+### Not valid anymore
+
+### Not for our instance
+
+### Lack scope for creation
+
+### Lack scope for retrieval
+
+### Lack scope for updating
+
+### Lack scope for deletion
+
+### No subject when creating
+
+### No subject when retrieving
+
+### No subject when updating
+
+### No subject when deleting
+
+### Invalid JSON when creating
+
+### Invalid JSON when updating
+
+
+# Possible future changes
+
+* There is no way to list all the resources a user has, or search for
+ resources. This should be doable in some way. With a search, a
+ listing operation is not strictly necessary.
+
+* It's going to be inconvenient to only be able to access one's own
+ resources. It would be good to support groups. A resource could be
+ owned by a group, and end-users / subjects could belong to any
+ number of groups. Also, groups should be able to belong to groups.
+ Each resource should be able to specify for each group what access
+ members of that group should have (retrieve, update, delete). There
+ should be no limits to how many group access control rules there are
+ per resource.
+
+ This would allow setups such as each resource representing a stored
+ file, and some groups would be granted read access, or read-write
+ access, or read-delete access to the files.
+
+* Also, it might be good to be able to grant other groups access to
+ control a resource's access control rules.
+
+* It might be good to support schemas for resources?
+
+* It might be good to have a configurable maximum size of a resource.
+ Possibly per-user quotas.
+
+* It would be good to support replication, sharding, and fault
+ tolerance.
+
+* Monitoring, logging, other ops requirements?
+
+* Encryption of resources, so that Muck doesn't see the contents?
+
+* Should Muck sign the resources it returns, with its own key?