From 0cfe3e388772f07a88e9fdd1bb61967502815368 Mon Sep 17 00:00:00 2001
From: Lars Wirzenius <liw@liw.fi>
Date: Sat, 26 Jan 2019 09:17:22 +0200
Subject: Add: Yuck arch doc page

---
 yuck.mdwn | 488 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 488 insertions(+)
 create mode 100644 yuck.mdwn

(limited to 'yuck.mdwn')

diff --git a/yuck.mdwn b/yuck.mdwn
new file mode 100644
index 0000000..7ac0040
--- /dev/null
+++ b/yuck.mdwn
@@ -0,0 +1,488 @@
+[[!meta title="Yuck - an authentication server"]]
+
+[[!toc levels=2]]
+
+**NOTE**: Yuck is in its planning phase at the moment. No code exists,
+only this document. Feedback on this document is welcome, preferably
+via email to liw@liw.fi. Ick will continue to use Qvisqve for the
+time being, until Yuck is ready to replace it.
+
+# Introduction
+
+Yuck is an **identity provider** that allows end users to **securely
+authenticate** themselves to web sites and applications. Yuck also
+allows users to **authorize** applications to act on their behalf.
+Yuck supports the **OAuth2** and **OpenID Connect** protocols, and has
+an API to allow storing and managing data about end users,
+applications, and other entities related to authentication.
+
+Yuck does not provide any services unrelated to authentication. Other
+services can work with Yuck to control access to them.
+
+OpenID Connect (OIDC) is a protocol suitable for interactively
+authenticating a person (the end user). OAuth2 is suitable for
+non-interactive API clients, possibly ones acting on behalf of the end
+user.
+
+Both OAuth2 and OpenID Connect provide a number of variants and
+extensions. Yuck implements the "client credentials grant" for OAuth2,
+and the "authorization code flow" for OIDC.
+
+Yuck has an extensible architecture for supporting different ways for
+users to authenticate, and for optionally using multiple
+authentication factors. Initially it will implement traditional
+passwords and time-based one-time passwords (TOTP, same as "Google
+Authenticator").
+
+The Yuck architecture supports different ways for storing the data and
+credentials it needs. Initially it comes with support for using the
+Muck JSON store, but support for, say, LDAP can be added.
+
+## Terminology and concepts
+
+* **access token**: a token which grants access to a service or
+  resource; usually short-lived, but see refresh token
+
+* **API client**: a program that uses the API, either on behalf of an
+  end-user, or on its own behalf
+
+* **application**: software that provides a service using the RP
+
+* **authenticate**: prove the identity of someone or something; "this
+  is how you know I am who I say I am"; authentication can happen in any
+  number of ways, and different relying parties may have different
+  requirements: government ID; being able to read email sent to an
+  email address; knowing a secret; possessing a unique thing; acting
+  in a particular way; having particular body features (fingerprint,
+  face, voice, hand shape, ...); etc, the list is almost endless
+
+* **authorize**: grant access to an authenticated entity; "what are
+  they allowed to do?"
+
+* **end-user**: a human using the system, typically the reason the
+  system exists, can also be a subject
+
+* **front end**: provides the user interface to an end user via the
+  user agent or browser; typically provides HTML, JS, CSS, and images,
+  statically or generated dynamically, but could be audio, video, or
+  anything the user can interact with
+
+* **IDP**: short for identity provider
+
+* **identify**: claim an identity; "this is who I say I am"
+
+* **identity**: who a human is, or which instance of a program is
+
+* **identity provider**: software that authenticates end users and
+  non-human entities, and also stores authorizations for them
+
+* **JWT**: a standard way to represent tokens, see [JWT][]; Yuck will
+  use digitally signed tokens
+
+* **OAuth2**: a protocol for authenticating software; see [OAuth2][]
+
+* **OIDC**: short for OpenID Connect; a protocol for authenticating
+  end users; see [OIDC][]
+
+* **refresh token**: a token that can be used to get a new access
+  token; usually long-lived, but can be revoked
+
+* **relying party**: software that relies on the IDP for
+  authentication and authorization; often a resource provider, but can
+  also do things on request instead of merely storing things
+
+* **resource**: data stored by a resource provider
+
+* **resource provider**: stores resources and allows authorized access
+  to it; "database"
+
+* **RP**: short for relying party or resource provider
+
+* **subject**: a person whose personal information is handled by the
+  system, see end-user
+
+* **user agent**: typically a web browser, but can be a mobile
+  or desktop application; assumed to be under complete user control,
+  and so trusted by the user, but not the ecosystem
+
+[JWT]: https://en.wikipedia.org/wiki/JSON_Web_Token
+[OAuth2]: https://en.wikipedia.org/wiki/OAuth#OAuth_2.0
+[OIDC]: https://en.wikipedia.org/wiki/OpenID_Connect
+
+# Requirements
+
+[RFC 2119]: https://www.ietf.org/rfc/rfc2119.txt
+
+Yuck has at least the following high level requirements. 
+
+In this document, the key words "MUST", "MUST NOT", "REQUIRED",
+"SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY",
+and "OPTIONAL" are to be interpreted as described in
+[RFC 2119][].
+
+Each requirement and sub-requirement is given a unique name for easier
+reference in discussions.
+
+* (SECURE) Yuck MUST be secure.
+    * (CREDSTORE) Yuck MUST store credentials in a way that
+      minimises damage if they leak. Credentials SHOULD be stored
+      hashed using a respected key derivation function (such as
+      scrypt) and using per-credential salting. Something stronger
+      MAY be implemented instead.
+    * (MFA) Yuck MUST support multi-factor authentication using secure
+      factors.
+    * (PROTOS) Yuck MUST use secure protocols to authenticate users
+      and API clients.
+    * (HTTPS) Yuck MUST NOT ever use plain HTTP, only HTTPS.
+    * (AUDIT) Yuck SHOULD undergo security audits, and general
+      scrutiny. Audits SHOULD happen regularly. (This is not an
+      absolute requirement, as it depends on the availability of
+      competent auditors. Yuck is not a for-profit project, and may
+      not be able to pay them.)
+    * (SECUREANDUSABLE) The Yuck developers MUST keep security at the
+      highest priority, without sacrificing usability.
+* (QUALITY) The Yuck project MUST aim for high quality, by applying
+  development methods that are known to work for achieving quality,
+  such as test-driven development, automated test suites with high
+  test coverage, and code review.
+* (HSCALABLE) The Yuck architecture MUST be horizontally scalable to
+  very large numbers of concurrent users and API clients.
+    * (NOTUNSCALABLE) The implementation might not scale to very many
+      users or concurrent users, especially initially, but the
+      architecture MUST NOT prevent a scalable implementation.
+* (ADMINFRIENDLY) Yuck MUST be flexible for system administrators to
+  manage, and applications to use.
+    * (ADMINAPIS) Yuck SHOULD provide APIs for managing the entities
+      and data it needs, such as for creating end users and API
+      clients, or changing their credentials.
+    * (APPFRIENDLY) Yuck SHOULD enable applications to delegate all
+      authentication to Yuck.
+* (FREEDOM) Yuck MUST be free software. It MUST NOT require
+  applications, API clients, and other software that works with Yuck
+  to be free software.
+* (PRIVACYSTORE) Yuck MUST NOT store personal information it does not
+  need.
+* (PRIVACYLEAK) Yuck MUST NOT leak personal information.
+
+
+# Architecture: the ecosystem
+
+[[!graph type=digraph src="""
+user [shape="ellipse" label="end user" margin="0.2,0.2"];
+browser [shape="tab" label="web browser /\nuser agent" margin="0.2,0.2"];
+webapp [shape="component" label="application\nfrontend" margin="0.2,0.2"];
+IDP [shape="component" label="IDP" margin="0.2,0.2"];
+RP [shape="cylinder" margin="0.2,0.3"];
+app_api [shape="component" label="application\nbackend" margin="0.2,0.2"];
+user_client [shape="tab" label="API client\n(on behalf of user)" margin="0.2,0.2"];
+auto_client [shape="tab" label="API client\n(autonomous)" margin="0.2,0.2"];
+
+user -> browser;
+browser -> webapp;
+browser -> IDP;
+webapp -> IDP;
+webapp -> app_api;
+app_api -> RP;
+
+user_client -> IDP;
+user_client -> app_api;
+
+auto_client -> IDP;
+auto_client -> app_api;
+"""]]
+
+An IDP interacts with several other systems to enable end users to do their
+thing. The RP provides the actual service, and delegates
+authentication to the IDP. There can be other services in front of the
+RP, and for security reasons there has to be at least one for end-user
+authentication.
+
+* The end user interacts directly with their web browser or other user
+  agent, which is assumed to be entirely under their control, and thus
+  not trusted by the IDP or other components. The end user is assumed
+  to trust what they use.
+
+* The browser talks to the facade, that is, the application front end
+  (to get the HTML and JS and other files to present a UI to the user),
+  and the IDP (to allow the user to authenticate themselves).
+
+* The facade holds the access token on behalf of an authenticated end
+  user.
+
+* The facade talks to a backend, giving it the user's access token as
+  proof of authentication and authorization.
+
+* The backend provides an API suitable for the service it provides. It
+  also allows access based on the access token.
+
+* The resource provider stores data for the backend. It also allows
+  access based on the access token.
+
+Some access is not done interactively by the end user, but by API
+clients that either act on behalf of the user, or are unrelated to
+them in any way. The end user can authorize an API client to access
+things on their behalf. The authorization can limit the API client's
+access to a subset of what the end user can do. If the end user can
+both read and write a resource, the authorization might only allow
+the API client to read the resource.
+
+API clients that are unrelated to the user are authorized by the
+owners of the RP. See below for an example.
+
+## Authentication scenarios
+
+As examples of how an authentication server might be used, consider
+an online banking system. It should support at least three scenarios.
+
+**End user interactively accesses their account**: The end user opens up
+the bank web page, and logs in, and can interactively do whatever
+they're allowed to do: view their bank statement, transfer money, etc.
+
+**End user authorizes an API client**: The end user, who happens to be
+a Unix sysadmin, might want to automatically retrieve their bank
+statement and feed it to their accounting system. They create an
+authorization for an API client that only allows it to retrieve the
+statement, but not do anything else. This creates, in the IDP, a new
+API client identity, which is tied to the end user's identity, so that
+whatever the API client does, it is known to act on behalf of the end
+user.
+
+**Bank pays interest automatically**: The bank runs an API client,
+authorized by the bank to act autonomously and without end user
+authorization, which annually transfers interest from the bank's own
+account to each end user's account.
+
+Obviously, a real bank would need a lot more scenarios, but these will
+do for discussing Yuck.
+
+## Data model
+
+Yuck needs to store data about end users, applications, and API
+clients. It models the data as a set of "resources", which can be
+represented as JSON objects. Initially, Yuck will store the JSON
+objects in Muck, which is a dedicated JSON object store, but Yuck will
+be able to support any store that supports the following:
+
+* an object can be created and assigned a unique ID and revision
+* an object can be updated, with collision prevention using the
+  revision (updater gives the revision of what they think is the
+  newest revision; the store will fail the update if it isn't)
+* an object can be retrieved, given the ID
+* an object can be deleted, given the ID
+* objects can be searched for, based on any field defined below, using
+  case-insensitive equality or comparison to a pattern
+
+### A user
+
+A user resource represents the user. Its object ID, not a username,
+is used to identify the user in the ecosystem. The object ID is
+unique, never changes, and is chosen by Yuck. Ideally it is never
+shown to the user, and is only used to reference the user internally.
+
+The user resource stores the following data:
+
+* `allowed_scopes` &mdash; (a list of strings) the scopes the user is
+  allowed to have
+
+Note that the user object does not store usernames or credentials in
+any way; those are separate resources that refer to the user. A user
+may have any number of credentials, for multi-factor authentication.
+When a user is being authenticated, they must provide all credentials.
+
+### A username
+
+A username resource stores one name by which the user is identified to
+the system. As far as Yuck is concerned, a user may have any number of
+usernames, and they can change. A username is user-visible, and
+chosen by the user. Each username needs to be unique.
+
+* `user_ref` &mdash; (a string) ID of the user resource for the user
+* `username` &mdash; (a string) a username for the user
+
+Yuck stores as little about a user as possible. For example, it does
+not store the full name, or any contact information. The applications
+may store that separately.
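+
+As a rough illustration of the user and username resources described
+above, the two JSON objects might look like the following. This is a
+sketch only: the user ID `user-0001`, the scope names, and the
+username `alice` are made-up placeholders, and the ID and revision
+that the store assigns to each object are not shown.
+
+```json
+{
+    "allowed_scopes": ["create", "read"]
+}
+```
+
+```json
+{
+    "user_ref": "user-0001",
+    "username": "alice"
+}
+```
+
+A second username for the same user would be another object with the
+same `user_ref` and a different `username`.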
+
+### An OAuth2 API client
+
+For OAuth2 API clients, the following data is stored:
+
+* `user_ref` &mdash; (a string, or `null`) ID of the user resource for
+  the user on behalf of whom the API client acts, if any
+* `allowed_scopes` &mdash; (a list of strings) the scopes the API
+  client is allowed to have
+
+Note that an API client may act on behalf of a user, but does not need
+to do so. If `user_ref` is set to a non-empty string, it is acting on
+behalf of a user, and this will cause any access tokens the API client
+gets to have the `sub` claim set to the user's ID.
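+
+A sketch of two such resources, using made-up placeholder values. The
+first API client acts on behalf of the user whose hypothetical ID is
+`user-0001`, and is limited to the `read` scope. The second is
+autonomous; its scope name `pay-interest` is invented for illustration
+and is not defined anywhere in this document.
+
+```json
+{
+    "user_ref": "user-0001",
+    "allowed_scopes": ["read"]
+}
+```
+
+```json
+{
+    "user_ref": null,
+    "allowed_scopes": ["pay-interest"]
+}
+```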
+
+### An OIDC application front end
+
+For OIDC application front ends, the following data is stored:
+
+* `allowed_scopes` &mdash; (a list of strings) the scopes the
+  application front end is allowed to have
+* `callbacks` &mdash; (a list of strings) the callback URIs for the
+  application
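+
+For illustration only, such a resource might look like the JSON object
+below. The scope name and the callback URI are made-up placeholders;
+which scopes a real front end needs is not decided by this document.
+
+```json
+{
+    "allowed_scopes": ["read"],
+    "callbacks": ["https://app.example.com/callback"]
+}
+```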
+
+### A password credential for scrypt
+
+For password based authentication for users, API clients, and
+application front ends, Yuck will store the following data:
+
+* `user_ref` &mdash; (a string, or `null`) ID of the user resource for
+  the user, if any
+* `client_ref` &mdash; (a string, or `null`) ID of the resource for
+  the API client, if any
+* `hash` &mdash; (a string) password hashed using scrypt, encoded
+  as hexadecimal
+* `salt` &mdash; (a string) randomly chosen string to salt the
+  hash, encoded as hexadecimal
+* `key_len` &mdash; (an integer) used for scrypt
+* `N` &mdash; (an integer) used for scrypt
+* `r` &mdash; (an integer) used for scrypt
+* `p` &mdash; (an integer) used for scrypt
+
+Note that Yuck will require exactly one of `user_ref` and `client_ref`
+to be set to a non-empty string, and the other one to be `null`.
+
+The `key_len`, `N`, `r`, and `p` fields are the scrypt parameters.
+They are stored so that they can later be varied without
+making previously stored passwords invalid.
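+
+A sketch of a password credential for a user, to make the fields
+concrete. Everything here is an assumption for illustration: the user
+ID `user-0001` is a placeholder, the `hash` and `salt` values are left
+as descriptions and are not real output, and the numbers are commonly
+used scrypt parameters, not values this document has chosen.
+
+```json
+{
+    "user_ref": "user-0001",
+    "client_ref": null,
+    "hash": "<hex-encoded scrypt output>",
+    "salt": "<hex-encoded random salt>",
+    "key_len": 64,
+    "N": 16384,
+    "r": 8,
+    "p": 1
+}
+```
+
+If the parameters are later strengthened, older credentials can still
+be verified using the values stored with them.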
+
+### A TOTP credential for a user
+
+Yuck stores the TOTP credential for a user as follows:
+
+* `user_ref` &mdash; (a string) ID of the user resource for the user
+* the rest to be determined, when TOTP is implemented
+
+## External interfaces of Yuck
+
+Yuck provides the following interfaces to the rest of the ecosystem:
+
+* endpoints for managing users, API clients, and OIDC application
+  front ends, including their credentials
+* an endpoint for OAuth2 API clients to get tokens using the client
+  credentials grant
+* endpoints for OIDC frontends to use for interactively authenticating
+  the end users, and for getting the resulting tokens (including
+  refreshed tokens)
+* an endpoint for monitoring the health of Yuck
+
+Details will be specified later.
+
+# Authentication protocols
+
+This chapter will walk through each of the protocols Yuck supports,
+down to sample HTTP requests and responses.
+
+## Authorization information
+
+Overview of how authorization happens in the ecosystem:
+
+* The IDP keeps track of what each end user and API client is
+  authorized to do. This is encoded by storing a list of "scopes". A
+  scope is a permission to do something, such as "create a resource"
+  or "update a resource the end user owns". See `allowed_scopes` in
+  the user and API client resources.
+
+* The access token identifies the end user. The token grants
+  permission to its bearer to do specific actions, encoded as a list
+  of scopes. Note that an access token need not have all the allowed
+  scopes.
+
+* The API provider actually implements the access control checks based
+  on the access token and its contents. The API provider implements
+  specific actions, and associates each with a scope, and checks that
+  the token has that scope.
+
+For example, assume that Alice is authorized for the actions "create
+resource" and "read resource owned by the user"; her `allowed_scopes`
+has the scopes `create` and `read`.
+
+Alice creates an API client, but only allows it the `read` scope. When
+the API client gets an access token, it will have the `sub` claim set
+to Alice's user ID, and the `scope` claim set to `read`. With such an
+access token, the API client can read any resources that Alice can
+read, but can't create new resources.
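+
+As a sketch, the relevant claims in the payload of that access token
+might look like the JSON object below. The value `user-0001` is a
+made-up placeholder standing in for the ID of Alice's user resource,
+following the data model, where `sub` is set to the user's ID. Other
+claims a real token would carry (issuer, expiry, and so on) are left
+out, as this document does not yet specify them.
+
+```json
+{
+    "sub": "user-0001",
+    "scope": "read"
+}
+```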
+
+## OAuth2 for autonomous API clients
+
+* walkthrough of an API client getting tokens via OAuth2 CC
+* and using them
+
+## OIDC for interactive end users
+
+* walkthrough of an end-user causing facade to get tokens
+* and facade using them
+* web sessions
+
+## End users authorizing API clients
+
+* walkthrough
+
+# Architecture: Yuck itself
+
+[[!graph src="""
+user [shape="ellipse" label="end user" margin="0.2,0.2"];
+browser [shape="tab" label="web browser /\nuser agent" margin="0.2,0.2"];
+webapp [shape="component" label="application\nfrontend" margin="0.2,0.2"];
+user_client [shape="tab" label="API client\n(on behalf of user)" margin="0.2,0.2"];
+auto_client [shape="tab" label="API client\n(autonomous)" margin="0.2,0.2"];
+IDP_auth [shape="component" label="IDP auth endpoints" margin="0.2,0.2"];
+IDP_token [shape="component" label="IDP token endpoints" margin="0.2,0.2"];
+IDP_admin [shape="component" label="IDP admin endpoints" margin="0.2,0.2"];
+IDP_store [shape="cylinder" label="IDP data store" margin="0.2,0.3"];
+
+user -> browser;
+browser -> webapp;
+webapp -> IDP_auth;
+webapp -> IDP_token;
+webapp -> IDP_admin;
+
+user_client -> IDP_token;
+user_client -> IDP_admin;
+
+auto_client -> IDP_token;
+
+IDP_auth -> IDP_store;
+IDP_token -> IDP_store;
+IDP_admin -> IDP_store;
+
+"""]]
+
+The diagram above doesn't include parts of the ecosystem that are not
+part of Yuck or don't directly interact with Yuck.
+
+Yuck consists of three sets of endpoints, and a data store. The
+endpoints implement the external interfaces for the authentication
+protocols, and for administration. The data store stores JSON objects.
+
+An API client acting on behalf of an administrator will use the Yuck
+admin endpoints to manage users, API clients, and OIDC applications. An
+application frontend may provide a user interface for doing the same.
+
+Note that the various Yuck endpoints and the processes implementing
+them do not need to interact except via the data store. This enables
+horizontal scalability to the extent the data store scales.
+
+(It may be more sensible to have the application backend provide an
+interface for admin actions. It will still need to use the Yuck admin
+endpoints for doing that. This possibility has been left out of the
+diagram to avoid clutter.)
+
+## The data store
+
+[Muck]: http://git.liw.fi/muck-poc/tree/
+
+The data store will initially be [Muck][], which has a RESTful HTTP API
+for managing JSON objects. The API uses the kind of JWT access tokens
+for access control that Yuck creates. Yuck can create the tokens for
+its own use.
+
+Later, support for other data stores can be added. LDAP is probably
+going to be desired. This can be done by implementing a new component
+that provides a Muck-like interface, but stores the data in LDAP.
+Similarly, support can be added for SQL databases, etc.
-- 
cgit v1.2.1