[[!meta title="Yuck - an authentication server"]]

**NOTE:** Yuck is in its planning phase at the moment. No code exists,
only this document. Feedback on this document is welcome, via normal
Ick channels. Ick will continue to use Qvisqve for the
time being, until Yuck is ready to replace it.

[[!toc levels=2]]

# Introduction

Yuck is an **identity provider** that allows end users to **securely
authenticate** themselves to web sites and applications. Yuck also
allows users to **authorize** applications to act on their behalf.
Yuck supports the **OAuth2** and **OpenID Connect** protocols, and has
an API to allow storing and managing data about end users,
applications, and other entities related to authentication.

Yuck is intended to be used by web applications. It is not meant for
authenticating Unix or SSH logins or such. The status quo is that web
applications often implement authentication themselves, but it is the
opinion of Yuck's authors that this is a bad architectural design.
Having a dedicated identity provider keeps the security-sensitive
parts of authentication in one place, without mixing them with
application logic. This results in a more cohesive, less coupled
architecture and an implementation that is more easily reviewed and
modified. A separate identity provider also makes it easier to provide
single sign-on for groups of applications, without complicating each
application.

Yuck does not provide any services unrelated to authentication. Other
services can work with Yuck to control access to them.

OpenID Connect (OIDC) is a protocol suitable for interactively
authenticating a person (the end user). OAuth2 is suitable for
non-interactive API clients, possibly ones acting on behalf of the end
user.

Both OAuth2 and OpenID Connect provide a number of variants and
extensions. Yuck implements the **client credentials grant** for OAuth2,
and the **authorization code flow** for OIDC.

Yuck has an extensible architecture for supporting different ways for
users to authenticate, and for optionally using multiple
authentication factors. Initially it will implement traditional
passwords and time-based one-time passwords (TOTP, same as "Google
Authenticator").

The Yuck architecture supports different ways for storing the data and
credentials it needs. Initially it comes with support for using the
Muck JSON store, but support for, say, LDAP can be added.

## Terminology and concepts

* **access token:** a token which grants access to a service or
  resource; usually quite short-lived (maybe less than a minute),
  since it can't be easily revoked, but see refresh token

* **API client:** a program that uses the API, either on behalf of an
  end-user, or on its own behalf

* **application:** software that provides a service using the RP

* **authenticate:** prove the identity of someone or something; "this
  is how you know I am who I say I am"; authentication can happen in any
  number of ways, and different relying parties may have different
  requirements: government ID; being able to read email sent to an
  email address; knowing a secret; possessing a unique thing; acting
  in a particular way; having particular body features (fingerprint,
  face, voice, hand shape, ...); etc, the list is almost endless

* **authorize:** grant access to an authenticated entity; "what are
  they allowed to do?"

* **end-user:** a human using the system, typically the reason the
  system exists, can also be a subject

* **front end:** provides the user interface to an end user via the
  user agent or browser; typically provides HTML, JS, CSS, and images,
  statically or generated dynamically, but could be audio, video, or
  anything the user can interact with

* **IDP:** short for identity provider

* **identify:** claim an identity; "this is who I say I am"

* **identity:** who a human is, or which instance of a program is

* **identity provider:** software that authenticates end users and
  non-human entities, and also stores authorizations for them

* **JWT:** a standard way to represent tokens, see [JWT][]; Yuck will
  use digitally signed tokens

* **OAuth2:** a protocol for authenticating software; see [OAuth2][]

* **OIDC:** short for OpenID Connect; a protocol for authenticating
  end users; see [OIDC][]

* **refresh token:** a token that can be used to get a new access
  token; usually long-lived, but can be revoked, since every use can
  be checked by the IDP

* **relying party:** software that relies on the IDP for
  authentication and authorization; often a resource provider, but can
  also do things on request instead of merely storing things

* **resource:** data stored by a resource provider

* **resource provider:** stores resources and allows authorized access
  to it; "database"

* **RP:** short for relying party or resource provider

* **subject:** a person whose personal information is handled by the
  system, see end-user

* **user agent:** typically a web browser, but can be a mobile
  or desktop application; assumed to be under complete user control,
  and so trusted by the user, but not the ecosystem

[JWT]: https://en.wikipedia.org/wiki/JSON_Web_Token
[OAuth2]: https://en.wikipedia.org/wiki/OAuth#OAuth_2.0
[OIDC]: https://en.wikipedia.org/wiki/OpenID_Connect

# Requirements

[RFC 2119]: https://www.ietf.org/rfc/rfc2119.txt

Yuck has at least the following high level requirements. 

In this document, the key words "MUST", "MUST NOT", "REQUIRED",
"SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY",
and "OPTIONAL" are to be interpreted as described in
[RFC 2119][].

Each requirement and sub-requirement is given a unique name for easier
reference in discussions.

* (SECURE) Yuck MUST be secure.

    * (CREDSTORE) Yuck MUST store credentials in a way that
      minimises damage if they leak. Credentials SHOULD be stored
      hashed using a respected password hashing algorithm (such as
      scrypt) and per-credential salting; something stronger MAY be
      implemented instead. Additionally, all the credential
      records SHOULD be encrypted for an additional layer of defense.

    * (MFA) Yuck MUST support multi-factor authentication using secure
      factors.

    * (PROTOS) Yuck MUST use secure protocols to authenticate users
      and API clients.

    * (HTTPS) Yuck MUST NOT ever use plain HTTP, only HTTPS.

    * (AUDIT) Yuck SHOULD undergo security audits, and general
      scrutiny. Audits SHOULD happen regularly. (This is not an
      absolute requirement, as it depends on the availability of
      competent auditors. Yuck is not a for-profit project, and may
      not be able to pay them.)

    * (SECUREANDUSABLE) The Yuck developers MUST keep security at the
      highest priority, without sacrificing usability.

* (QUALITY) The Yuck project MUST aim for high quality, by applying
  development methods that are known to work for achieving quality,
  such as test-driven development, automated test suites with high
  test coverage, and code review.

* (HSCALABLE) The Yuck architecture MUST be horizontally scalable to
  very large numbers of concurrent users and API clients.

    * (NOTUNSCALABLE) The implementation might not scale to very many
      users or concurrent users, especially initially, but the
      architecture MUST NOT prevent a scalable implementation.

* (ADMINFRIENDLY) Yuck MUST be flexible for system administrators to
  manage, and applications to use.

    * (ADMINAPIS) Yuck SHOULD provide APIs for managing the entities
      and data it needs, such as for creating end users and API
      clients, or changing their credentials.

    * (ACCOUNTAPI) Yuck MUST provide an API for managing end-user
      accounts: creating, deleting, changing, resetting secrets, etc.

    * (APPFRIENDLY) Yuck SHOULD enable applications to delegate all
      authentication to Yuck.

    * (CREDNOTIFY) Yuck SHOULD provide an out-of-band notification for
      users and admins when credentials are changed.

* (FREEDOM) Yuck MUST be free software. It MUST NOT require
  applications, API clients, and other software that works with Yuck
  to be free software.

* (PRIVACYSTORE) Yuck MUST NOT store personal information it does not
  need.

* (PRIVACYLEAK) Yuck MUST NOT leak personal information.

* (PWRESET) Yuck MUST support the user resetting their password,
  securely. Possibly by supporting a random, single-use link that can
  be communicated to the user (perhaps via email) to allow them to
  change the password.

* (TEMPLOCK) Yuck MUST support locking an account temporarily, if it
  is the target of too many failed authentication attempts. This is to
  prevent an attacker from brute-forcing a password by trying many times.

* (TEMPLOCKNOTIFY) Yuck MUST notify an account owner of temporary
  locking, out of band.

* (ACLSIMPLE) It must be easy to understand and reason about ACL
  rules. It may be good to aid this by visualising the rules.

* (ACLTRY) There must be a way to test ACL rules: if *this* user in
  *these groups* does *this* operation for *this* resource, is it
  allowed? This may require additional support from the RP.

* (DISABLEACCT) It must be possible to disable an account (whether for
  an end-user or an API client) so that it still exists, but
  authentication cannot ever succeed.

* (KILLSESSION) It must be possible to kill existing individual web
  sessions to kick out someone who is logged in to Yuck.

* (KEYROTATION) The IDP MUST rotate signing keys so that a leaked key
  can be easily replaced. The IDP MUST have a secure way to distribute
  the key to clients.

* (AUTHLIMIT) It MUST be possible to limit authentication by time or
  IP address.

* (VISUALIZE) There should be a way to visualize access control rules
  so that it's easier to debug them.

* (TESTRULES) There should be a way to test access control rules for
  specific entities and resources.
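
As an illustration of the TEMPLOCK requirement above, the following is a minimal sketch of temporary locking after repeated failures. The class name, method names, and the thresholds are assumptions made for this example; nothing here is decided for Yuck. The sketch keeps its state in memory, whereas a horizontally scalable implementation would presumably keep it in the shared data store.

```python
import time


class LoginThrottle:
    """Track failed attempts per account and lock temporarily after too many."""

    def __init__(self, max_failures=5, lock_seconds=300):
        # Both limits are hypothetical defaults, not values chosen by Yuck.
        self.max_failures = max_failures
        self.lock_seconds = lock_seconds
        self._failures = {}      # account ID -> consecutive failure count
        self._locked_until = {}  # account ID -> time when the lock expires

    def is_locked(self, account_id, now=None):
        now = time.time() if now is None else now
        return self._locked_until.get(account_id, 0) > now

    def record_failure(self, account_id, now=None):
        """Count a failure; return True if this failure triggered a lock."""
        now = time.time() if now is None else now
        count = self._failures.get(account_id, 0) + 1
        if count >= self.max_failures:
            self._locked_until[account_id] = now + self.lock_seconds
            self._failures[account_id] = 0
            return True  # the caller would notify the owner (TEMPLOCKNOTIFY)
        self._failures[account_id] = count
        return False

    def record_success(self, account_id):
        # A successful authentication resets the failure count.
        self._failures.pop(account_id, None)
```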


# Architecture: the ecosystem

[[!graph type=digraph src="""
user [shape="ellipse" label="end user" margin="0.2,0.2"];
browser [shape="tab" label="web browser /\nuser agent" margin="0.2,0.2"];
webapp [shape="component" label="application\nfrontend\n(facade)" margin="0.2,0.2"];
IDP [shape="component" label="IDP" margin="0.2,0.2"];
RP [shape="cylinder" margin="0.2,0.3"];
app_api [shape="component" label="application\nbackend" margin="0.2,0.2"];
user_client [shape="tab" label="API client\n(on behalf of user)" margin="0.2,0.2"];
auto_client [shape="tab" label="API client\n(autonomous)" margin="0.2,0.2"];

user -> browser;
browser -> webapp;
browser -> IDP;
webapp -> IDP;
webapp -> app_api;
app_api -> RP;

user_client -> IDP;
user_client -> app_api;

auto_client -> IDP;
auto_client -> app_api;
"""]]

An IDP interacts with several other systems to enable end users to do their
thing. The RP provides the actual service, and delegates
authentication to the IDP. There can be other services in front of the
RP, and for security reasons there has to be at least one for end-user
authentication.

* The end user interacts directly with their web browser or other user
  agent, which is assumed to be entirely under their control, and thus
  not trusted by the IDP or other components. The end user is assumed
  to trust what they use.

* The browser talks to the facade (to get the HTML and JS and other
  files to present a UI to the user), and the IDP (to allow the user
  to authenticate themselves).

* The facade holds the access token on behalf of an authenticated end
  user. The access token can't be given to the browser, since the
  browser can't be assumed to be highly secure, from the point of view
  of the relying party.

* The facade talks to a backend, giving it the user's access token as
  proof of authentication and authorization.

* The backend provides an API suitable for the service it provides. It
  also allows access based on the access token.

* The resource provider stores data for the backend. It also allows
  access based on the access token.

Some access is not interactive by the end user, but by API clients
that either act on behalf of the user, or are unrelated to them in any
way. The end user can authorize an API client to access resources on
their behalf. The authorization can limit the API client's access to a
subset of what the end user can do. If the end user can both read and
write a resource, the authorization might only allow the API client to
read the resource.

API clients that are unrelated to the user are authorized by the
owners of the RP. See below for an example.

## Authentication scenarios

As examples of how an authentication server might be used, consider
an online banking system. It should support at least three scenarios.

**End user interactively accesses their account:** The end user opens up
the bank web page, and logs in, and can interactively do whatever
they're allowed to do: view their bank statement, transfer money, etc.

**End user authorizes an API client:** The end user, who happens to be
a Unix sysadmin, might want to automatically retrieve their bank
statement and feed it to their accounting system. They create an
authorization for an API client that only allows it to retrieve the
statement, but not do anything else. This creates, in the IDP, a new
API client identity, which is tied to the end user's identity, so that
whatever the API client does, it is known to act on behalf of the end
user.

**Bank pays interest automatically:** The bank runs an API client,
authorized by the bank to act autonomously and without end user
authorization, which annually transfers interest from the bank's own
account to each end user's account.

Obviously, a real bank would need a lot more scenarios, but these will
do for discussing Yuck.

## Data model

Yuck needs to store data about end users, applications, and API
clients. It models the data as a set of "resources", which can be
represented as JSON objects. Initially, Yuck will store the JSON
objects in Muck, which is a dedicated JSON object store, but Yuck will
be able to support any store that supports the following:

* an object can be created and assigned a unique ID and revision
* an object can be updated, with collision prevention using the
  revision (updater gives the revision of what they think is the
  newest revision; the store will fail the update if it isn't)
* an object can be retrieved, given the ID
* an object can be deleted, given the ID
* objects can be searched for, based on any field defined below, using
  case-independent equality or comparison to a pattern
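
As a rough sketch of what a store with these properties might look like, here is an in-memory stand-in. The class and method names are assumptions for illustration only, not Muck's actual API, and pattern search is left out.

```python
import uuid


class RevisionConflict(Exception):
    """Raised when an update names a revision that is no longer the newest."""


class MemoryStore:
    """In-memory stand-in for a store with the properties listed above."""

    def __init__(self):
        self._objects = {}  # object ID -> (revision, object)

    def create(self, obj):
        # The store, not the caller, chooses the ID and the first revision.
        obj_id, rev = str(uuid.uuid4()), str(uuid.uuid4())
        self._objects[obj_id] = (rev, dict(obj))
        return obj_id, rev

    def get(self, obj_id):
        rev, obj = self._objects[obj_id]
        return rev, dict(obj)

    def update(self, obj_id, rev, obj):
        # Fail the update unless the caller names the newest revision.
        current_rev, _ = self._objects[obj_id]
        if rev != current_rev:
            raise RevisionConflict(obj_id)
        new_rev = str(uuid.uuid4())
        self._objects[obj_id] = (new_rev, dict(obj))
        return new_rev

    def delete(self, obj_id):
        del self._objects[obj_id]

    def search(self, field, value):
        # Case-independent equality on string fields only.
        wanted = value.casefold()
        return [
            obj_id
            for obj_id, (_, obj) in self._objects.items()
            if isinstance(obj.get(field), str) and obj[field].casefold() == wanted
        ]
```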

The facade will need to store user login session data, such as the
access and refresh tokens for the user. It will store these in some
secure manner that prevents them from leaking to an attacker, such as
in memory only. It may store them (possibly encrypted) in Muck
instead, if this is needed to allow the facade to be restarted without
breaking sessions, or to run multiple copies of the facade.

### A user

A user resource represents the user. Its object ID is used to
identify users in the ecosystem, not a username. The object ID
is unique, never changes, and is chosen by Yuck. Ideally it is never
shown to the user, and only used to reference the user internally.

The user resource stores the following data:

* `allowed_scopes` &mdash; (a list of strings) the scopes the user is
  allowed to have

Note that the user object does not store usernames or credentials in
any way. A user may have any number of credentials, for multi-factor
authentication. When a user is being authenticated, they must provide
all credentials.

### A username

A username resource stores one name by which the user is identified to
the system. As far as Yuck is concerned, a user may have any number of
usernames, and they can change. A username is user-visible, and
chosen by the user. Each username needs to be unique.

* `user_ref` &mdash; (a string) ID of the user resource for the user
* `username` &mdash; (a string) a username for the user

Yuck stores as little about a user as possible. For example, it does
not store the full name, or any contact information. The applications
may store that separately.

### An OAuth2 API client

For OAuth2 API clients, the following data is stored:

* `user_ref` &mdash; (a string, or `null`) ID of the user resource for
  the user on behalf of whom the API client acts, if any
* `allowed_scopes` &mdash; (a list of strings) the scopes the API
  client is allowed to have

Note that an API client may act on behalf of a user, but does not need
to do so. If `user_ref` is set to a non-empty string, it is acting on
behalf of a user, and this will cause any access tokens the API client
gets to have the `sub` claim set to the user's ID.
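
A minimal sketch of how the claims for such an access token might be derived from the API client resource follows. The function name and the `granted_scopes` parameter are hypothetical, and encoding the `scope` claim as a space-separated string is an assumption, not something this document specifies.

```python
def access_token_claims(client, granted_scopes):
    """Build claims for an access token issued to an API client resource."""
    allowed = set(client["allowed_scopes"])
    # Never grant a scope the client is not allowed to have.
    claims = {"scope": " ".join(s for s in granted_scopes if s in allowed)}
    if client.get("user_ref"):  # non-empty string: acting on behalf of a user
        claims["sub"] = client["user_ref"]
    return claims
```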

### An OIDC application front end

For OIDC application front ends, the following data is stored:

* `allowed_scopes` &mdash; (a list of strings) the scopes the
  application front end is allowed to have
* `callbacks` &mdash; (a list of strings) the callback URIs for the
  application

### A password credential for scrypt

For password based authentication for users, API clients, and
application front ends, Yuck will store the following data:

* `user_ref` &mdash; (a string, or `null`) ID of the user resource for
  the user, if any
* `client_ref` &mdash; (a string, or `null`) ID of the resource for
  the API client, if any
* `hash` &mdash; (a string) password hashed using scrypt, encoded
  as hexadecimal
* `salt` &mdash; (a string) randomly chosen string to salt the
  hash, encoded as hexadecimal
* `key_len` &mdash; (an integer) used for scrypt
* `N` &mdash; (an integer) used for scrypt
* `r` &mdash; (an integer) used for scrypt
* `p` &mdash; (an integer) used for scrypt

Note that Yuck will require exactly one of `user_ref` and `client_ref`
to be set to a non-empty string, and the other one to be `null`.

The `key_len`, `N`, `r`, and `p` fields are the scrypt parameters.
They are stored with each credential so that they can later be varied
without making previously stored passwords invalid.
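
As a sketch of how such a credential might be created and checked, the following uses `hashlib.scrypt` from the Python standard library. The function names, the 16-byte salt, and the default parameter values are assumptions for illustration; Yuck has not chosen any of them.

```python
import hashlib
import hmac
import os


def make_password_credential(password, user_ref=None, client_ref=None,
                             key_len=32, N=2**14, r=8, p=1):
    """Return a credential resource with the fields described above."""
    if bool(user_ref) == bool(client_ref):
        raise ValueError("exactly one of user_ref and client_ref must be set")
    salt = os.urandom(16)  # fresh random salt for every credential
    digest = hashlib.scrypt(password.encode("utf-8"),
                            salt=salt, n=N, r=r, p=p, dklen=key_len)
    return {
        "user_ref": user_ref, "client_ref": client_ref,
        "hash": digest.hex(), "salt": salt.hex(),
        "key_len": key_len, "N": N, "r": r, "p": p,
    }


def check_password(credential, password):
    """Re-hash with the stored salt and parameters, then compare."""
    digest = hashlib.scrypt(
        password.encode("utf-8"),
        salt=bytes.fromhex(credential["salt"]),
        n=credential["N"], r=credential["r"], p=credential["p"],
        dklen=credential["key_len"],
    )
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(digest.hex(), credential["hash"])
```

Because the parameters travel with each credential, `check_password` keeps working for old records even after the defaults for new ones are raised.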

### A TOTP credential for a user

Yuck stores the TOTP credential for a user as follows:

* `user_ref` &mdash; (a string) ID of the user resource for the user
* the rest to be determined, when TOTP is implemented
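
Although the stored fields are not yet decided, the TOTP computation itself is standardised in RFC 6238. The sketch below shows that computation with HMAC-SHA-1; the function names, the 30-second step, and the one-step tolerance window are assumptions for illustration, not Yuck design decisions.

```python
import hashlib
import hmac
import struct
import time


def totp(secret, now=None, step=30, digits=6):
    """Compute an RFC 6238 TOTP code (HMAC-SHA-1) for a secret given as bytes."""
    now = time.time() if now is None else now
    counter = max(0, int(now // step))
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F  # dynamic truncation, as in RFC 4226
    value = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(value % 10**digits).zfill(digits)


def verify_totp(secret, code, now=None, window=1, step=30, digits=6):
    """Accept a code from the current time step or `window` steps either side."""
    now = time.time() if now is None else now
    return any(
        hmac.compare_digest(totp(secret, now + i * step, step, digits), code)
        for i in range(-window, window + 1)
    )
```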

## External interfaces of Yuck

Yuck provides the following interfaces to the rest of the ecosystem:

* endpoints for managing users, API clients, OIDC application
  frontends, including their credentials
* an endpoint for OAuth2 API clients to get tokens using client
  credential grants
* endpoints for OIDC frontends to use for interactively authenticating
  the end users, and for getting the resulting tokens (including
  refreshed tokens)
* an endpoint for monitoring the health of Yuck

Details will be specified later.

# Authentication protocols

This chapter will walk through each of the protocols Yuck supports,
down to sample HTTP requests and responses.

## Authorization information

Overview of how authorization happens in the ecosystem:

* The IDP keeps track of what each end user and API client is
  authorized to do. This is encoded by storing a list of "scopes". A
  scope is a permission to do something, such as "create a resource"
  or "update a resource the end user owns". See `allowed_scopes` in
  the user and API client resources.

* The access token identifies the end user. The token grants
  permission to its bearer to do specific actions, encoded as a list
  of scopes. Note that an access token need not have all the allowed
  scopes.

* The API provider actually implements the access control checks based
  on the access token and its contents. The API provider implements
  specific actions, and associates each with a scope, and checks that
  the token has that scope.

For example, assume that Alice is authorized for the actions "create
resource" and "read resource owned by the user"; her `allowed_scopes`
field has the scopes `create` and `read`.

Alice creates an API client, but only allows it the `read` scope. When
the API client gets an access token, it will have the `sub` claim set
to Alice's user ID, and the `scope` claim set to `read`. With such an
access token, the API client can read any resources that Alice can
read, but can't create new resources.
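
The check on the API provider side might look roughly like the sketch below. The action names, the `REQUIRED_SCOPE` mapping, and the space-separated `scope` claim are assumptions for illustration, and the token is assumed to have had its signature verified already.

```python
# Hypothetical mapping from the actions an API provider implements
# to the scope each one requires.
REQUIRED_SCOPE = {"create_resource": "create", "read_resource": "read"}


def is_allowed(claims, action):
    """Return True if the token's scope claim includes the scope for the action."""
    granted = set(claims.get("scope", "").split())
    return REQUIRED_SCOPE[action] in granted


# Claims from the token of Alice's API client; "alice" stands in for her user ID.
client_claims = {"sub": "alice", "scope": "read"}
can_read = is_allowed(client_claims, "read_resource")
can_create = is_allowed(client_claims, "create_resource")
```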

## OAuth2 for autonomous API clients

* walkthrough of an API client getting tokens via OAuth2 CC
* and using them
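
Until the walkthrough is written, the following sketch shows the general shape of a client credentials token request as defined by the OAuth2 specification (RFC 6749, section 4.4). The function name and the URL in the example are hypothetical, since Yuck's endpoints are not yet specified, and the client ID and secret are assumed to contain no characters that need escaping. The request is only built, not sent.

```python
import base64
import urllib.parse
import urllib.request


def token_request(token_url, client_id, client_secret, scopes):
    """Build (but do not send) an OAuth2 client credentials token request."""
    body = urllib.parse.urlencode(
        {"grant_type": "client_credentials", "scope": " ".join(scopes)}
    ).encode("ascii")
    # The client authenticates itself with HTTP Basic authentication.
    basic = base64.b64encode(
        f"{client_id}:{client_secret}".encode("utf-8")
    ).decode("ascii")
    return urllib.request.Request(
        token_url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Basic {basic}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
    )


# Hypothetical endpoint URL; HTTPS only, per the HTTPS requirement.
req = token_request("https://idp.example.com/token", "client", "secret",
                    ["read", "write"])
```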

## OIDC for interactive end users

* walkthrough of an end-user causing facade to get tokens
* and facade using them
* web sessions

## End users authorizing API clients

* walkthrough

# Architecture: Yuck itself

[[!graph src="""
user [shape="ellipse" label="end user" margin="0.2,0.2"];
browser [shape="tab" label="web browser /\nuser agent" margin="0.2,0.2"];
webapp [shape="component" label="application\nfrontend" margin="0.2,0.2"];
user_client [shape="tab" label="API client\n(on behalf of user)" margin="0.2,0.2"];
auto_client [shape="tab" label="API client\n(autonomous)" margin="0.2,0.2"];
IDP_auth [shape="component" label="IDP auth endpoints" margin="0.2,0.2"];
IDP_token [shape="component" label="IDP token endpoints" margin="0.2,0.2"];
IDP_admin [shape="component" label="IDP admin endpoints" margin="0.2,0.2"];
IDP_store [shape="cylinder" label="IDP data store" margin="0.2,0.3"];

user -> browser;
browser -> webapp;
webapp -> IDP_auth;
webapp -> IDP_token;
webapp -> IDP_admin;

user_client -> IDP_token;
user_client -> IDP_admin;

auto_client -> IDP_token;

IDP_auth -> IDP_store;
IDP_token -> IDP_store;
IDP_admin -> IDP_store;

"""]]

The diagram above doesn't include parts of the ecosystem that are not
part of Yuck or don't directly interact with Yuck.

Yuck consists of three sets of endpoints, and a data store. The
endpoints implement the external interfaces for the authentication
protocols, and for administration. The data store stores JSON objects.

An API client acting on behalf of an administrator will use the Yuck
admin endpoints to manage users, API clients, and OIDC applications. An
application frontend may provide a user interface for doing the same.

Note that the various Yuck endpoints and the processes implementing
them do not need to interact except via the data store. This enables
horizontal scalability to the extent the data store scales.

(It may be more sensible to have the application backend provide an
interface for admin actions. It will still need to use the Yuck admin
endpoints for doing that. This possibility has been left out of the
diagram to avoid clutter.)

## The data store

[Muck]: http://git.liw.fi/muck-poc/tree/

The data store will initially be [Muck][], which has a RESTful HTTP API
for managing JSON objects. The API uses the kind of JWT access tokens
for access control that Yuck creates. Yuck can create the tokens for
its own use.

Later, support for other data stores can be added. LDAP is probably
going to be desired. This can be done by implementing a new component
that provides a Muck-like interface, but stores the data in LDAP.
Similarly, support can be added for SQL databases, etc.