---
title: "Re-thinking electronic mail"
author: Lars Wirzenius
abstract: |
  There are many problems with the existing Internet email system,
  such as spam, scam, surveillance, insecurity, centralization, and 
  complexity. The problems are starting to outweigh
  the benefits of the system. Fixing the problems by evolving the
  current system seems overwhelmingly difficult. This essay examines
  some solutions to the problems on the assumption that a completely
  new, parallel email system can be built.

  This is not a proposal for a new system, but an exploration of the
  solution space, meant to provoke constructive discussion.
bibliography: email.bib
...

# Introduction

I am tired of the existing Internet email system, as a sender of
email, as a recipient, and as an operator of an email server.

Spam and scams are rampant, and attempts at solving them push the
email system toward centralization. This essay is about sketching
what a good email system would look like, if it were re-designed from
scratch, using everything we've learned over the decades, and without
having to use any part of the existing system.

As an anecdote, I am currently not on any active email discussion
lists, or groups, or subscribed to newsletters. I have a few separate
email addresses that I give to online shops as contact information. My
main address has been used for free and open source software
contribution for many years.

I get fewer than five valid emails a day, usually from friends, plus a
small number of valid notification emails from automated systems. I
get on the order of 400 spam and scam emails a day. They vary greatly
in how targeted they are. They are all unsolicited and unwelcome.
Unfortunately, despite honing my email filters for decades now,
sometimes a valid email from a new sender ends up being filtered as
spam, and I'm at risk of missing it.

Sometimes those emails are important, such as questions about some of
my contributions, or notifications from a purchase I've made. If I
didn't skim my spam folder manually I would've missed the email about
some of my software being used in Africa to provide local people with
useful SMS services that would've been financially impossible with
proprietary software.

There are good aspects the existing Internet email has that are still
valuable enough that I continue to use it. I am, however, getting closer
to the point that I'd like to make things radically better.

This essay collects my thoughts about email and what a replacement
system might look like. I am not in a position to build the new system
in my free time, but I can at least try to inspire more people to
think about this and maybe the discussion will end up with something
good.


## Good aspects to keep

I find the following aspects of the current email system good and
valuable and would like a new system to retain them.

* Ubiquitous. Approximately everyone on the Internet has email, or can
  get it. Any replacement should be easy for anyone to start using.

* Anyone can email anyone. This lowers barriers for communication,
  globally. It's especially important for free and open source software
  projects, but also to allow people all over the world to easily
  self-organize to build a better world in general.

* Distributed: sender and recipient don't need to use the same server.
  Anyone can set up their own server, assuming time, know-how, and a
  little money.

* Standardized: there are many implementations and they're mostly
  inter-operable.

* Supports off-line use. Not everyone can, or wants to, be online all
  the time.


## Problems with the existing email system

* Spam, or unsolicited bulk messages. Worse, anti-spam measures drive
  centralization, and are still ineffective, especially for those not
  using one of the huge centralized providers. End result: you either
  sacrifice privacy or you get tons of spam.

  - Spam is a result of the desirable feature that anyone can email
    anyone, combined with the fact that sending an email costs
    approximately nothing, even if you send millions of emails, and
    aggravated by the fact that spamming has de-facto no real financial
    or legal repercussions.

* Scam, or trying to convince people to do something that they shouldn't
  or that's harmful to them or others. There is no widely used method to
  digitally sign email, and thus criminals can easily fake messages to
  look like they come from, say, Amazon, Netflix, Paypal, or other
  companies the recipient is likely to use. Everyone needs to be
  constantly on their toes to avoid mistakes.

  - There are standards for digitally signed email, but they're not
    great, and the big providers of email software or service tend not
    to support them, or support them badly.

* Becoming centralized to a few huge providers. This is bad for
  privacy and can be catastrophic for security. If one of the big
  providers gets breached, up to hundreds of millions of people's
  communications are at risk.

  - We're moving towards a future where hosting email yourself makes
    you suspicious. It's already the case that it is difficult for a
    self-hosted email server to reach people on Gmail. Also, the rules
    keep changing.

  - Worse, all the big email service providers have track records of
    closing people's accounts with no warning. Sometimes this happens by
    accident, sometimes due to nefarious policies.

* No real privacy, even if you self-host. Email is by default in
  clear text, and while there are standards for encryption, they are
  not widely used, and tend to also leave metadata (headers)
  unencrypted. It's difficult to hide who is sending email to whom.

* HTML email is not well standardized, and is a security and privacy
  risk. Different email clients implement HTML in different ways, and
  the web standards are not followed very closely.

  - Worse, there's an assumption that HTML emails can embed images
    from the Internet, which results in more security problems (images
    are complicated data provided by a potential attacker, and just
    viewing them is a security risk) and privacy issues (the image
    hosting service will be notified when you view the email, see
    tracking pixels).

    There are moves to restrict this, but the problems have been known
    since HTML email was introduced, and the problems continue to exist.
    Some problems (such as embedding remote images) have gotten at least
    partial solutions, but the problems shouldn't exist at all.

* Attachments fill disks. Email is commonly used to share files,
  because it's easy and ubiquitous, even if it's not very good at it.
  There are services that make this better, but they are mostly
  proprietary, and require extra effort, are not ubiquitous, and people
  mostly don't use them routinely.

* There is no good support for group discussions. Massive dumps of
  forwarded discussions are commonplace in most large organizations.
  Mailing list managers exist, but they tend to be clunky, and tend to
  not be great for having discussions among large groups of people.
  They're better at sending out announcements and newsletters.

  - Email threads work, technically, but tend to result in surprisingly
    little communication happening, in the general case. People mix
    topics in threads, split the same topic into new threads, and
    generally don't use threads as intended. This is not the fault of
    people, but the technology.

* The technologies and standards for email are getting ridiculously
  complicated. Email was originally designed for relatively short
  messages in English only. To support non-English languages in a
  backwards compatible manner, email has gained whole extended
  families of ways to encode text and data: uuencode, base64,
  quoted-printable, and header encoding, to start with.

  The email tech stack is getting so hard to understand that it's
  difficult to even use, never mind implement correctly. The
  complexity also means more effort to operate the servers and keep
  them secure.


# The spam and scam problem

There are a large number of problems. Rather than attacking all of them
at once, let's consider them one at a time, and let's start with the
most obvious problem: spam. As a side effect, the solution proposed
below should also solve the scam problem.


## Problem statement

The spam problem can be stated as follows:

> Anyone can send email to anyone else. There is practically no cost
> to the sender for sending many emails. It's difficult for the
> recipients to filter unwanted mail away automatically, because it
> would require the computer to understand human communication as well
> as humans.

The scam problem can be stated as follows:

> Anyone can send email that looks like it comes from someone else, at
> least sufficiently well that an unobservant recipient is fooled. This
> can be used to con the recipient to click a link in the email that
> leads to a fake web shop, for example, or a site that attacks the
> recipient with malware.


## Overview of solution

* Every email user has one or more identities, represented by
  cryptographic keys.

* All email is digitally signed using the cryptographic keys.

* No email is delivered unless it carries a digital stamp issued by the
  recipient, or someone authorized to issue one on behalf of the
  recipient.

The idea for stamps comes to me from [@fedispam], who seems to have
gotten it from Christopher Lemmer Webber and Taler, and who knows where
it originated from.

[@godinstamps] also discusses his idea of paid stamps for email,
originally from the late 1980s or early 1990s, with the idea that low
volumes are free, but after that you pay a "penny" per email. I'm not
really happy with the exact solution suggested, as it doesn't provide
the recipient with a way to prevent unsolicited junk from arriving, but
the podcast is part of a discussion.


## Digital identities

In this approach, each email user can have as many email identities as
they want, and each identity is represented by a key pair for public
key cryptography. The identities are not necessarily linked, just like
personal and work email addresses are not linked. However, they may be
linked, so that for example an identity used for open source
contributions may be linked to an identity used for publishing poetry
or for contributing to Wikipedia.

The key pair, consisting of a public and a private key, is used to
identify the email account and messages from the account. Every message
sent using an identity is signed with the key for that identity.

This means misrepresenting the sender becomes much harder, reducing the
possibility for scam.

Each identity (key pair) can have metadata associated with it, such as a
name. There can be digital signatures for the metadata for certifying
it, to avoid miscreants faking identities by creating new keys and
associating someone else's name on them. With the metadata signatures,
the recipient's email software can at least attempt to verify
correctness of the metadata.

Alternatively, names are handled only on the recipient's side. If I get
a message from you, and I'm sure it's from you, I can tell my email
address book that the key you used to sign the message should have your
name. If a miscreant creates a new key, my email software won't say it's
from you, and the miscreant has to convince me that it's you. (This
needs further thought.)
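
A minimal sketch of the recipient-side naming idea, assuming that an
identity's public key is available as raw bytes and that a SHA-256
fingerprint is an acceptable way to refer to it. The `AddressBook`
class and its method names are hypothetical, made up for illustration.

```python
import hashlib


class AddressBook:
    """Maps key fingerprints to names chosen by the recipient (hypothetical)."""

    def __init__(self):
        self._names = {}

    @staticmethod
    def fingerprint(public_key: bytes) -> str:
        # Assumption: a SHA-256 digest of the key bytes identifies the key.
        return hashlib.sha256(public_key).hexdigest()

    def remember(self, public_key: bytes, name: str) -> None:
        # Called once the recipient is convinced whose key this is.
        self._names[self.fingerprint(public_key)] = name

    def display_name(self, public_key: bytes, claimed_name: str) -> str:
        known = self._names.get(self.fingerprint(public_key))
        if known is not None:
            return known
        # Never show a sender-supplied name as if it were trusted.
        short = self.fingerprint(public_key)[:8]
        return f"unverified key {short} (claims to be {claimed_name!r})"
```

The point of the design is that the name shown for a known key always
comes from the recipient's own records, so a new key that merely claims
a familiar name is visibly marked as unverified.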


## Digital signatures

For the purposes of this discussion, assume a way to digitally sign
messages that covers the whole message, including its metadata. The
details of how that is achieved do not matter: digital signatures have
well-known, good solutions and since we are talking about a new system,
we don't have to be compatible with the problems of the existing email
system.

For this discussion, assume each message can be securely verified as
having been sent by its sender identity. If a message claims to be from
an identity, but its signature can't be verified, the message is
rejected by the recipient's email software.


## On encryption

To solve the problem of surveillance, email encryption is going to be
needed. The simple solution is to always encrypt everything.


## Digital stamps

A digital stamp is a digital token issued by a recipient which gives a
sender the capability to send one or more messages to the recipient.

A digital stamp is more powerful than a physical, paper stamp. Paper
stamps can be transferred (sold, given) without limit. A digital stamp,
however, allows more features:

* only the recipient can decide if it's still valid: the recipient can
  invalidate otherwise valid stamps

* digital stamps can have a complicated validity time: perhaps
  they're only valid for three months? or only on Mondays? or only
  during office hours? It's up to the recipient to decide.

* digital stamps may be indefinitely usable, or single-use: you might
  give someone new a stamp they can use only once, and if you don't give
  them another, longer-lived stamp, you won't get further email from
  them

  - for example, I might order a mug from an online shop and give them
    two single-use stamps: one for sending me the order confirmation and
    another for sending me a notification of shipment

* digital stamps may be valid only for a specific sender: I might issue
  a stamp to a shop and if they sell my contact information to a
  spammer, the spammer can't use the stamp to send me email; further, I
  will know the shop gave the stamp to the spammer

As an extra twist, digital stamps may also be an authorization to
someone else to issue stamps on your behalf. Rather than the stamp
allowing them to send you an email, it lets them create a stamp that
lets a third party send you an email. Your email software can put any
and all the constraints it puts on stamps you issue directly on the
delegation.

For example, if you and Alfred have a mutual friend, Bruce, you can give
Bruce a stamp that authorizes Bruce to issue single-use stamps to other
identities. If Bruce thinks you and Alfred should know each other, Bruce
can issue Alfred a stamp that lets Alfred send you a single email. If
you like Alfred, you can issue further stamps to Alfred.

An employer runs their own email server, and that server determines
which stamps it accepts. This lets an employer issue stamps on behalf of
each of their employees.
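
The stamp properties listed above (sender-bound, expiring, single-use,
revocable) can be sketched as follows. This is only an illustration
under loud assumptions: the stamp is authenticated with an HMAC keyed
by a secret that only the recipient holds, which works because only
the recipient needs to check stamps; a real design might use public
key signatures instead. It also assumes the sender identity has
already been verified from the message signature. `Stamp` and
`StampIssuer` are hypothetical names, and richer validity rules (only
on Mondays, delegation) are left out.

```python
import hashlib
import hmac
import json
import secrets
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Stamp:
    stamp_id: str     # random identifier, used for revocation and use tracking
    sender: str       # identity the stamp is bound to
    not_after: float  # expiry as a Unix timestamp
    single_use: bool
    tag: str          # HMAC over the other fields, hex encoded


class StampIssuer:
    """Issues and checks stamps for one recipient (hypothetical design)."""

    def __init__(self, secret: bytes):
        self._secret = secret
        self._revoked = set()
        self._used = set()

    def _tag(self, fields: dict) -> str:
        payload = json.dumps(fields, sort_keys=True).encode()
        return hmac.new(self._secret, payload, hashlib.sha256).hexdigest()

    def issue(self, sender: str, not_after: float, single_use: bool = True) -> Stamp:
        fields = {
            "stamp_id": secrets.token_hex(8),
            "sender": sender,
            "not_after": not_after,
            "single_use": single_use,
        }
        return Stamp(tag=self._tag(fields), **fields)

    def revoke(self, stamp: Stamp) -> None:
        self._revoked.add(stamp.stamp_id)

    def accept(self, stamp: Stamp, sender: str, now: float) -> bool:
        fields = asdict(stamp)
        tag = fields.pop("tag")
        if not hmac.compare_digest(tag, self._tag(fields)):
            return False  # forged or altered stamp
        if stamp.sender != sender or now > stamp.not_after:
            return False  # used by the wrong identity, or expired
        if stamp.stamp_id in self._revoked:
            return False
        if stamp.single_use:
            if stamp.stamp_id in self._used:
                return False
            self._used.add(stamp.stamp_id)
        return True
```

For the mug example, the recipient would call `issue` twice for the
shop's identity, and each resulting stamp would be accepted exactly
once. A stamp presented by any other identity is refused, and the
stamp's `sender` field shows which shop it was originally issued to.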


## Receiving email from strangers

In some cases it's important to be able to receive email from
strangers. A stranger here is someone to whom you've not given a
digital stamp. Some examples of when this might be important:

* you're an open source developer and you wish to receive bug reports
  from strangers
* you work in a customer-facing role in a company and your customers
  need to be able to reach you
* you've saved a dog from a tree and journalists need to be able to
  reach you to set up interviews
* someone you went to school with wants to congratulate you on your
  marriage, birthday, newborn child, or other life event
* a former co-worker wants to ask if you want a new job with their new
  employer

Some of these cases can be handled by not using email: bug reports can
go into a web-based ticketing system; customers can get a single-use
stamp whenever they pay their invoice; etc. However, there will always
be cases when you want email from people to whom you've not yet given
a stamp.

A mail server can, optionally, have a feature where it gives anyone a
single-use stamp tied to a specific sender identity. Unfortunately,
this could easily be abused by spammers: they'll automate the step of
requesting a stamp before sending the email. To counter that, the mail
server can impose conditions on giving out stamps:

* In the simplest case, the server might never give out stamps; this
  prevents spam at the cost of all desired email from strangers.
  Whether that's an acceptable compromise is up to each recipient.

* The server might require the putative sender to solve a [CAPTCHA][]
  of some kind. The CAPTCHA might be a puzzle that is infeasible to
  solve automatically.

* The server might require the sender to write a short sentence of why
  they want to reach the recipient. If that contains keywords chosen
  by the recipient, the server issues the stamp.

* The server might require some sort of [proof of work][]. This can be
  cheap enough that it doesn't matter for rare occasions, but
  expensive enough that a spammer would need to expend so much
  computing resources it becomes infeasible. (See also [@hashcash] for
  an early suggestion.)

* The server could require a very small payment. (This is troublesome
  in international communication, when "very small" is irrelevant to
  someone working in a rich country, but a sizable fraction of the
  annual earnings of someone living in a poor country.)

[CAPTCHA]: https://en.wikipedia.org/wiki/CAPTCHA
[proof of work]: https://en.wikipedia.org/wiki/Proof_of_work

The issuing of stamps to strangers is optional, and is meant to be an
interactive process. There doesn't need to be a standard way to do
that, or even an enumerated set of standard ways. Each mail server,
even each recipient, can invent their own. Flexibility here is
important, as spammers will evolve ways to circumvent any common
methods.
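
As one illustration of the proof-of-work condition mentioned above,
here is a hashcash-style sketch. It assumes the server hands the
would-be sender a fresh challenge string and asks for a nonce such
that the SHA-256 hash of challenge plus nonce starts with a given
number of zero bits. The function names, the challenge format, and
the difficulty values are all assumptions for illustration, not a
proposed standard.

```python
import hashlib
from itertools import count


def leading_zero_bits(digest: bytes) -> int:
    """Number of zero bits at the start of the digest."""
    value = int.from_bytes(digest, "big")
    return len(digest) * 8 - value.bit_length()


def _digest(challenge: bytes, nonce: int) -> bytes:
    return hashlib.sha256(challenge + str(nonce).encode()).digest()


def solve(challenge: bytes, difficulty: int) -> int:
    """Sender side: try nonces until one meets the difficulty (slow on purpose)."""
    for nonce in count():
        if leading_zero_bits(_digest(challenge, nonce)) >= difficulty:
            return nonce


def verify(challenge: bytes, nonce: int, difficulty: int) -> bool:
    """Server side: a single hash, so checking is cheap."""
    return leading_zero_bits(_digest(challenge, nonce)) >= difficulty
```

Each extra bit of difficulty roughly doubles the expected work for the
sender while verification stays a single hash, which is the asymmetry
that would make bulk stamp requests expensive.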


# Identities and mail delivery

In the existing email system, your email _address_ is your identity.
The address basically specifies where to deliver email meant for you.
If you change email providers, the address changes. This makes no
sense: you're you even if you switch from Google's mail service to
Microsoft's.

There are workarounds for this. One can set up automatic forwarding of
incoming email from the old address to the new one. One can arrange to
have one's own domain name, and arrange with one's email provider to
use that in the email address, instead of the provider's domain. These
workarounds work, but they add friction and cost.

A better way would be to separate the concepts: keep identity separate
from address, as a fundamental building block of the email system.
Here's my thinking:

* You, the person, can have any number of _identities_. You can keep
  your work persona separate from your school persona. You can further
  separate you as a spouse and a family member from your public
  persona as an open source developer, blogger, and so on. This is
  similar to having many email addresses, but it can be made to be a
  normal thing, and not involve any hassle.

  An identity is a random string, but it can have a number of
  attributes intended for people: names, photos, links to web pages,
  etc. The attributes are meant to help other people understand whose
  identity it is, and what role or aspect of that person the identity
  reflects. Attributes may carry certifications from others, and your
  email client will make it obvious if an attribute is certified by
  someone you trust or not.

* Each of your identities uses a _mail store_, which is where all the
  email for that identity is stored. The mail store may be on your
  computer, perhaps as part of your mail client, or it may be on a
  server somewhere. You may run the store yourself, or someone may run
  it for you, based on an agreement between you and them. The mail
  store has an address, and it can do things on your behalf,
  automatically, such as issue stamps, or forward the mail to one or
  more recipients.

* When email is sent, it's sent to a _mail drop_. The drop may be
  part of the same server where the store is, or it can be a separate
  server. The drop will accept emails based on stamps: a stamp will
  include an authorization to use that drop for the intended
  recipient. You need to arrange with a drop that it accepts mail for
  you, and this gives you the authorization to include in your stamp.
  The arrangement also includes telling the drop what to do with an
  email it has accepted: how and whom to notify of its arrival. It
  further includes telling the drop how long to store an email.

  Drops will store the emails they accept until either a store or
  another drop fetches them, or they expire. When a drop sends a
  notification that an email has arrived, it can send it to a mail
  store or another mail drop.
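
The behaviour of a mail drop described above could be sketched roughly
like this. The sketch assumes the stamp check is supplied as a
function, that messages are opaque (already encrypted) bytes, and that
one retention period applies to everything; notifications are left
out. `MailDrop` and its methods are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class MailDrop:
    # Decides whether a stamp authorizes delivery for a recipient identity.
    is_authorized: Callable[[str, str], bool]
    # How long an accepted message is kept before it expires.
    retention_seconds: float
    # recipient identity -> list of (arrival time, message)
    _held: dict = field(default_factory=dict)

    def deliver(self, recipient: str, stamp: str, message: bytes, now: float) -> bool:
        """Accept a message only if the stamp authorizes this drop."""
        if not self.is_authorized(recipient, stamp):
            return False
        self._held.setdefault(recipient, []).append((now, message))
        return True

    def fetch(self, recipient: str, now: float) -> list:
        """Hand over unexpired messages to a store or another drop, then forget them."""
        held = self._held.pop(recipient, [])
        return [msg for arrived, msg in held
                if now - arrived <= self.retention_seconds]
```

Because the drop only sees stamps and opaque bytes, it needs no
knowledge of message contents to do its job.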

## Confidentiality and authenticity

Each identity will have an encryption key, for public key
cryptography. All emails sent using that identity are digitally signed
using the key: this allows others to verify that an email actually
comes from a specific person, and that the mail hasn't been altered
along the way.

All email is encrypted as well as signed. Everything that can be
encrypted, will be encrypted. This includes all message metadata
("headers" in the current system). If mail drops or stores need to add
metadata, they'll attach another encrypted part to the message.

All communication between mail clients, stores, and drops is
encrypted (a la HTTPS), and may occur over the Tor network. All server
components may be provided as Tor onion services, if the server
operator so chooses.

# Implementation thoughts

While I'm not going to be spending my free time to implement this, I
can't help having thoughts about how it would be done. Here is a loose
collection of some of them.

* Build on top of HTTPS. Avoid inventing new protocols at that level.
  I'd favor RESTful APIs, myself.

* Express message text as something similar to the Pandoc abstract
  syntax tree, rather than specifying a specific language, such as
  HTML or Markdown. Languages require trickier parsing. Having just
  one way to represent messages should simplify interoperability. The
  representation should avoid supporting arbitrary sender-provided
  code to run as that would be a big security risk.

* The abstract syntax tree should have a way to explicitly mark text
  as a quote from another message, and to mark new text as a direct
  response to the quoted part.

* I'd favor building the encryption on top of OpenPGP, given it's a
  proven framework. I'm not a cryptographic engineer, though, so wiser
  heads need to do that work.

* Instead of large attachments, it may be preferable to have a mail
  store provide the attachments on demand, attaching an authorization
  token in the email for downloading the attachment instead. The
  recipient's mail store can do the downloading automatically for
  trusted senders.
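
To make the syntax tree idea a little more concrete, here is a very
small sketch of a message tree with explicit quote and response nodes.
It is not the Pandoc data model, only something loosely in its spirit;
the node types `Paragraph`, `Quote`, and `Response`, and the `render`
function, are invented for this illustration.

```python
from dataclasses import dataclass


@dataclass
class Paragraph:
    text: str


@dataclass
class Quote:
    source_message: str  # identifier of the message being quoted
    blocks: list         # quoted content, as nodes


@dataclass
class Response:
    quote: Quote         # the part being answered
    blocks: list         # the new text that answers it


def render(node, depth: int = 0) -> list:
    """Render a node as plain-text lines, marking quoted text with '> '."""
    prefix = "> " * depth
    if isinstance(node, Paragraph):
        return [prefix + node.text]
    if isinstance(node, Quote):
        return [line for b in node.blocks for line in render(b, depth + 1)]
    if isinstance(node, Response):
        lines = render(node.quote, depth)
        for b in node.blocks:
            lines.extend(render(b, depth))
        return lines
    raise TypeError(f"unknown node type: {type(node).__name__}")
```

Since the tree contains only data and no sender-provided code, a
client can choose its own rendering; the plain-text `render` here is
just one possibility.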


# What next?

Do you think the solution proposed in this essay for spam and scam will
help? If not, why not? Can you see a way for a miscreant to circumvent
the proposed solution to get their unwanted message delivered to the
recipient?

Let me know, preferably via the legacy email system, as a response to
this [fediverse thread][], or using the [GitLab issue system][]. If
you want to propose improvements to the essay, feel free to file a
merge request or send patches.

[fediverse thread]: https://toot.liw.fi/@liw/103984861489499836
[GitLab issue system]: https://gitlab.com/larswirzenius/ideas/-/issues


# Other proposals to improve email

There have been a very large number of proposals to improve email over
the years. This is an incomplete list of them.

[@djbim2000] proposed "Internet Mail 2000" where the central idea is
that the sender stores the email, not the recipient. I'm not sure how
this would solve the spam problem. [@siebenmann2020] has an excellent
critique of the proposal.



# References