Distributed James Server — Architecture

This sections presents the Distributed James Server architecture.

Storage

In order to deliver its promises, the Distributed James Server leverages the following storage strategies:

Storage responsibilities for the Distributed James Server
  • Cassandra is used for metadata storage. Cassandra is efficient for a very high workload of small queries following a known pattern.

  • The blob store storage interface is responsible for storing potentially large binary data. For instance email bodies, headers or attachments. Different technologies can be used: cassandra, or S3 compatible Object Storage (S3 or Swift).

  • OpenSearch component empowers full text search on emails. It also enables querying data with unplanned access patterns. OpenSearch throughput do not however match the one of cassandra thus its use is avoided upon regular workloads.

  • RabbitMQ enables James nodes of a same cluster to collaborate together. It is used to implement connected protocols, notification patterns as well as distributed resilient work queues and mail queue.

  • Tika (optional) enables text extraction from attachments, thus improving full text search results.

  • SpamAssassin or Rspamd (optional) can be used for Spam detection and user feedback is supported.

This page further details Distributed James Server consistency model.

Protocols

The following protocols are supported and can be used to interact with the Distributed James Server:

  • SMTP

  • IMAP

  • WebAdmin REST Administration API

  • LMTP

  • POP3

The following protocols should be considered experimental:

  • JMAP (RFC-8620 &RFC-8621 specifications and known limitations of the James implementation are defined here)

  • ManagedSieve

Read more on implemented standards.

Topology

While it is perfectly possible to deploy homogeneous James instances, with the same configuration and thus the same protocols and the same responsibilities one might want to investigate in 'Specialized instances'.

Components

This section presents the various components of the Distributed James Server, providing context about their interactions, and about their implementations.

High level view

Here is a high level view of the various server components and their interactions:

Server components mobilized for SMTP & IMAP
  • The SMTP protocol receives a mail, and enqueue it on the MailQueue

  • The MailetContainer will start processing the mail Asynchronously and will take business decisions like storing the email locally in a user mailbox. The behaviour of the MailetContainer is highly customizable thanks to the Mailets and the Matcher composibility.

  • The Mailbox component is responsible of storing a user’s mails.

  • The user can use the IMAP or the JMAP protocol to retrieve and read his mails.

These components will be presented more in depth below.

Mail processing

Mail processing allows to take asynchronously business decisions on received emails.

Here are its components:

  • The spooler takes mail out of the mailQueue and executes mail processing within the mailet container.

  • The mailet container synchronously executes the user defined logic. This logic' is written through the use of `mailet, matcher and processor.

  • A mailet represents an action: mail modification, envelop modification, a side effect, or stop processing.

  • A matcher represents a condition to execute a mailet.

  • A processor is a flow of pair of matcher and mailet executed sequentially. The ToProcessor mailet is a goto instruction to start executing another processor

  • A mail repository allows storage of a mail as part of its processing. Standard configuration relies on the following mail repository:

    • cassandra://var/mail/error/ : unexpected errors that occurred during mail processing. Emails impacted by performance related exceptions, or logical bug within James code are typically stored here. These mails could be reprocessed once the cause of the error is fixed. The Mail.error field can help diagnose the issue. Correlation with logs can be achieved via the use of the Mail.name field.

    • cassandra://var/mail/address-error/ : mail addressed to a non-existing recipient of a handled local domain. These mails could be reprocessed once the user is created, for instance.

    • cassandra://var/mail/relay-denied/ : mail for whom relay was denied: missing authentication can, for instance, be a cause. In addition to prevent disasters upon miss configuration, an email review of this mail repository can help refine a host spammer blacklist.

    • cassandra://var/mail/rrt-error/ : runtime error upon Recipient Rewriting occurred. This is typically due to a loop.

Mail Queue

An email queue is a mandatory component of SMTP servers. It is a system that creates a queue of emails that are waiting to be processed for delivery. Email queuing is a form of Message Queuing – an asynchronous service-to-service communication. A message queue is meant to decouple a producing process from a consuming one. An email queue decouples email reception from email processing. It allows them to communicate without being connected. As such, the queued emails wait for processing until the recipient is available to receive them. As James is an Email Server, it also supports mail queue as well.

Why Mail Queue is necessary

You might often need to check mail queue to make sure all emails are delivered properly. At first, you need to know why email queues get clogged. Here are the two core reasons for that:

  • Exceeded volume of emails

Some mailbox providers enforce email rate limits on IP addresses. The limits are based on the sender reputation. If you exceeded this rate and queued too many emails, the delivery speed will decrease.

  • Spam-related issues

Another common reason is that your email has been busted by spam filters. The filters will let the emails gradually pass to analyze how the rest of the recipients react to the message. If there is slow progress, it’s okay. Your email campaign is being observed and assessed. If it’s stuck, there could be different reasons including the blockage of your IP address.

Why combining RabbitMQ, Object storage , Cassandra for MailQueue

  • RabbitMQ ensures the messaging function, and avoids polling.

  • Object Storage stores potentially large binary payload.

  • Cassandra enables administrative operations such as browsing, deleting using a time series which might require fine performance tuning (see Operating Cassandra documentation).

However, the current design do not implement delays. Delays allow to define the time a mail have to be living in the mail queue before being dequeued and is used for example for exponential wait delays upon remote delivery retries, or

Mailbox

Storage for emails belonging for users.

Metadata are stored in cassandra while headers, bodies and attachments are stored within the BlobStore.

Search index

Emails are indexed asynchronously in OpenSearch via the EventBus in order to empower advanced and fast email full text search.

Text extraction can be set up using Tika, allowing to extract the text from attachment, allowing to search your emails based on the attachment textual content. In such case, the OpenSearch indexer will call a Tika server prior indexing.

Quotas

Current Quotas of users are hold in a cassandra projection. Limitations can be defined via user, domain or globally.

Event Bus

Distributed James Server relies on an event bus system to enrich mailbox capabilities. Each operation performed on the mailbox will trigger related events, that can be processed asynchronously by potentially any James node on a distributed system.

Many different kind of events can be triggered during a mailbox operation, such as:

  • MailboxEvent: event related to an operation regarding a mailbox:

    • MailboxDeletion: a mailbox has been deleted

    • MailboxAdded: a mailbox has been added

    • MailboxRenamed: a mailbox has been renamed

    • MailboxACLUpdated: a mailbox got its rights and permissions updated

  • MessageEvent: event related to an operation regarding a message:

    • Added: messages have been added to a mailbox

    • Expunged: messages have been expunged from a mailbox

    • FlagsUpdated: messages had their flags updated

    • MessageMoveEvent: messages have been moved from a mailbox to another

  • QuotaUsageUpdatedEvent: event related to quota update

Mailbox listeners can register themselves on this event bus system to be called when an event is fired, allowing to do different kind of extra operations on the system, like:

  • Current quota calculation

  • Message indexation with OpenSearch

  • Mailbox annotations cleanup

  • Ham/spam reporting to Spam filtering system

Deleted Messages Vault

Deleted Messages Vault is an interesting feature that will help James users have a chance to:

  • retain users deleted messages for some time.

  • restore & export deleted messages by various criteria.

  • permanently delete some retained messages.

If the Deleted Messages Vault is enabled when users delete their mails, and by that we mean when they try to definitely delete them by emptying the trash, James will retain these mails into the Deleted Messages Vault, before an email or a mailbox is going to be deleted. And only administrators can interact with this component via WebAdmin REST APIs.

However, mails are not retained forever as you have to configure a retention period before using it (with one-year retention by default if not defined). It’s also possible to permanently delete a mail if needed.

Data

Storage for domains and users.

Domains are persisted in cassandra.

Users can be managed in cassandra, or via a LDAP (read only).

Recipient rewrite tables

Storage of Recipients Rewriting rules, in cassandra.

Mapping types

James allows using various mapping types for better expressing the intent of your address rewriting logic:

  • Domain mapping: Rewrites the domain of mail addresses. Use it for technical purposes, user will not be allowed to use the source in their FROM address headers. Domain mappings can be managed via the CLI and added via WebAdmin

  • Domain aliases: Rewrites the domain of mail addresses. Express the idea that both domains can be used inter-changeably. User will be allowed to use the source in their FROM address headers. Domain aliases can be managed via WebAdmin

  • Forwards: Replaces the source address by another one. Vehicles the intent of forwarding incoming mails to other users. Listing the forward source in the forward destinations keeps a local copy. User will not be allowed to use the source in their FROM address headers. Forward can be managed via WebAdmin

  • Groups: Replaces the source address by another one. Vehicles the intent of a group registration: group address will be swapped by group member addresses (Feature poor mailing list). User will not be allowed to use the source in their FROM address headers. Groups can be managed via WebAdmin

  • Aliases: Replaces the source address by another one. Represents user owned mail address, with which he can interact as if it was his main mail address. User will be allowed to use the source in their FROM address headers. Aliases can be managed via WebAdmin

  • Address mappings: Replaces the source address by another one.Use for technical purposes, this mapping type do not hold specific intent.Prefer using one of the above mapping types…​ User will not be allowed to use the source in their FROM address headers.Address mappings can be managed via the CLI or via WebAdmin

  • Regex mappings: Applies the regex on the supplied address.User will not be allowed to use the source in their FROM address headers.Regex mappings can be managed via the CLI or via WebAdmin

  • Error: Throws an error upon processing.User will not be allowed to use the source in their FROM address headers.Errors can be managed via the CLI

BlobStore

Stores potentially large binary data.

Mailbox component, Mail Queue component, Deleted Message Vault component relies on it.

Supported backends include S3 compatible ObjectStorage (Swift, S3 API).

Encryption can be configured on top of ObjectStorage.

Blobs can currently be deduplicated in order to reduce storage space. This means that two blobs with the same content will be stored one once.

The downside is that deletion is more complicated, and a garbage collection needs to be run. A first implementation based on bloom filters can be used and triggered using the WebAdmin REST API.

Task Manager

Allows to control and schedule long running tasks run by other components. Among other it enables scheduling, progress monitoring, cancellation of long running tasks.

Distributed James Server leverage a task manager using Event Sourcing and RabbitMQ for messaging.

Event sourcing

Event sourcing implementation for the Distributed James Server stores events in cassandra. It enables components to rely on event sourcing technics for taking decisions.

A short list of usage are:

  • Data leak prevention storage

  • JMAP filtering rules storage

  • Validation of the MailQueue configuration

  • Sending email warnings to user close to their quota

  • Implementation of the TaskManager