Integrated storage (Raft) backend

The Integrated Storage backend is used to persist Vault's data. Unlike other storage backends, Integrated Storage does not operate from a single source of data. Instead all the nodes in a Vault cluster will have a replicated copy of Vault's data. Data gets replicated across all the nodes via the Raft Consensus Algorithm.

High Availability – the Integrated Storage backend supports high availability.
HashiCorp Supported – the Integrated Storage backend is officially supported by HashiCorp.

storage "raft" {
  path = "/path/to/raft/data"
  node_id = "raft_node_1"
}
cluster_addr = "http://127.0.0.1:8201"

Note: When using the Integrated Storage backend, it is required to provide cluster_addr to indicate the address and port to be used for communication between the nodes in the Raft cluster.

Note: When using the Integrated Storage backend, a separate ha_storage backend cannot be declared.

Note: When using the Integrated Storage backend, it is strongly recommended to set disable_mlock to true, and to disable memory swapping on the system.
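The three notes above can be combined into one server configuration file. The following is a minimal sketch, assuming a single local node with TLS disabled for testing; the listener stanza, addresses, and paths are illustrative placeholders rather than recommended values.

```hcl
# Sketch only: addresses and paths are placeholders for a local test node.
storage "raft" {
  path    = "/path/to/raft/data"
  node_id = "raft_node_1"
}

listener "tcp" {
  address     = "127.0.0.1:8200"
  tls_disable = true # local testing only; enable TLS for real deployments
}

# Required with Integrated Storage: address and port for node-to-node traffic.
cluster_addr = "http://127.0.0.1:8201"
api_addr     = "http://127.0.0.1:8200"

# Strongly recommended with Integrated Storage; also disable swap on the host.
disable_mlock = true

# Note: no ha_storage stanza appears here, because it cannot be declared
# alongside storage "raft".
```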

`raft` parameters

path (string: "") – The file system path where all the Vault data gets stored. This value can be overridden by setting the VAULT_RAFT_PATH environment variable.
node_id (string: "") - The identifier for the node in the Raft cluster. You can override node_id with the VAULT_RAFT_NODE_ID environment variable. When VAULT_RAFT_NODE_ID is unset, Vault assigns a random GUID during initialization and writes the value to data/node-id in the directory specified by the path parameter.
performance_multiplier (integer: 0) - An integer multiplier used by servers to scale key Raft timing parameters, where each increment translates to approximately 1 – 2 seconds of delay. For example, setting the multiplier to "3" translates to 3 – 6 seconds of total delay. Tuning the multiplier affects the time it takes Vault to detect leader failures and to perform leader elections, at the expense of requiring more network and CPU resources for better performance. Omitting this value or setting it to 0 uses default timing described below. Lower values are used to tighten timing and increase sensitivity while higher values relax timings and reduce sensitivity.

By default, Vault uses a balanced timing value of 5, which is suitable for most platforms and scenarios. You should only adjust the timing value when platform telemetry indicates that a change is needed or when different timing is required due to the overall reliability of your platform (network, etc.).

Setting the timing value to 1 configures Raft to its highest performance (lowest delay) mode. The maximum allowed value is 10.

trailing_logs (integer: 10000) - This controls how many log entries are left in the log store on disk after a snapshot is made. This should only be adjusted when followers cannot catch up to the leader due to a very large snapshot size and high write throughput causing log truncation before a snapshot can be fully installed. If you need to use this to recover a cluster, consider reducing write throughput or the amount of data stored on Vault. The default value is 10000 which is suitable for all normal workloads. The trailing_logs metric is not the same as max_trailing_logs.
snapshot_threshold (integer: 8192) - This controls the minimum number of Raft commit entries between snapshots that are saved to disk. This is a low-level parameter that should rarely need to be changed. Very busy clusters experiencing excessive disk IO may increase this value to reduce disk IO and minimize the chances of all servers taking snapshots at the same time. Increasing this trades off disk IO for disk space since the log will grow much larger and the space in the raft.db file can't be reclaimed till the next snapshot. Servers may take longer to recover from crashes or failover if this is increased significantly as more logs will need to be replayed.
snapshot_interval (integer: 120 seconds) - The snapshot interval controls how often Raft checks whether a snapshot operation is required. Raft randomly staggers snapshots between the configured interval and twice the configured interval to keep the entire cluster from performing a snapshot at once. The default snapshot interval is 120 seconds.
retry_join (list: []) - There can be one or more retry_join stanzas. When the Raft cluster is getting bootstrapped, if the connection details of all the nodes are known beforehand, then specifying these config stanzas enables the nodes to automatically join a Raft cluster. Each node lists all the other nodes that it could join using this config. When one of the nodes is initialized, it becomes the leader and all the other nodes join the leader node to form the cluster. When using Shamir seal, the joined nodes will still need to be unsealed manually. See the section below that describes the parameters accepted by the retry_join stanza.
retry_join_as_non_voter (boolean: false) - If set, causes any retry_join config to join the Raft cluster as a non-voter. The node will not participate in the Raft quorum but will still receive the data replication stream, adding read scalability to a cluster. This option has the same effect as the -non-voter flag for the vault operator raft join command, but only affects voting status when joining via retry_join config. This setting can be overridden to true by setting the VAULT_RAFT_RETRY_JOIN_AS_NON_VOTER environment variable to any non-empty value. Only valid if there is at least one retry_join stanza.
max_entry_size (integer: 1048576) - This configures the maximum number of bytes for a Raft entry. It applies to both Put operations and transactions. Any put or transaction operation exceeding this configuration value will cause the respective operation to fail. Raft has a suggested max size of data in a Raft log entry. This is based on current architecture, default timing, etc. Integrated Storage also uses a chunk size that is the threshold used for breaking a large value into chunks. By default, the chunk size is the same as Raft's max size log entry. The default value for this configuration is 1048576, which is two times the chunking size.
- Note: This option corresponds to Consul's kv_max_value_size parameter for Vault clusters using a Consul storage backend. If you are migrating from Consul storage to Raft Integrated Storage, and have changed this value in Consul from its default to a value larger than the Integrated Storage default of 1MB, then you will need to make the same change in Vault's Integrated Storage config.
autopilot_reconcile_interval (string: "10s") - This is the interval after which autopilot will pick up any state changes. A state change could mean multiple things, for example: a newly joined node, initially added to the Raft cluster as a non-voter by autopilot, has successfully completed the stabilization period and qualifies for promotion to voter; a node has become unhealthy and needs to be shown as such in the state API; or a node has been marked as dead and needs to be evicted from the Raft configuration.
autopilot_update_interval (string: "2s") - This is the interval after which autopilot will poll Vault for any updates to the information it cares about. This includes things like the autopilot configuration, current autopilot state, raft configuration, known servers, latest raft index, and stats for all the known servers. The information that autopilot receives will be used to calculate its next state.
autopilot_upgrade_version (string: "") - This is an optional string that, if provided, will be reported to autopilot as Vault's version. Autopilot then uses it when making decisions regarding automated upgrades. If omitted, the version of Vault currently in use will be used. Note that this string must conform to Semantic Versioning. Use of this feature requires Vault Enterprise.
autopilot_redundancy_zone (string: "") - This is an optional string that specifies Vault's redundancy zone. This is reported to autopilot and is used to enhance scaling and resiliency. Use of this feature requires Vault Enterprise.
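
To show where the tuning parameters above are placed, here is a hedged sketch of a storage stanza that spells out the documented defaults explicitly. Setting every value like this is not required. Writing snapshot_interval as a bare integer number of seconds is an assumption based on the documented type, and the path and node_id values are placeholders.

```hcl
storage "raft" {
  path    = "/path/to/raft/data" # placeholder
  node_id = "raft_node_1"        # placeholder

  # Timing: 1 is the tightest setting, 10 is the maximum (most relaxed).
  # 5 matches the default balanced timing value.
  performance_multiplier = 5

  # Snapshot and log retention, shown at their documented defaults.
  trailing_logs      = 10000
  snapshot_threshold = 8192
  snapshot_interval  = 120 # seconds (assumed integer form)

  # Maximum size of a Raft entry in bytes (1 MiB).
  max_entry_size = 1048576

  # Autopilot intervals, shown at their documented defaults.
  autopilot_reconcile_interval = "10s"
  autopilot_update_interval    = "2s"
}
```

In practice you would normally omit any parameter you do not intend to change, so that the stanza documents only deliberate deviations from the defaults.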

`retry_join` stanza

leader_api_addr (string: "") - Address of a possible leader node.
auto_join (string: "") - Cloud auto-join configuration, using go-discover syntax.
auto_join_scheme (string: "") - The optional URI protocol scheme for addresses discovered via auto-join. Available values are http or https.
auto_join_port (uint: "") - The optional port used for addresses discovered via auto-join.
leader_tls_servername (string: "") - The TLS server name to use when connecting with HTTPS. Should match one of the names in the DNS SANs of the remote server certificate. See also Integrated Storage and TLS.
leader_ca_cert_file (string: "") - File path to the CA cert of the possible leader node.
leader_client_cert_file (string: "") - File path to the client certificate for the follower node to establish client authentication with the possible leader node.
leader_client_key_file (string: "") - File path to the client key for the follower node to establish client authentication with the possible leader node.
leader_ca_cert (string: "") - CA cert of the possible leader node.
leader_client_cert (string: "") - Client certificate for the follower node to establish client authentication with the possible leader node.
leader_client_key (string: "") - Client key for the follower node to establish client authentication with the possible leader node.

Each retry_join block may provide TLS certificates via file paths or as a single-line certificate string value with newlines delimited by \n, but not a combination of both. Each retry_join stanza may contain either a leader_api_addr value or a cloud auto_join configuration value, but not both. When an auto_join value is provided, Vault will automatically attempt to discover and resolve potential Raft leader addresses using go-discover. See the go-discover README for details on the format of the auto_join value.
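
As an illustration of the inline form described above, the following sketch supplies the certificates as single-line strings with newlines written as \n, and does not mix them with the file-path parameters. The hostname is hypothetical and the certificate bodies are deliberately elided with "...".

```hcl
storage "raft" {
  path    = "/path/to/raft/data" # placeholder
  node_id = "node2"              # placeholder

  retry_join {
    # Hypothetical leader address; use leader_api_addr OR auto_join, not both.
    leader_api_addr = "https://vault-1.example.internal:8200"

    # Inline PEM values: one line each, newlines escaped as \n.
    # Do not combine these with the *_file parameters in the same stanza.
    leader_ca_cert     = "-----BEGIN CERTIFICATE-----\n...\n-----END CERTIFICATE-----"
    leader_client_cert = "-----BEGIN CERTIFICATE-----\n...\n-----END CERTIFICATE-----"
    leader_client_key  = "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----"
  }
}
```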

By default, Vault will attempt to reach discovered peers using HTTPS and port 8200. Operators may override these through the auto_join_scheme and auto_join_port fields respectively.
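
A minimal sketch of overriding those defaults follows, assuming peers discovered through AWS tags that listen on plain HTTP on a non-default port. The port 8210 and the tag values are illustrative only; the elided tag_value is left as a placeholder.

```hcl
storage "raft" {
  path    = "/path/to/raft/data" # placeholder
  node_id = "node1"              # placeholder

  retry_join {
    # go-discover syntax; credentials and tag value intentionally elided.
    auto_join = "provider=aws region=eu-west-1 tag_key=vault tag_value=..."

    # Override the defaults (HTTPS and port 8200) for discovered addresses.
    auto_join_scheme = "http"
    auto_join_port   = 8210 # illustrative non-default port
  }
}
```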

Example Configuration:

storage "raft" {
  path    = "/Users/foo/raft/"
  node_id = "node1"

  retry_join {
    leader_api_addr = "http://127.0.0.2:8200"
    leader_ca_cert_file = "/path/to/ca1"
    leader_client_cert_file = "/path/to/client/cert1"
    leader_client_key_file = "/path/to/client/key1"
  }
  retry_join {
    leader_api_addr = "http://127.0.0.3:8200"
    leader_ca_cert_file = "/path/to/ca2"
    leader_client_cert_file = "/path/to/client/cert2"
    leader_client_key_file = "/path/to/client/key2"
  }
  retry_join {
    leader_api_addr = "http://127.0.0.4:8200"
    leader_ca_cert_file = "/path/to/ca3"
    leader_client_cert_file = "/path/to/client/cert3"
    leader_client_key_file = "/path/to/client/key3"
  }
  retry_join {
    auto_join = "provider=aws region=eu-west-1 tag_key=vault tag_value=... access_key_id=... secret_access_key=..."
  }
}

Tutorial

Refer to the Integrated Storage series of tutorials to learn more about implementing Vault using Integrated Storage.
