The problem with devops secret-management patterns

Popular configuration management systems such as Chef, Puppet and Ansible all have some variant of secret management solution included. These are the passwords, cryptographic keys and similar sensitive information that must be deployed to specific servers in a data-center environment while limiting access to other machines or even persons involved in operating the infrastructure.

The first two share a fundamental design flaw. They store long-term secrets as files encrypted using a symmetric key. When it is time to add a new secret or modify an existing one, the engineer responsible for introducing the change will decrypt the file using that key, make changes and reencrypt using the key. Depending on design, decryption during an actual deployment can happen locally on the engineer machine (as in the case of Ansible) or remotely on the server where those secrets are intended to be used.

This model is broken for two closely related reasons.

Coupling read and write-access

Changing secrets also implies being able to view existing ones. In other words, in order to add a secret to an encrypted store or modify the existing one (in keeping with best-practices, secrets are rotated periodically right?) requires knowledge of the same passphrase that also allows viewing the current collection of those secrets.

Under normal circumstances, “read-only” access is considered less sensitive than “write” access. But when it comes to managing secrets, this wisdom is inverted: being able to steal a secret by reading a file is usually more dangerous than being able to clobber the contents of that file without learning what existed before.**

Scaling problems

Shared passphrases do not scale well in a team context. “Three can keep a secret, if two of them are dead” said Benjamin Franklin. Imagine a team of 10 site-reliability engineers in charge of managing secrets. The encrypted data-bag or vault can be checked into a version control system such as git. But the passphrase encrypting it must be managed out of band. (If the passphrase itself were available at the same location, it becomes a case of locking the door and putting the key under the doormat.) That means coordinating a secret to be shared among multiple people. That creates two problems:

  • The attack surface of secrets is increased. An attacker need only compromise 1 of those 10 individuals to unlock all the data.
  • It increases the complexity of revoking access. Consider what happens when an employee with access to decrypt this file leaves the company. It is not enough to generate a new passphrase to reencrypt the file. Under worst-case assumptions, the actual secrets contained in that file were visible to that employee and could have been copied. At least some of the most sensitive ones (such as authentication keys to third-party services) may have to be assumed compromised and require rotation.

Separating read and write access

There is a simple solution to these problems. Instead of using symmetric cryptography, secret files can be encrypted using public-key cryptography. Every engineer has a copy of the public-key and can edit the file (or fragments of the file depending on format, as long as secret payloads are encrypted) to add/remove secrets. But the corresponding private-key required to decrypt the secrets does not have to be distributed. In fact, since these secrets are intended for distribution to servers in a data-center, the private-key can reside fully in the operational environment. Anyone could add secrets by editing a file on their laptop; but the results are decrypted and made available to machines only in the data-center.

If it turns out that one of the employees editing the file had a compromised laptop, the attacker can only observe secrets specifically added by that person. Similarly if that person leaves the company, only specific secrets he/she added need to be considered for rotation. Because they never had access to decrypt the entire file, remaining secrets were not accessible.

Case-study: Puppet

An example of getting this model right is Puppet encrypted hiera. Its original PGP-based approach to storing encrypted data is now deprecated. But there is an alternative that works by encrypting individual fields in YAML files called hiera-eyaml. By default it uses public-key encryption in PKCS7 format as implemented by OpenSSL. Downside is that it suffers from low-level problems, such as lack of integrity check on ciphertexts and no binding between keys/values.

Improving Chef encrypted data-bags

An equivalent approach was implemented by a former colleague at Airbnb for Chef. Chef data-bags define a series of key/value pairs. Encryption is applied at the level of individual items, covering only the value payload. The original Chef design was a case of amateur hour: version 0 used the same initialization vector for all CBC ciphertexts. Later versions fixed that problem and added an integrity check. Switching to asymmetric encryption allows for a clean-slate. Since the entire toolchain for editing as well as decrypting data-bags have to be replaced, there is no requirement to follow existing choice of cryptographic primitives.

On the other hand, it is still useful to apply encryption independently for each value, as opposed to on the entire file. That allows for distributed edits: secrets can be added or modified by different people without being able to see other secrets. It’s worth pointing out that this can introduce additional problems because ciphertexts are not bound strongly to the key. For example, one can move values around, pasting an encryption key into a field meant to hold a password or API key, resulting in the secret being used for an unexpected scenario. (Puppet hiera-eyaml has the same problem.) These can be addressed by including the key-name and other meta-data in the construction of the ciphertext; a natural solution is to use those attributes as additional data in an authenticated-encryption mode such as AES-GCM.

Changes to the edit process

With symmetric encryption, edits were  straightforward: all values are decrypted to create a plain data-bag written into a temporary file. This file can now be loaded in a favorite text editor, modified and saved. The updated contents are encrypted from scratch with the same key.

With asymmetric keys, this approach will not fly because existing values can not be decrypted. Instead the file is run through an intermediate parser to extract the structure, namely the sequence of keys defined, into a plain file which contains only blank values. Edits are made on this intermediate representation. New key/value pairs can be added, or new values can be specified for an existing key. These correspond to defining a new secret or updating an existing one, respectively. (Note that “update” in this context strictly means overwriting with a new secret; there is no mechanism to incrementally edit the previous secret in place.) After edits are complete, another utility merges the output with the original, encrypted data-bag. For keys that were not modified, existing ciphertexts are taken from the original data-bag. For new/updated keys, the plaintext values supplied during edit session are encrypted using the appropriate RSA public-key to create the new values.

Main benefit is that all of these steps can be executed by any engineer/SRE with commit access to the repository where these encrypted data-bags are maintained. Unlike the case of existing Chef data-bags, there is no secret-key to be shared with every team member who may someday need to update secrets.

A secondary benefit is that when files are updated, diff comparisons accurately reflect differences. In contrast, existing Chef symmetric encryption selects a random IV for each invocation. Simply decrypting and reencrypting a file with no changes will still result in every value being updated. (In principle the edit utility could have compensated for this by detecting changes and reusing ciphertexts when possible but Chef does not attempt that.) That means standards utilities for merging, resolving conflicts and cherry-picking work as before.

Changes to deployment

Since Chef does not natively grok the new format, deployment also requires an intermediate step to convert the asymmetrically-encrypted databags into plain databags. This step is performed on the final hop, on the server(s) where the encrypted data-bag is deployed and its contents are required. That is the only time when the RSA private key is used. It does not have to be available anywhere other than on the machines where the secrets themselves will be used.


** For completeness, it is possible under some circumstances to exploit write-access for disclosing secrets. For example, by surgically modifying parts of a cryptographic key and forcing the system perform operations with the corrupted key, one can learn information about the original (unaltered) secret. This class of attacks falls under the rubric differential fault analysis.

2 thoughts on “The problem with devops secret-management patterns

  1. Disclaimer: I am currently in no ways affiliated to Thales, SafeNet, Bull or (CardContacte/SmartCard-HSM).

    There is another way to control the entire security process via a smart card or HSM. The current commercially viable solution (and also rather low cost) is to purchase a SmartCard-HSM (

    The SmartCard-HSM has features like quorum based M/N key custodians. My idea would be to get the SmartCard-HSM to generate a master key and use a quorum sharing of M/N key custodians. The key custodians would enter a password to individually encrypt each key quorum share and outputs a key quorum encrypted file for the particular key custodian to keep. When a decision is made to load the master key to another SmartCard-HSM, the custodians would gather their encrypted key share file and enter their own passwords to decrypt their key shares and load their key shares into a master key.

    Regardless if it’s a symmetric or asymmetric key, as long as someone maintains a control over the key in some way and the execution is not done inside a HSM, it’s still a flawed process since HSMs are designed for that kind of security guarantee.

    For a live setup, the HSM can be placed into unattended mode where it can encrypt and decrypt files without interference for dynamic management of the vaults or secret bags or keychains (whatever it is named). Fair enough, because the HSM is in unattended mode, anyone who have hacked into the server with the HSM attached would be able to send legitimate request for decryption and that’s where a stronger guarantee of security is needed. For such a scenario, a cheap and low powered SmartCard-HSM would simply not cut the higher security requirements. What is needed is a HSM with programmable Secure Execution Environment where the company can develop an executable to load into the HSM and then execute security critical logic inside the secure confines of a full fledge FIPS and CC certified HSM.

    Namely, I am referring to Thales nCipher HSMs, SafeNet’s ProtectServer HSM or Bull’s Proteccio HSM which all have SEE capabilities with Thales nCipher’s HSM being the most mature amongst all the SEE HSM providers.

    In fact, all sensitive operations must be executed within an SEE HSM for safety and security instead of a common insecure environment if the security requirements require that kind of necessity and the operators are willing to spend that kind of money for that level of security.

    It boils down to how much security someone wants. If someone is really serious about security and have the means for it, they wouldn’t hesitate to use something better I presume.

Leave a Reply

Please log in using one of these methods to post your comment: Logo

You are commenting using your account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s