Based on discussions at various meetups I've been to, secrets management remains a challenge for many organizations, especially in relation to automation technologies. The problem is that there is no clear path for storing secrets, much less for how a machine should access those secrets.
My goal in this post is not to debate what tools should or should not be used. Rather, I want to explore one path by which secure automated access might be accomplished. I am going to write from the perspective of an organization that has chosen Chef as its automation platform, and HashiCorp Vault as its secrets storage platform. These two technologies are widely used, and ones that I have experience with.
I want to define a few terms that will be used in this post:
- Machine: a general term for a system or process that acts autonomously, possibly a virtual server or a configuration manager
- Secret: something that should remain unknown except within the context of its usage by a machine or machines working together
- Node: some system being managed by Chef
Finally, I'd like to stress that the ideas and opinions expressed herein are mine alone, and so may not be compliant with a specific organization's rules and regulations.
Human != Machine
Human access management is fairly well understood, and organizations generally don't have much of an issue with it, unless a human goes rogue without warning. Machines are different in that their brains don't work the same as ours. A human can conjure a password, enter it twice into a form, go to sleep, and still get up the next morning and recall that password without having physically recorded it anywhere. Machines, on the other hand, need that information provided to them via some medium that can be read into their memory the first time, and usually after every reset. The problem, then, is how to provide that secret via a method that can be automated but is not insecure, in case the machine is compromised by one of those pesky humans up to no good. For example, I don't want to put the access key to the secrets directly in a file on the machine's hard drive.
The issue that plagues everyone is how to accomplish this balancing act. How can I provide a reliable method for getting the access key to sensitive information without making it too simple for an attacker to gain that access as well?
Turtles All The Way Down
It's all about the turtles, or the layers of regression that lead to the access key. It's turtles all the way down to that last one that holds the key. Enough turtles must be stacked such that the key is reasonably safe. I can add more and more turtles to get "security through obscurity", but what is the right number of turtles? Something has to be provided to the machine as a starting point, but what? Which turtle is far enough from the final turtle to be safe for storage on the machine? How do I separate the turtles such that access to one machine is not sufficient to gain access to the secrets store?
The Key Turtle
Let's work backwards from the secret to successful machine access. At this point I am going to clarify some assumptions of this post:
- The secrets backend I am using is HashiCorp Vault (HCV)
- The AppRole authentication method is being used
- Each node is assigned an AppRole of its own, or shares one with a group of similar machines
- Policies are present in the AppRole that provide the minimal access required
- Configuration is managed via Chef, which must access the node's secrets in an automated fashion
I'd like to point out a setting on the AppRole definition that adds a small amount of security: CIDR-bound constraints. An AppRole can be set to allow access only to a set of nodes, or a single node, based on IP address. This is not enough on its own, of course. And since knowledge of this is not necessary for the machine, I do not name it as a turtle, but it leads us to the first turtle: Key Turtle.
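As a sketch of what that constraint might look like with the Vault CLI, assuming an already-authenticated CLI session and a hypothetical role name, policy name, and subnet:

```shell
# Enable the AppRole auth method (once per Vault installation)
vault auth enable approle

# Create a role whose secret_ids and tokens only work from one subnet.
# "web-nodes", "web-node-secrets", and the CIDR are example values.
vault write auth/approle/role/web-nodes \
    token_policies="web-node-secrets" \
    secret_id_bound_cidrs="10.0.1.0/24" \
    token_bound_cidrs="10.0.1.0/24"
```

The CIDR could just as easily be a /32 to pin the role to a single node.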
The Key Turtle is the layer of security that allows access to the AppRole. There are two modes available with the AppRole authentication backend. The one that I'll be looking at here is the "pull" mode, which is what the HCV docs recommend.
In pull mode, the Key Turtle is the token that allows a node to query the role_id (this is the unique identifier used internally by HCV) and generate the secret_id. Once known, these two things will allow the node to log into the AppRole, retrieve a token with the policies defined in the AppRole policies list, and use this final token to gain the necessary access. Note that unlike straight token authentication, the initial token, represented here by Key Turtle, does not allow direct access to the secrets. Rather it provides a path to generate another token that allows that access. I attempt to explain that process more fully here.
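A minimal sketch of that pull-mode exchange with the Vault CLI, assuming the hypothetical "web-nodes" role name and that VAULT_TOKEN currently holds the Key Turtle token:

```shell
# 1. Query the role_id (the static identifier used internally by HCV)
vault read auth/approle/role/web-nodes/role-id

# 2. Generate a fresh secret_id (this is the "pull" part of pull mode)
vault write -f auth/approle/role/web-nodes/secret-id

# 3. Log in with the pair; the returned token carries the role's policies
vault write auth/approle/login \
    role_id="<role_id from step 1>" \
    secret_id="<secret_id from step 2>"
```

Note that the token returned in step 3, not the Key Turtle itself, is what ultimately reads the secrets.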
The Key Turtle could be placed directly onto the node, and so be available to any process running thereon with the correct permissions. That would leave us with only one turtle, and if I trusted the node, that would be sufficient. But of course I do not trust the node. Bad people want to break into the node, and when they do, the Key Turtle will just be sitting there ready to be used. So, I have to protect Key Turtle in some manner.
In order to protect the Key Turtle, I need to keep it from being stored as plain text on the node. Since I have determined that Chef is the configuration manager for the node, and that I am trying to define how Chef will work with the node to supply the necessary secrets at converge time, I might think about putting Key Turtle into Chef. Assuming I know not to put Key Turtle into cookbook code, there are two places that might make sense.
One place is within node attributes. Placing Key Turtle here is not a good idea. If the node that is compromised happens to be the Chef server or an administrator's workstation, then every node's Key Turtle could be readable by the attacker. Any secret available to any node would then be available to them.
Another place that the Key Turtle could be stored is within data bags. This is super bad, because data bags are global to all nodes, so if any one node is compromised, that node can see every other node's Key Turtle. Given that the weakest node in any organization may also be the node to which the least attention is given, this access could go unnoticed for quite a while. Even if the AppRole has been CIDR bound to a single IP, the attacker can simply hoard the data until they have compromised that other node. If every node has access in HCV to something like an administrator password, all I have done is made the hacker's job easier. This is also the reason that data bags should be used sparingly, and never with any type of sensitive data, or for configuration that can break lots of things.
The Encrypted Turtle
So then what can I reasonably do to protect Key Turtle? Well, I can encrypt Key Turtle, to get Encrypted Turtle. The Encrypted Turtle is better protection, because it hides the true value of Key Turtle. Access to Encrypted Turtle alone will not allow an attacker access to the node's available secrets, because Key Turtle is now hidden from view, and HCV will not accept the encrypted string as a token.
One way to generate Encrypted Turtle is via the transit secrets engine of HCV. The transit backend encrypts the given base64 encoded string (the token in this case) and provides an encrypted string which can be decrypted only by the same key ring used to encrypt it. Access to decryption is provided via a policy assigned to another token.
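Sketched with the Vault CLI, assuming a hypothetical key ring named node-web01 (note that transit expects the plaintext to be base64 encoded before it is sent):

```shell
# Enable transit and create a dedicated key ring for this node (once)
vault secrets enable transit
vault write -f transit/keys/node-web01

# Encrypt the Key Turtle token; the output ciphertext looks like
# "vault:v1:..." and is only useful to this key ring
vault write transit/encrypt/node-web01 \
    plaintext=$(printf '%s' "$KEY_TURTLE_TOKEN" | base64)
```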
YAY! I have successfully secured the secrets. Right? Well, unfortunately I have done so at the cost of blocking the node from its own access, unless I also provide the necessary bits to do the decryption. If I place that information alongside the encrypted string then I have really done nothing to protect the system. I must keep the two pieces separated in some way.
The Decrypt Turtle
Decrypt Turtle represents the information from the transit secrets engine that will allow the decryption of Encrypted Turtle.
To be clear, each node's (or group of nodes') Key Turtle is encrypted with a different key ring in the transit secrets engine. Also, each node will have a token whose access is limited to decrypting only its own Key Turtle. The Decrypt Turtle represents the key ring path and a token that will allow Encrypted Turtle to be decrypted, revealing Key Turtle.
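That limited access can be expressed as a very small Vault policy attached to the Decrypt Turtle's token; a sketch, assuming the hypothetical node-web01 key ring name:

```hcl
# Allow decryption with this node's key ring only; no encrypt capability,
# no access to the key material, no other nodes' key rings
path "transit/decrypt/node-web01" {
  capabilities = ["update"]
}
```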
This is the turtle I am going to provide to my node to start the process of gaining access to secrets. Since Encrypted Turtle and Decrypt Turtle are useless on their own but powerful together, I need to keep them separated. I can now do so in a relatively safe way.
One method of making these two pieces available to the node and Chef, yet without leaving all the pieces lying in one place, is to separate them between the two machines. For example, I could put Encrypted Turtle into the node's Chef attributes. Since it's the encrypted form of the access key, knowledge of it is not useful without the information to allow decryption.
Next, Decrypt Turtle could be stored in a directory on the node. If the node is compromised, and the contents of the directory made available for the attacker to read, the information is still not useful by itself. It's only the means to decrypt a string which the attacker does not know.
In this scenario, the two machines (the node and Chef) have to be working together in order to gain access to Key Turtle.
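Put together, the converge-time flow might look something like the following sketch, where ENCRYPTED_TURTLE has been supplied from the node's Chef attributes and the file path holding the Decrypt Turtle token is a made-up example:

```shell
# Decrypt Turtle: a token scoped to transit/decrypt/node-web01 only
export VAULT_TOKEN=$(cat /etc/vault/decrypt-token)

# Recover Key Turtle from the Encrypted Turtle held in Chef attributes
KEY_TURTLE=$(vault write -field=plaintext \
    transit/decrypt/node-web01 ciphertext="$ENCRYPTED_TURTLE" \
    | base64 --decode)

# Key Turtle then pulls a secret_id and logs into the AppRole as before
VAULT_TOKEN="$KEY_TURTLE" vault write -f \
    auth/approle/role/web-nodes/secret-id
```

Neither half of this script is dangerous on its own; the attribute and the file only become useful when combined at converge time.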
But what about Decrypt Turtle? That's also sensitive information. Shouldn't I protect that? Sure, the Decrypt Turtle could be encrypted somehow or placed in some third party's hands, but then how do I give the node or Chef a way to access that?
Here I can start layering more and more turtles. However, in the end, the node must have access to something that will eventually lead to its Key Turtle. No matter how deeply I bury Key Turtle, the path to it must exist. I then come to the realization that in an automation scenario, this path has to be in code, because human intervention is not available. I also have to accept that a system could be so thoroughly compromised that an attacker would gain administrative privileges to perform any and all tasks. In that situation, access to data on disk and in memory is available to the attacker. No matter how much obfuscation is used (no matter how deeply I stack my turtles), the attacker will have the starting bits, and so be able to work backward via the code trail to Key Turtle.
So I must eventually come to an outside solution for more security. For instance, access modeling could be used to determine if a node is requesting secrets in a fashion that is outside of the norm, and its access revoked if it behaves suspiciously. Intrusion detection systems can be tied into the process, so that a node which is suspected to have been compromised can have its access revoked, making any knowledge of its Key Turtle of no use to the attacker. HCV has a seal function that can be called, which will seal the Vault off from all access in the case of extreme emergency. Any or all of these methods can be used to supplement the steps already mentioned in this post, in order to create a more secure environment for automated access to secrets.
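That seal function is a single, drastic operation; once issued, Vault serves nothing at all until it is unsealed again:

```shell
# Seal Vault entirely, cutting off all access to every secret;
# requires a token with permission on the sys/seal endpoint
vault operator seal
```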
The safest machines are those that are not powered on. The second safest are those which are well monitored.
Rotate The Turtles
Implementing some or all of the steps above may lead to a relatively secure access path for the machines, while automated revocation of access, based on rich monitoring, can stop nefarious access quickly. However, I can also rotate the turtles to bring even more peace of mind. Here are a few quick thoughts on turtle rotation:
- The pull mode of AppRole provides a unique secret_id each time it's requested by a token with the right privileges (remember, this token is Key Turtle here). That secret_id can be set to a single use, so rotation is automatic. That's also why it was not a named turtle in this post. If an organization chooses to use the push mode, Key Turtle becomes the custom-defined secret_id; the rest of the flow remains the same.
- Key Turtle can be rotated by a trusted system which only has access to create another Key Turtle for a given node, and revoke the old Key Turtle (this rotation needs some thought put into it, and could be the subject of its own post). The same system could then create Encrypted Turtle, and update it in the node's Chef attributes.
- The Decrypt Turtle can be rotated in a few ways, as there are several things in play here. First, the entire key ring used to do the encryption can be switched out. Also, the same key ring can be kept, but the key rotated (this is one of the functions built into the transit secrets engine). Finally, the token with the privileges to decrypt can be changed. This token rotation, similar to the Key Turtle rotation, would require thought in designing a secure rotation system. Of course, the new key ring or token values will have to be delivered to the node outside of a standard Chef client run, because I want to keep them all separated (chef-run maybe?).
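A couple of the rotations above map directly onto built-in transit operations; a sketch, again assuming the hypothetical node-web01 key ring:

```shell
# Rotate the key within the existing key ring; new encryptions will use
# the new key version, while old ciphertext remains decryptable
vault write -f transit/keys/node-web01/rotate

# Re-encrypt the existing Encrypted Turtle under the newest key version
# without ever exposing the plaintext Key Turtle
vault write transit/rewrap/node-web01 ciphertext="$ENCRYPTED_TURTLE"
```

The rewrap output would then replace the old Encrypted Turtle in the node's Chef attributes.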
The interesting thing about all of these rotations is that you could conceivably implement a solution that rotates all the turtles at any interval without interfering with the Chef client: anywhere from never to every time the Chef client runs on a node, or even more often. A further benefit is that you could define alerting via the monitoring systems to watch for the attempted use of any old turtle. If an old token, key ring, or secret_id is tried, that would be an indicator of foul play. Of course this goes back into the thoughtful implementation of said rotations and monitoring schemes.
I think it is important to stress that a machine should never be allowed to rotate its own turtles. A machine should not be allowed to make its own keys, update secrets, or otherwise gain write privileges to things that it does not absolutely require, in case it is compromised.
The Closing Turtle
I want to say again that this is just something I've been mulling over. In the end, organizations and individuals should implement the right thing for their particular environments and regulations. If more turtles make things better, then definitely make more turtles. If fewer are needed, use fewer. The important thing is that the secrets are as safe as they need to be, and remain accessible by machines without human intervention as much as possible.
Thank you for reading, and be sure to let me know if you find any issues in this post. I am working on an example of this type of solution as a proof of concept for myself, so I will update here when it's available.