How to Design a Break Glass Process in Privileged Account Management (PAM) Systems
Work with clients on a privileged account management (PAM) system design and you’ll start seeing firsthand some of the challenges of balancing availability with assurance. One reason is the need for a “break glass” scenario: emergency access when normal access paths to the password vault break down.
Source: Seton.au.net emergency labels
You see, PAM is all about locking “root” or “admin” credentials up in a hardened vault and tightly controlling access to them so as to increase assurance. In fact, when we did an informal survey asking security architects which core defense techniques best protect an organization’s most critical systems from breach, PAM was one of the top solutions mentioned. But at the same time, PAM can be difficult to deploy.
PAM password vaults provide an extra layer of control over privileged administration and password policies, as well as detailed audit trails on privileged access. In addition to controlling the use, distribution and change of passwords in the vault, PAM solutions can also broker sessions to systems or databases so that the privileged user never even sees the passwords or credentials. But like anything that adds additional steps in an IT process, PAM systems (or the means of accessing them) could fail.
What if an administrator is not able to log in to the password vault where administrative credentials to root or admin accounts are kept? The administrator would not be able to get the credentials, or a brokered session, to administer production systems. The inability to administer systems such as web, database or application servers could eventually cause an outage of mission-critical or customer-facing services. The security department which “sold” the PAM solution to operations would suffer serious embarrassment and loss of political capital in the organization.
Therefore, the security team in question is designing break glass processes for making sure administrators have a backup process to get credentials out of the vault in an emergency. The client’s current plan is to grant senior administrators a local account to “safes,” each containing a subset of the credentials in the password vault pertaining to a group of servers. If privileged users are locked out of the vault, the senior administrator can log in and get the credentials for them in all cases, even if (for example) the Active Directory domain is down and no one can log into the PAM system in the usual manner.
Of course, it would not be good if the local account were abused to get around the normal processes restricting privileged administration to a small number of trusted users, or to give privileged administrators access to systems they hadn’t been cleared for. It is important to monitor the PAM break glass accounts closely to ensure their appropriate use. Unfortunately, the PAM product in this scenario (CyberArk) can’t be configured to generate an alert every time a specific account is used.
If an organization has a mature security information and event management (SIEM) system, it could collect the PAM system logs and make it easy for the security operations center (SOC) or designated staff to monitor the logs for broken glass, so to speak. The local accounts could be created using a special naming convention, e.g. “breakglass01” and so on. Manual or scripted queries of the logs could produce all events prefixed with the names of these accounts. The SIEM system could be configured with a monitoring dashboard, or with rules that correlate events related to the accounts with other context to generate alerts when desired. Unfortunately, none of these detective controls operate in anything like real time, as log collection, normalization and aggregation can be a slow process.
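A scripted query of this kind might look like the following minimal sketch. It assumes the PAM logs have already been exported as JSON lines with a `user` field; that format, the field names, and the sample events are illustrative assumptions, not the vendor's actual log schema.

```python
import json

# Assumed naming convention from the text, e.g. "breakglass01".
BREAK_GLASS_PREFIX = "breakglass"


def find_break_glass_events(log_lines, prefix=BREAK_GLASS_PREFIX):
    """Return parsed events whose 'user' field starts with the prefix.

    The JSON-lines format and the 'user' field are hypothetical.
    """
    hits = []
    for line in log_lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip malformed lines rather than abort the scan
        if not isinstance(event, dict):
            continue
        # Compare case-insensitively so "BreakGlass02" is not missed.
        user = str(event.get("user", "")).lower()
        if user.startswith(prefix):
            hits.append(event)
    return hits


# Made-up sample events for illustration only.
sample_log = [
    '{"time": "2024-01-01T02:14:00Z", "user": "jsmith", "action": "retrieve_password"}',
    '{"time": "2024-01-01T02:15:30Z", "user": "breakglass01", "action": "logon"}',
    "not json",
    '{"time": "2024-01-01T02:16:05Z", "user": "BreakGlass02", "action": "retrieve_password"}',
]

for event in find_break_glass_events(sample_log):
    print(event["time"], event["user"], event["action"])
```

A query like this only tells you about broken glass after the logs have been collected, so it inherits the same delay as the rest of the detective controls.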
To improve on the relatively toothless detective controls, our client decided to try to create an additional step to authorize the break glass process using a ServiceNow ticketing system. Access to the break glass accounts would require going through a request process that checks for a valid emergency change ticket in ServiceNow. However, there are a few problems with this. In the client’s environment, privileged users who couldn’t log into PAM due to an Active Directory issue couldn’t log into ServiceNow either. And…
The next part of the logic really made me think. That is, the way the client integrates PAM with the ticketing system fails open if their API can’t access ServiceNow to check the ticket. So you could have cases where attackers with access to the network could either request an emergency change ticket (with status pending approval) themselves, or contrive a denial of service (DoS) exploit on the ticketing system and – they’re in. The exception process for “fail open” wistfully calls for adding ticketing failure events to a stored queue prompting the SOC to generate “incident tickets” for further investigation. With a busy SOC, however, you have to wonder if that’s really going to happen. In real life – especially in cases where DoS attacks are underway – you have to assume a high level of entropy in the security organization as things fly apart in a cascading chain of failures…at least for a while.
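To make the two weaknesses concrete, here is a minimal sketch of a ticket check that only accepts an approved emergency change ticket and makes the fail-open choice explicit. It is not the client's integration or the ServiceNow API: the `lookup_ticket` callable, the `type` and `status` field names, the ticket IDs, and the `TicketSystemUnavailable` exception are all hypothetical.

```python
from dataclasses import dataclass


class TicketSystemUnavailable(Exception):
    """Raised by the (hypothetical) lookup when the ticketing API is unreachable."""


@dataclass
class Decision:
    allowed: bool
    reason: str
    needs_review: bool = False  # set when access was granted without a ticket check


def authorize_break_glass(ticket_id, lookup_ticket, fail_open=False):
    """Decide whether break glass access may proceed for a given ticket."""
    try:
        ticket = lookup_ticket(ticket_id)
    except TicketSystemUnavailable:
        if fail_open:
            # Availability wins, but the grant is flagged for mandatory follow-up.
            return Decision(True, "ticket system unreachable", needs_review=True)
        return Decision(False, "ticket system unreachable")
    if ticket is None:
        return Decision(False, "no such ticket")
    if ticket.get("type") != "emergency_change":
        return Decision(False, "not an emergency change ticket")
    if ticket.get("status") != "approved":
        # A ticket that is merely pending approval must not open the vault.
        return Decision(False, "ticket not approved")
    return Decision(True, "approved emergency change ticket")


# Made-up tickets standing in for the ticketing system.
TICKETS = {
    "CHG001": {"type": "emergency_change", "status": "approved"},
    "CHG002": {"type": "emergency_change", "status": "pending_approval"},
}


def fake_lookup(ticket_id):
    return TICKETS.get(ticket_id)


def unreachable_lookup(ticket_id):
    raise TicketSystemUnavailable("simulated outage")


print(authorize_break_glass("CHG002", fake_lookup))
print(authorize_break_glass("CHG001", unreachable_lookup, fail_open=True))
```

Whether `fail_open` should ever be true is exactly the availability-versus-assurance trade-off discussed here; the sketch only ensures that a fail-open grant is never silent.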
When things are going wrong and PAM procedures have to be bypassed, it may represent no more risk than just the routine failures and outages that happen in an IT environment. But such problems could also be signs of an attacker sowing confusion. Clients should make the PAM and all processes surrounding it as robust as possible, but cater to availability as well as assurance because…well, you just have to. Get the PAM solution deployed, recognizing it won’t be perfect at first. Plan on tightening up the break glass process flows when possible to make them more robust. Don’t fail too open!