112. Managing Public Key Infrastructure within an Enterprise
Hosted by Robert Blumen.
The episode focuses on managing a certificate authority (CA) within an enterprise. The internal CA is compared on many points to PKI on the public internet.
Show notes
This episode features a conversation between Robert Blumen, DevOps engineer at Salesforce, and Matthew Myers, principal public key infrastructure (PKI) engineer at Salesforce. Matthew shares his experience running a certificate authority (CA) within the Salesforce enterprise. He explains the rationale for taking the CA function in-house: becoming a certificate authority means you become the master of your own universe by establishing internal trust. A private or in-house CA works in ways not dissimilar to a public CA, but the certificates it issues are trusted only by internal users and systems.
Using a public certificate authority can be expensive at scale, particularly for enterprises with millions (or even billions) of certificates, so an enterprise CA can be an important cost-saving measure. It also adds granular control over certificate issuance, such as naming conventions and the overall certificate lifecycle. You can effectively have as many CAs as you can afford to maintain, and you can separate them by use case and environment.
Further, controlling access to data and verifying the identities of people, systems, and devices in-house can reduce exposure to cybersecurity challenges such as the recent SolarWinds supply chain attack. Matthew notes that information submitted to a public CA is potentially exposed, "as the information gets disclosed to the internet and printed on the actual certificates which leave them vulnerable to experienced hackers." He also stresses the importance of onboarding and people management, and the need to ensure staff don't buy SSL certificates externally.
Myers offers some thoughts for businesses considering the DIY route, discussing the advantages and limitations of open source resources such as OpenSSL and Let's Encrypt. Identity mapping and tracking are particularly important because the certificates you give to people, systems, and services will eventually expire. Matthew describes the benefits of a central identity store, its core features, and how it works in tandem with PKI infrastructure. You also need to know how many certificates you have in the wild at any given time.
For the team running a PKI, the revocation infrastructure means inserting yourself in the middle of every single connection, because if you're doing it correctly, everything needs to validate that the certificates are genuine. When you have a real possibility of slowing down other people's connections, you want your supporting infrastructure positioned so that you provide those responses as quickly as possible. Network latency becomes a very real concern.
Auditability and the ability to trust a certificate authority are paramount. The service that creates and maintains a PKI should provide records of its development and usage so that an auditor or third party can evaluate it.
Links from this episode
Salesforce
Wikipedia page on Public Key Infrastructure
Wikipedia page on Certificate Authorities
OpenSSL
Let’s Encrypt
Transcript
Speaker 1: Hello, and welcome to Code[ish], an exploration of the lives of modern developers. Join us as we dive into topics like languages and frameworks, data and event-driven architectures, and individual and team productivity, all tailored to developers and engineering leaders. This episode is part of our Deeply Technical Series.
Robert Blumen: This is Robert Blumen. I am a DevOps engineer at Salesforce. My guest today is Matthew Myers. Matthew is a principal PKI engineer at Salesforce, and we will be talking about running a certificate authority within a single enterprise. Matthew, welcome to Code[ish].
Matthew Myers: Thank you for having me.
Robert Blumen: For the purpose of this podcast, we're going to assume that listeners are roughly familiar with PKI and certificate authorities. We're going to be focusing on running a certificate authority within a single business. I'm going to ask a two-part question. What is an enterprise CA, and why would an organization want to become a CA?
Matthew Myers: The short version is that an enterprise CA is anything issuing certificates that are trusted internally, within an organizational boundary: not certificates that are readily accessible or viewable, or even trusted, on the public internet. As for why somebody would want to do that in the first place, it's basically about becoming the master of your own universe. If you have internal resources that need to be able to trust each other, but you don't want that information accessible anywhere on the public internet, you want to maintain control of which things trust each other. That's usually when you start looking at an internal certificate authority or internal PKI, so that you can actually establish internal trust. Right?
Matthew Myers: So if you have internal servers that you want to be able to trust each other, but not another set of servers, internal certificate authorities are a good way to accomplish that. They provide proof of identity for individual nodes, servers, people, services, all kinds of things.
Robert Blumen: You could tell the employees who operate these different services, "Everyone just take your credit card, go out to Verisign or GoDaddy and buy certificates there." Why would the company take this service in-house?
Matthew Myers: It really comes down to a matter of control and privacy. If you take your credit card to a public certificate authority and say, "Hey, I want one of these certificates," A, it could be really expensive if you're talking about any significant volume, and B, all of the information that is put on those certificates in order to make them valid also gets disclosed to the internet. A lot of people know where to look. I mean, it gets printed on the certificates themselves, and it all gets logged to certificate transparency logs. Then there's the general volume: the larger an enterprise gets, the larger your certificate base gets, and depending on what your business looks like, you could be talking about thousands, millions, or even billions of certificates.
Matthew Myers: And paying somebody else on a per-certificate basis to issue those certificates for you, aside from being a huge volume for any public CA to take on, could be extremely expensive. Even if you're looking at using something like Let's Encrypt, which is a free public certificate authority, there are limitations as well. Right?
Matthew Myers: Let's Encrypt can only give you a certificate for a public DNS entry that you can prove you actually control at the time, and for a large organization or an enterprise, a lot of those DNS entries are on the local network. There's no public exposure. So you can't just hit Let's Encrypt and say, give me a certificate for this thing, unless Let's Encrypt can actually see it. Right? Running your own CA gives you a way to manage the certificate lifecycle yourself instead of relying on somebody else, and it also gives you a lot of flexibility in how you issue certificates, how frequently you issue them, and what names or naming conventions you put on them, which may or may not align with the standards that public CAs have to follow. And you have absolute control over what actually gets issued, who it gets issued to, how long they have it, and everything else that goes along with it.
Robert Blumen: You mentioned that these certificates are needed because you have resources within your enterprise that need to know whom to trust. Give some examples of the different kinds of resources that would need a trust relationship with other resources.
Matthew Myers: Something like an internal Exchange or email server. Right? If you have your own on-prem or private cloud setup, say you have an email server and people connecting to it to get their email, you want to know that this is your email server and not somebody else's. If it has a public certificate on it, you can evaluate the certificate and try to determine that. But if you control the trust on both sides, you can empirically prove that this is a safe connection for these computers to connect to. And that's where having multiple internal CAs comes in. Right?
Matthew Myers: So when you manage your own internal PKI, you can effectively have as many CAs as you can afford to maintain, and you can separate them by use case or by environment. What you ultimately end up doing is creating trust boundaries within your own organization. Let's say you have a network that handles a lot of financial information. You want those systems to be able to talk to each other and trust each other, but you don't want them to openly communicate with something like your end-user environment. You want to keep those two things separate, in separate trust boundaries if not completely separate network boundaries, so they can't communicate.
Matthew Myers: So you can establish trust in one area, and all these systems can talk to each other, but they can't talk to anything else, because when they try to access those systems, they get a certificate that chains up to a different certificate authority that they don't explicitly trust. So the communication fails. It's a good way to separate things that should talk to each other from things that shouldn't.
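To make the trust-boundary idea concrete, here is a toy Python model (an illustration, not how a TLS library implements validation): each hypothetical CA name points to the CA that signed it, and a connection is allowed only when the server certificate's issuer leads up to a root in the client's trust store. The CA names are made up, and the model deliberately ignores signatures, dates, and hostnames.

```python
from typing import Dict, Optional, Set

# Hypothetical CA hierarchy: each CA maps to the CA that signed it (roots map to None).
CA_PARENT: Dict[str, Optional[str]] = {
    "Finance Root CA": None,
    "Finance Issuing CA": "Finance Root CA",
    "Workplace Root CA": None,
    "Workplace Issuing CA": "Workplace Root CA",
}


def root_of(ca_name: str) -> str:
    """Walk up the hierarchy from an issuing CA to its self-signed root."""
    while CA_PARENT[ca_name] is not None:
        ca_name = CA_PARENT[ca_name]
    return ca_name


def connection_allowed(client_trusted_roots: Set[str], server_cert_issuer: str) -> bool:
    """Toy check: succeed only if the server's chain ends at a root the client trusts.
    Real TLS validation also verifies signatures, validity dates, and hostnames."""
    return root_of(server_cert_issuer) in client_trusted_roots


# A finance system's trust store holds only the finance root.
finance_trust_store = {"Finance Root CA"}
print(connection_allowed(finance_trust_store, "Finance Issuing CA"))    # True
print(connection_allowed(finance_trust_store, "Workplace Issuing CA"))  # False
```

The separation comes entirely from which roots are placed in each trust store, not from anything on the server side.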
Robert Blumen: What I understand is that you could segment your organization into different trust domains, and then distribute the root certificates on an as-needed basis, so that resources belonging to one or more trust domains get the roots for what they should trust, but not for everything. Is that correct?
Matthew Myers: Yeah. With larger enterprises in particular, and the larger they get, the more likely it is that you end up with multiple internal private roots. Each one of those root certificates represents a different trust boundary or trust domain, and they can overlap. Right? But the idea in those cases is that you have a certain number of systems or networks or whatever that are allowed to trust each other within those boundaries, and they know which ones they shouldn't trust.
Matthew Myers: So you can end up with almost an Inception sort of thing, where you have micro networks or trust boundaries within larger trust boundaries, where things can trust each other but not other things. It can get complicated, but at a fairly basic level, maintaining independent roots that are separate from each other gives you an explicit separation of trust between different systems, for whatever reasons there may be. Right?
Matthew Myers: Maybe a more practical example would be a test environment versus a production environment. A test environment is going to be a lot looser about what's allowed to go on in there, and by that very nature the assurance of its certificates is going to be a lot lower.
Matthew Myers: So you can be a lot freer with how you issue and manage certificates in those test environments, but you want a much higher level of assurance and much stricter means of getting and managing certificates in production, because you want to make sure production is valid and that you have a tight grip on what is what, whereas in test it might be a little looser. And you want to keep those two things separated: a root CA that you use for test should not be distributed in a production environment.
Robert Blumen: I want to highlight a point. As a home user of the public internet, my main interaction with PKI is that it ensures that if my browser says I'm on a certain website, that website has offered me a certificate signed by an authority I trust. Usually, if I'm willing to jump through some hoops, the browser will still let me visit a website that did not present a valid certificate. So it's mainly something that gives confidence to the user. It sounds like many of the cases you're talking about here are software-to-software trust, where there's no way for the user to say, "Oh, I trust this anyway." It's very strict, and it's computer to computer.
Matthew Myers: For an internal PKI, you can have a lot of use cases, and some of them involve people while others involve systems. When you look at the vast bulk of the certificates issued off an internal PKI system or systems, a lot of them go towards system-to-system communication, where you want some automatic validation in place that the system you're talking to is the one you're supposed to be talking to.
Matthew Myers: And if you define them correctly, computers are pretty good at identifying whether this is actually who they're supposed to be talking to. If it is, great, let's keep going and not slow down. But you also have a lot of use cases that involve people, like internal websites. Your HR portal, for example, if you're hosting it on-prem, could very easily have an internal certificate that's going to be viewed in the browser.
Matthew Myers: That's still very much a thing, but there's less need for an internal certificate there, because you could just as easily put a public certificate in that space, as long as the DNS name is something a public CA would issue to.
Matthew Myers: So it really depends on how you set those things up internally. But what I frequently see in the user space is more for authentication tokens and smart cards. Right? That's very much a user certificate: a human being gets that certificate, and it identifies them as a person. A lot of smart card authentication still goes through a browser, so you could still end up getting all kinds of certificate errors if something doesn't get set up correctly. In a situation where somebody is issuing millions of certificates a year, I'd say at the bare minimum, 75% of those are probably going to be for system-to-system communications.
Matthew Myers: And sometimes this isn't even across the network; it's within an app talking to itself or to another application. But a lot of these certificates are just used to verify that the thing you're talking to is genuine and actually part of your corporation or your network. You're trying to reduce the likelihood that you're talking to something that's been compromised, by checking that it has a legitimate certificate on it. There are always ways around that if someone wants them, but you're trying to increase the assurance that the thing you're talking to is the thing you want to be talking to.
Matthew Myers: People have a brain rattling around in their heads, hopefully, and if something doesn't smell right, they can always stop. Systems don't necessarily have that, so you have to find ways to narrow their scope of what's acceptable, and using certificates is a fairly good way to do that.
Robert Blumen: There's something I've always thought was a hole in the way PKI works. I'm the home user, I visit a website, and it presents a certificate. My browser is okay with it. What that means is that the certificate could be chained back to one of the root certificates in my root store. I listen to the Security Now podcast, and they have a running joke about how there are now 500 root certificates in Windows. Some of them are from very small countries, or government agencies in very small countries, and in theory there's not a lot that would prevent the public bus authority of Ruritania from issuing a certificate for Salesforce.com.
Robert Blumen: Now, I understand there's been an attempt to patch that, but it seems to me that in an enterprise, you could have somewhat more control. Is there a way, even if you have multiple CAs within your organization, of saying: not only do you trust this other server because it presents a certificate you can validate, but that certificate must chain to this particular root certificate?
Matthew Myers: You can. Absolutely. You can actually take it a step farther inside a corporation and micromanage the public root certificates that you allow your systems to trust as well. But looking at internal private PKI, if you have multiple root certificates, you ultimately have direct control over which systems get those root certificates and are allowed to establish trust. Trust store management is really super important when you're dealing with your own internal PKI.
Matthew Myers: What you don't want to do is create a root and then just distribute it to everything everywhere, especially if you have multiples, because the biggest reason to have multiples is to have defined trust boundaries. So you want to be very clear about where these end up, but at the same time, you want to make sure that everything that needs it has it. Right?
Matthew Myers: It's almost a double-edged sword. Right? You have to be very purposeful in your distribution of the root certificates. If everything is following TLS standards and whatnot, you should really only have to distribute a root certificate. But what commonly happens is that people set up a service, it offers the leaf certificate, or end-entity certificate, and they don't include the chain. Then you end up with a gap. Right? Especially if your internal PKI is a three-tier PKI, or honestly even with a two-tier one: if you don't include the issuing CAs or intermediate CAs, your systems don't have a way to identify that the certificate they're being presented chains up to the root.
Matthew Myers: So what you end up having to do is distribute not just the root, but all the issuing CAs too. I like to avoid that when I can, because it creates a lot of administrative overhead: you're not just distributing a very long-lived root, you're also managing the renewal and rotation of the individual issuing CAs, and getting them distributed everywhere is a challenge. But you can be very defined in who's allowed to trust what.
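The gap described above can be sketched with a simplified path builder. This is a hypothetical, name-matching-only model of what a TLS client does (real path validation per RFC 5280 also checks signatures, validity periods, and extensions): the client can only link the leaf to its root if the intermediate comes either from the chain the server presents or from the client's own trust store.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Cert:
    subject: str
    issuer: str  # subject name of the CA that signed this certificate

    @property
    def self_signed(self) -> bool:
        return self.subject == self.issuer


def build_path(leaf: Cert, presented: List[Cert], trust_store: List[Cert]) -> Optional[List[Cert]]:
    """Toy path building by name matching only (no signature or date checks).
    Issuers may come from what the server presented or from the local trust store."""
    pool = {c.subject: c for c in presented + trust_store}
    trusted_roots = {c.subject for c in trust_store if c.self_signed}
    path = [leaf]
    current = leaf
    while len(path) <= 10:  # guard against issuer loops
        if current.issuer in trusted_roots:
            return path + [pool[current.issuer]]
        issuer = pool.get(current.issuer)
        if issuer is None or issuer.self_signed:
            return None  # gap: unknown issuer, or a root this client does not trust
        path.append(issuer)
        current = issuer
    return None


# Hypothetical two-tier hierarchy.
root = Cert("Corp Root CA", "Corp Root CA")
issuing = Cert("Corp Issuing CA 1", "Corp Root CA")
leaf = Cert("abc.corp.example.com", "Corp Issuing CA 1")

print(build_path(leaf, [leaf], [root]))                # None: leaf only, no chain
print(len(build_path(leaf, [leaf, issuing], [root])))  # 3: server sends the chain
print(len(build_path(leaf, [leaf], [root, issuing])))  # 3: issuing CA distributed to clients
```

The second and third calls are the two fixes from the conversation: have services present the full chain, or push the issuing CAs into every trust store, which is the option that brings the rotation overhead.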
Matthew Myers: Right, so you can have the separations, and you can make sure that any browser, service, or anything else within your corporation that needs to trust these certificates actually can. Just because you create a root certificate doesn't mean anybody's going to trust it unless you do the due diligence and distribute it to all the trust stores. There are lots of ways to do that, and it really depends on what tools you have at your disposal, but trust store management is absolutely a significant part of effectively managing a PKI anywhere, whether it's public or private.
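One concrete way to enforce "trust only our own root" on a Python client is the standard library's `ssl` module. A minimal sketch, assuming the internal root is available as PEM text: a context built directly with `ssl.PROTOCOL_TLS_CLIENT` starts with an empty trust store (unlike `ssl.create_default_context()`, which loads the system's public roots), so only what you load explicitly is trusted.

```python
import ssl
from typing import Optional


def internal_only_context(internal_root_pem: Optional[str] = None) -> ssl.SSLContext:
    """Client-side TLS context whose trust store holds only the supplied internal root(s).

    PROTOCOL_TLS_CLIENT enables certificate and hostname verification by default,
    and a context created this way does not load the operating system's public roots.
    """
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    if internal_root_pem is not None:
        ctx.load_verify_locations(cadata=internal_root_pem)  # PEM text of the internal root
    return ctx


ctx = internal_only_context()
print(ctx.verify_mode == ssl.CERT_REQUIRED, ctx.check_hostname)  # True True
print(ctx.cert_store_stats()["x509_ca"])  # 0: nothing trusted until a root is loaded
```

In use, you would read the root from wherever your trust-store tooling places it (for example, a hypothetical `/etc/pki/internal/root.pem`) and pass the resulting context to `urllib.request.urlopen(url, context=ctx)` or `http.client.HTTPSConnection(host, context=ctx)`.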
Robert Blumen: I'm going to change directions now and talk about the workflows and the lifecycle for getting these certificates issued to the thousands, millions, or billions of resources that need them. If we're talking about these big numbers, it can't be a manual process, or at least not an entirely manual one. How do these resources obtain certificates, and how do they prove to the CA who they are?
Matthew Myers: That is an ever-evolving and oftentimes complicated thing to solve for. Sometimes I'm honestly a little jealous of public CAs and their ability to validate certificate requests, because on the public internet it's pretty simple: you either control something in DNS or at an organizational level, or you don't. There are standard rules around that for public CAs, but internally it gets a lot more complicated. Ideally you want as little human involvement as you can get, but at the same time, you have to be very cautious about the types of certificates that get issued, because certificates can be very dangerous if they're issued too broadly, as with a wildcard certificate, which could be used for any number of things. And if one gets compromised, it can be equally dangerous.
Matthew Myers: Yeah. Since there are no real rules about what kind of certificates you can issue internally, you also need to create a rule set saying, for example, that we're not going to issue an internal certificate that can be used to represent an actual third party somewhere else. I don't want to issue a certificate internally that says google.com on it unless we have a very specific reason to do so. Right? So you have to create some artificial constraints on what gets issued and to whom, and sometimes that's done through allowlisting or whatever. Then there's the question of actually identifying who's who.
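A rule set like the one described can start as a simple suffix allowlist. The sketch below assumes hypothetical internal zones (`corp.example.com`, `internal.example`) and refuses anything outside them, such as google.com; matching on label boundaries keeps lookalike names from slipping through.

```python
from typing import Iterable

# Hypothetical internal zones this CA is allowed to issue for.
ALLOWED_SUFFIXES = ("corp.example.com", "internal.example")


def name_allowed(dns_name: str, allowed_suffixes: Iterable[str] = ALLOWED_SUFFIXES) -> bool:
    """True if dns_name is one of the allowed zones or sits beneath one.
    Matching is done on label boundaries, so 'evilcorp.example.com' does not
    match 'corp.example.com'."""
    name = dns_name.lower().rstrip(".")
    return any(name == s or name.endswith("." + s) for s in allowed_suffixes)


for requested in ["abc.corp.example.com", "google.com", "evilcorp.example.com"]:
    print(requested, name_allowed(requested))
# abc.corp.example.com True
# google.com False
# evilcorp.example.com False
```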
Matthew Myers: If you have a central identity store, that's really beneficial. Right? Because then your PKI infrastructure works in tandem with your identity store. If somebody comes to me and says, I'm this service or I'm this person, we can map that to somewhere else, Active Directory or whatever, and verify, separately from the request, that this person or service is who they say they are. At that point it gets much easier to give them that certificate.
Matthew Myers: In addition to that, you have registration authorities, or RAs, which together with the CAs and everything else form the larger PKI implementation you're working with. RAs oftentimes have a mapping that says this identity is allowed to get certificates for these fully qualified domain names or this subdomain. So you have an ownership mapping that says this thing is allowed to do this thing, and that usually involves some sort of initial onboarding process where someone says, "I want to be able to get certificates in this category," and you evaluate it and say, "Okay." Going forward, it's just an automated process until something needs to change.
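The RA ownership mapping can be pictured as a lookup from an onboarded identity to the zones it owns. A minimal sketch, with hypothetical identity and zone names: once onboarding has recorded the mapping, requests that stay inside the owned zones are approved automatically, and anything else is refused (in practice it might be routed to a person for review).

```python
from typing import Dict, List

# Hypothetical ownership map populated during onboarding:
# requesting identity -> DNS zones that identity may get certificates for.
OWNERSHIP: Dict[str, List[str]] = {
    "svc-abc-deployer": ["abc.corp.example.com"],
    "team-payments": ["payments.corp.example.com"],
}


def ra_approves(requester: str, dns_names: List[str]) -> bool:
    """Approve automatically only if every requested name falls in a zone the requester owns."""
    zones = OWNERSHIP.get(requester, [])

    def owned(name: str) -> bool:
        name = name.lower().rstrip(".")
        return any(name == z or name.endswith("." + z) for z in zones)

    return bool(dns_names) and all(owned(n) for n in dns_names)


print(ra_approves("svc-abc-deployer", ["abc.corp.example.com", "api.abc.corp.example.com"]))  # True
print(ra_approves("svc-abc-deployer", ["payments.corp.example.com"]))  # False
```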
Matthew Myers: Then there are other use cases, like smart cards. When smart cards are used to identify enterprise admins and that sort of thing, it's usually a much more manual process, because you want to be absolutely sure that the person you're giving the smart card to is who they say they are. Sometimes someone will actually come in physically to a help desk, show a driver's license, and say, "Hey, I really am me." The help desk checks that against what they have on file in the HR system or identity store, verifies it, and then provisions the smart card and physically hands it to them.
Matthew Myers: There's also, in the world of Let's Encrypt, the ACME protocol, which basically says: if you prove you control this name in DNS, we automatically give you the certificate, no more validation required. That's becoming increasingly popular inside private networks too, because it's less overhead. If somebody has something in DNS on your internal network, and it's not a high-risk area, like a Kubernetes cluster or something that's going to be fairly ephemeral or short-lived, you can say: you have that thing in DNS, here's your certificate, we'll see you again in 30 or 90 days for your next one. And then you just churn through them.
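For the ACME dns-01 challenge, "proving you control the name in DNS" has a precise meaning defined in RFC 8555: the client publishes a TXT record at `_acme-challenge.<name>` whose value is the unpadded base64url SHA-256 digest of the key authorization (the challenge token, a dot, and the RFC 7638 thumbprint of the ACME account key). The sketch below computes that value with the Python standard library; the token and key coordinates are placeholders, and an internal deployment would need its own ACME server issuing from the private CA.

```python
import base64
import hashlib
import json


def b64url(data: bytes) -> str:
    """Unpadded base64url, as used throughout ACME/JOSE."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


# Required JWK members per key type (RFC 7638 for EC/RSA, RFC 8037 for OKP).
REQUIRED_MEMBERS = {"EC": ("crv", "kty", "x", "y"), "RSA": ("e", "kty", "n"), "OKP": ("crv", "kty", "x")}


def jwk_thumbprint(jwk: dict) -> str:
    """RFC 7638 thumbprint: SHA-256 of the required members, sorted, with no whitespace."""
    members = {k: jwk[k] for k in REQUIRED_MEMBERS[jwk["kty"]]}
    canonical = json.dumps(members, sort_keys=True, separators=(",", ":"))
    return b64url(hashlib.sha256(canonical.encode("utf-8")).digest())


def dns01_txt_value(token: str, account_jwk: dict) -> str:
    """Value to publish at _acme-challenge.<name> for a dns-01 challenge (RFC 8555, section 8.4)."""
    key_authorization = token + "." + jwk_thumbprint(account_jwk)
    return b64url(hashlib.sha256(key_authorization.encode("ascii")).digest())


# Placeholder account key and token; a real client gets the token from the ACME server.
account_jwk = {"kty": "EC", "crv": "P-256", "x": "example-x-coordinate", "y": "example-y-coordinate"}
print("_acme-challenge.abc.corp.example.com TXT", dns01_txt_value("example-token", account_jwk))
```

Once the record is visible, the ACME server queries it, compares the value, and issues; that DNS lookup is the "no more validation required" step.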
Matthew Myers: So there's no one-size-fits-all answer for any of these things. Right? It all depends on what you're actually trying to accomplish, but having some sort of identity mapping is super important, because at the end of the day you're giving certificates to people, systems, and services, and those certificates have to expire.
Matthew Myers: And you need owners and ownership, because as a PKI owner it's really hard to be responsible for the entire certificate lifecycle, from request to issuance to renewal, without the counterpart being involved too. A lot of times you may not have access to wherever the certificates get installed. So you still have to have a relationship with the other person and say, "Hey, look, your thing is going to expire on a certain date. If you don't take care of it, there's a very real likelihood it will impact one of our customers or another team we're aligned with." So it's an ever-evolving process, to say the least.
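Tracking expiry per owner is mostly bookkeeping. A minimal sketch, assuming a hypothetical inventory that records each certificate's serial, responsible owner, and notAfter time: it lists the certificates inside a warning window, soonest first, so each owner can be contacted before an outage.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable, List, NamedTuple


class IssuedCert(NamedTuple):
    serial: str
    owner: str  # team or person responsible for renewing and installing it
    not_after: datetime


def renewal_notices(inventory: Iterable[IssuedCert], now: datetime, warn_days: int = 30) -> List[str]:
    """One message per certificate expiring within warn_days (or already expired), soonest first."""
    horizon = now + timedelta(days=warn_days)
    due = sorted((c for c in inventory if c.not_after <= horizon), key=lambda c: c.not_after)
    return [f"{c.owner}: certificate {c.serial} expires {c.not_after:%Y-%m-%d}" for c in due]


now = datetime(2024, 6, 1, tzinfo=timezone.utc)
inventory = [
    IssuedCert("01A3", "team-abc", datetime(2024, 6, 20, tzinfo=timezone.utc)),
    IssuedCert("02B7", "team-payments", datetime(2025, 1, 15, tzinfo=timezone.utc)),
    IssuedCert("03C9", "team-hr-portal", datetime(2024, 6, 5, tzinfo=timezone.utc)),
]
for line in renewal_notices(inventory, now):
    print(line)
# team-hr-portal: certificate 03C9 expires 2024-06-05
# team-abc: certificate 01A3 expires 2024-06-20
```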
Robert Blumen: Suppose my team is launching a new product, the ABC service. I go to your group and say we need a certificate so that other software that accesses our API can validate us. You look up abc.salesforce.com; we have that subdomain; it's all good. But didn't we just move the same problem over to DNS management? They had to be sure that we really own this product before they gave us DNS, because somebody needs to do some validation. Maybe that means the requests are made on company systems, the person has a company email, or other validations that this thing is real.
Matthew Myers: You definitely have a chicken-and-egg problem sometimes when it comes to validating that somebody can or should get certain certificates, and it depends on how restricted the environment is when it comes to the certificates being issued. Sometimes, if the person trying to onboard for a particular certificate for a particular service has a valid identity in your organization, you can just take their word for it and say, "Look, the thing they're asking for falls within our standards, so have at it." In that case, it becomes really important to make sure your certificate standards are really clear about what is and isn't allowed. Right?
Matthew Myers: Wildcards, for example: it's really common to have restrictions on who can get wildcards, for what purpose, how the keys are protected, and all that stuff. You also want to be clear about what you are and aren't willing to issue a certificate for. For instance, you don't want to issue certificates for third-party companies if you're not doing some sort of SSL decryption, and you want to be really clear that people shouldn't ask you for those sorts of certificates. Be up front about it, and then you don't have to deal with it later. But yes, sometimes if the DNS is already there and you've validated that the identity of the person making the request is legitimate, it's really just a matter of giving it to them.
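Wildcard restrictions like these can be encoded as explicit checks at issuance time. The rules below are illustrative assumptions rather than a standard: one leftmost wildcard label only, never a wildcard directly under the organization's zone (a hypothetical `corp.example.com`), and only wildcards approved during onboarding.

```python
from typing import Collection


def wildcard_permitted(name: str, approved_wildcards: Collection[str], org_zone: str = "corp.example.com") -> bool:
    """Hypothetical wildcard rules: a single leftmost '*' label only, never directly under
    the organization's top-level zone, and only for wildcards approved at onboarding."""
    name = name.lower().rstrip(".")
    if "*" not in name:
        return True  # not a wildcard; the ordinary naming rules apply instead
    if not name.startswith("*.") or "*" in name[2:]:
        return False  # partial-label or multiple wildcards
    base = name[2:]
    if base == org_zone or "." not in base:
        return False  # too broad
    return name in approved_wildcards


approved = {"*.abc.corp.example.com"}
print(wildcard_permitted("*.abc.corp.example.com", approved))       # True
print(wildcard_permitted("*.corp.example.com", approved))           # False
print(wildcard_permitted("*.payments.corp.example.com", approved))  # False
```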
Matthew Myers: They've already passed those initial barriers: I am who I say I am, and the thing I'm asking for a certificate for already exists in DNS. So there's not a lot of additional validation that necessarily needs to be done in a general network setting. If you have a really high-risk area, that might be different, and you'd want to do some additional validation, but it really depends on what's at risk with the certificates you're issuing.
Robert Blumen: Most servers, when they boot up, have a flag or a setting in a config file that says, here's where to look for the trust store. The trust store is a directory that contains certificates. These days, with infrastructure as code and DevOps, you'd generally be provisioning the systems and the infrastructure they run on through something like Terraform. So that's one end of the process. The other end is: I went to the PKI group and said, we're launching the ABC server. You issue the certificates. How does a certificate get from... Do you email them to me? Do you put them somewhere? How do they get from where you issued them to where the server is booting up, so it can find all the certificates it needs to run?
Matthew Myers: The ideal solution is that you have an API somewhere that consumes the CSR somebody generated, and the private key should be created on the system it's intended for, because at the end of the day the biggest risk with any certificate, whether it's internal or external, is the private key. It doesn't always work out this way, but ideally, if you're spinning up an instance somewhere, no matter how ephemeral or long-term, you generate the private key either on the system or in some sort of secret vault or key vault that the system has access to. That way the private key isn't just floating around between systems. And you definitely don't want to email the private key. Oh my God, please don't do that. I've had to revoke so many certificates because somebody emailed the private key, or put it in Slack, or put it anywhere it wasn't supposed to be.
Matthew Myers: I can't even begin to tell you how many times that's happened. But like I said, ideally you should have some sort of accessible API where people can use a token or something to authenticate and say, here's my certificate request, my CSR; please apply this template to it, sign it, and send it back. Try to make that as streamlined as you possibly can. There are lots of ways that can end up looking over time. You can have an RA with an agent or something on one end that does some of that for them and tries to smooth out the process, or that generates the key somewhere and retrieves it in a relatively secure way for the system it's intended for.
Matthew Myers: There are lots of ways that can take shape, but generally speaking, don't move private keys around. Then however you want to exchange the CSR and the public key is whatever works best for you. In a really small shop, you can totally email a CSR if you want to; just never email the private key. For larger enterprises, trying to do that over email just isn't sustainable. Right? Especially at that volume of certificates.
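Here is a sketch of the "send only the CSR" pattern against an API like the one described. The endpoint URL, JSON field names, and bearer-token scheme are hypothetical; the point is that the private key is generated on the target host and never leaves it, so the request builder refuses anything containing private key material.

```python
import json
import urllib.request

# Hypothetical internal issuance endpoint and request fields; adjust to your CA's actual API.
ISSUE_URL = "https://pki.corp.example.com/v1/certificates"


def build_issue_request(csr_pem: str, template: str, token: str) -> urllib.request.Request:
    """Wrap a CSR (generated on the target host) in an authenticated POST.
    Only the CSR and its public key travel; anything that looks like a private key is refused."""
    if "PRIVATE KEY" in csr_pem:
        raise ValueError("refusing to send private key material; send only the CSR")
    if "BEGIN CERTIFICATE REQUEST" not in csr_pem:
        raise ValueError("expected a PEM-encoded certificate signing request")
    body = json.dumps({"csr": csr_pem, "template": template}).encode("utf-8")
    return urllib.request.Request(
        ISSUE_URL,
        data=body,
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="POST",
    )
```

Sending it would be `urllib.request.urlopen(req, context=ctx)` with a context that trusts the internal root; what comes back (a PEM certificate, a chain, or a job ID) depends entirely on the CA's API.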
Matthew Myers: So you leverage common enrollment protocols like ACME, or some sort of REST API, or if you want to go old school, you can do SCEP. I don't recommend it, but you could. There are options there, and it really depends on what your appetite for risk is at the end of the day and which protocols you want to support for certificate enrollment. But trying to remove the human element as much as possible is ideal. Right? Sometimes an agent is a good way to do that. Sometimes you just rely on your system admins or developers to generate a key on their side and ship the CSR, or what have you. But the more widely used a PKI environment is, the more important it is to have some sort of automation in place, for systems at the very least.
Robert Blumen: What that might look like, then, is that when you issue the private key, you put it in some sort of secure enclave, something like Amazon Key Management Service, and the Terraform code runs with some authorization that lets it extract that particular item from the key management store while it's running and put it in the location where the application expects to find it. So no person would ever touch that private key in the process of getting it where it needs to go.
Matthew Myers: Yeah. If you can use a system that closely associates a private key with an identity mapping of some sort, then having a common key management service or something along those lines, where a system can go retrieve its key when it needs it, use it, and then put it back when it's done, keeping the person out of the loop, would be ideal. This is definitely one of those things where fewer hands in the cookie jar is a good thing.
Matthew Myers: But having said that, there's varying degrees of what that looks like, and some certificates have a much higher value than others. Right? So in the whole wildcard situation, where you have a private key for a wildcard certificate, or for a CA certificate, or a code signing certificate, or other very high value private keys, a general key management system may not provide adequate protections for the value of that private key. In that case, using an HSM service, where the key is not exportable and all the cryptographic operations have to happen within that HSM, might be a better solution for that key in particular.
Matthew Myers: So a lot of times what you end up with is that, during an onboarding process, you have to evaluate what these certificates are actually going to be used for. What kind of services are they going to be protecting? And then make sure that whoever's actually going to be consuming that certificate, or offering it as part of their communications or signing operations or whatever, has the right protections in place for what the thing is actually going to be used for. Not every certificate that gets issued within an internal private PKI is just going to be for TLS. There's tons and tons of things that go on that are not TLS certificates and that are still the responsibility of whoever maintains the PKI systems.
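An onboarding review like the one described might end in a simple policy lookup. This sketch encodes only the examples mentioned above (wildcard, CA, and code signing keys go to a non-exportable HSM; everything else may use a general key management service). The purpose labels and the two tiers are assumptions, and a real policy would weigh far more factors.

```python
from enum import Enum


class Protection(Enum):
    KMS = "general key management service"
    HSM = "HSM: non-exportable key, all crypto operations inside the HSM"


# Purposes treated as high value in this illustrative policy.
HIGH_VALUE_PURPOSES = {"ca", "code_signing"}


def required_protection(purpose: str, subject_names: list) -> Protection:
    """Return the minimum key-protection tier for a certificate request."""
    is_wildcard = any(name.startswith("*.") for name in subject_names)
    if purpose in HIGH_VALUE_PURPOSES or is_wildcard:
        return Protection.HSM
    return Protection.KMS
```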
Robert Blumen: What sort of a lifespan are you using for these certificates?
Matthew Myers: So that varies as well. Generally speaking, for an end entity certificate, a year or less is the recommended approach. CA certificates have to live longer by their very nature, because you can't have a certificate authority issuing a certificate that outlives its own validity. So certificate authorities will be good for two, three, four, five years at a time, possibly longer, depending on what it is. But the actual certificates in the environment that are floating around, getting used all over the place and facilitating connections or signing operations or whatever, are usually good for a year or less.
Matthew Myers: If it's something like a Kubernetes cluster, where things are constantly getting spun up and spun down, you don't want to issue something that's going to be good for a year, because then there's a potential that you're going to have a certificate that's not actually associated with anything out in the wild for much longer than it's needed.
Matthew Myers: So having a service that can offer ephemeral certificates, certificates that are only good for a very, very short period of time (24 hours, 48 hours, a week, 30 days), and churning through those at a high rate ensures that when you have services that are very dynamic, those certificates die off at a very quick rate, and you're not left with a bunch of valid certificates that you can't account for. You want to make sure that your inventory of valid certificates can be accounted for.
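Here is a minimal sketch of lifetime selection. It assumes three illustrative workload classes with maximum lifetimes picked from the ranges above, and it enforces the constraint that a certificate should not outlive its issuing CA by simply capping `not_after` at the issuer's expiry. The class names and exact durations are assumptions.

```python
from datetime import datetime, timedelta

# Illustrative maximum lifetimes per workload class, chosen from the ranges discussed above.
MAX_LIFETIME = {
    "ephemeral": timedelta(hours=24),   # e.g. short-lived pods in a Kubernetes cluster
    "dynamic": timedelta(days=30),
    "end_entity": timedelta(days=365),  # "a year or less"
}


def validity_window(workload: str, issuer_not_after: datetime, now: datetime):
    """Return (not_before, not_after) for a new certificate.

    The certificate is never allowed to outlive its issuing CA: not_after is capped
    at the issuer's own expiry.
    """
    not_after = min(now + MAX_LIFETIME[workload], issuer_not_after)
    if not_after <= now:
        raise ValueError("issuing CA has expired; cannot issue")
    return now, not_after
```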
Matthew Myers: And you want to know: this is how many certificates we have in the wild at any given time. If you're dynamically issuing certificates and you're not rapidly expiring them, you're going to end up with an astronomically high volume of valid certificates that you won't be able to account for. And not being able to account for certificates is a problem, because you have to assume that they get compromised at some point. If you can't account for them, then you don't know.
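Why short lifetimes keep the inventory manageable follows from Little's law: in steady state, the number of certificates valid at once is roughly the issuance rate multiplied by the certificate lifetime. The sketch below assumes a constant issuance rate and that nothing is revoked early; the 10,000-per-day figure is hypothetical.

```python
from datetime import timedelta


def steady_state_valid(issued_per_day: float, lifetime: timedelta) -> float:
    """Little's law (L = lambda * W): certificates valid at once = issuance rate x lifetime.

    Assumes a constant issuance rate and that every certificate lives out its full validity.
    """
    return issued_per_day * (lifetime / timedelta(days=1))


print(steady_state_valid(10_000, timedelta(days=365)))  # 3650000.0
print(steady_state_valid(10_000, timedelta(hours=24)))  # 10000.0
```

At 10,000 certificates a day, one-year certificates leave about 3.65 million valid at any moment, while 24-hour certificates leave about 10,000.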
Robert Blumen: What do you mean by "can't account for them"?
Matthew Myers: If you ask for a certificate and I give you a certificate, and then you go and install it, and then you decommission that system, and we can no longer locate where the certificate is actually being used, then the question becomes, "Well, you've still got eight months of validity left on the certificate. Where is it?" If you can't say, "Well, it was on this system and this system was decommissioned," if you can't actually attest to what happened to it, then the assumption when it comes to certificates is that if you don't know where it is, somebody else has it, and if somebody else has it, then it's compromised. And what does that actually mean?
Robert Blumen: That's great. That ties into something else I wanted to ask about, which is within the enterprise, where you can have more controls on things. Are you in a position where you can monitor, to some degree, maybe every usage of a certificate? Or do you put some requirements in place, like every system has to have something that phones home once a day and says, here's all the certificates that I'm still using, or anything like that?
Matthew Myers: Sort of. Right. It's almost impossible to have 100 percent monitoring for every certificate and every use case. We always try to get to 100 percent, but it's really difficult to do that, because not everything is sitting on a network port. Not everything is sitting somewhere where an agent can scan the trust stores or the keystores or whatever. But the idea is that a lot of our companies are operating in a relatively closed ecosystem, even if it's in public cloud. You still have a defined IP space and defined port ranges that things are going to be living on. And you have scanning engines that go out and scan. I don't recommend scanning the full 65,535-port range, because you'll never finish. Right?
Matthew Myers: But you identify common ports, you go out and scan them, and you say, here's all the certificates that we know we issued, and they're on these IP addresses and these ports and what have you. And then you can also have locally installed agents that do the same thing for certificates that you can't scan on the network. They identify that the certificates are on these systems and in these keystores. And then you basically have a huge compilation of serial numbers and issuers that map back to your RAs and CAs, and you can say, "These are all the certificates that we know we issued. These are the ones we have proven are installed. And then here's all the ones we don't know about." And hopefully, as time goes on, that list of ones you don't know about gets smaller and smaller as you get better validation mechanisms in place to determine where everything's at.
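The network side of that discovery might look like the following sketch. It uses the standard library's `ssl.get_server_certificate` (its `timeout` argument needs Python 3.10 or later) to pull whatever leaf certificate each host presents on a short list of common TLS ports, and keys the results by SHA-256 fingerprint. The port list is an assumption to tune per environment, and the fetcher is injectable so the logic can be exercised without a network. Extracting serial numbers and issuers for the compilation described above would need an X.509 parser, which is outside this stdlib-only sketch.

```python
import hashlib
import ssl
from typing import Callable, Iterable, Optional, Sequence

# Commonly used TLS ports; an assumption to tune per environment, not a standard list.
COMMON_TLS_PORTS = (443, 636, 993, 995, 5671, 6443, 8443, 9443)


def fetch_pem(host: str, port: int) -> Optional[str]:
    """Return the PEM certificate presented on host:port, or None if nothing answers.

    No CA bundle is passed, so the certificate is retrieved without validation,
    which is what an inventory scan wants. The timeout argument needs Python 3.10+.
    """
    try:
        return ssl.get_server_certificate((host, port), timeout=3)
    except OSError:  # refused, timed out, or TLS failure (ssl.SSLError subclasses OSError)
        return None


def scan(hosts: Iterable[str], ports: Sequence[int] = COMMON_TLS_PORTS,
         fetch: Callable[[str, int], Optional[str]] = fetch_pem) -> dict:
    """Map each certificate's SHA-256 fingerprint to the (host, port) pairs presenting it."""
    found: dict = {}
    for host in hosts:
        for port in ports:
            pem = fetch(host, port)
            if pem is None:
                continue
            der = ssl.PEM_cert_to_DER_cert(pem)
            found.setdefault(hashlib.sha256(der).hexdigest(), []).append((host, port))
    return found
```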
Matthew Myers: So it's a never-ending game of cat and mouse. But those systems also have another side effect: being able to identify all the certificates that you didn't know about. Because even if you have a policy that says, "Thou shalt get all of your certificates from us," not everybody's going to do that. You'll have self-signed certificates, you'll have people that will take their corporate card, go out to GoDaddy, get some random certificate, and install it somewhere, even though they shouldn't have. But they did. So it gives you a way to corral all the miscellaneous certificates in the environment and try to make sure that you're always adhering to the same set of standards and principles and everything else.
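The compilation step, and the rogue-certificate side effect, reduce to set arithmetic once scanners and agents report `(issuer, serial)` pairs. This is a minimal sketch under that assumption; the record type and the issuer names are hypothetical.

```python
from datetime import datetime
from typing import NamedTuple, Set, Tuple

CertId = Tuple[str, int]  # (issuer name, serial number)


class Issued(NamedTuple):
    issuer: str
    serial: int
    not_after: datetime


def reconcile(issued: list, discovered: Set[CertId], now: datetime):
    """Compare CA issuance records with what scanners and agents found in the environment.

    Returns three sets of (issuer, serial) pairs:
      located     - still-valid certificates we issued and have proven are installed
      unaccounted - still-valid certificates we issued but cannot find (assume compromised)
      unknown     - certificates found in the environment that none of our CAs issued
    """
    ours = {(c.issuer, c.serial) for c in issued}
    valid = {(c.issuer, c.serial) for c in issued if c.not_after > now}
    return valid & discovered, valid - discovered, discovered - ours
```

Expired certificates drop out of the "unaccounted" set on their own, which is the practical payoff of short lifetimes.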
Robert Blumen: So a little while ago you mentioned revocation: somebody pasted the private key into Slack, so it's considered to be compromised. Or maybe you issued it and you don't know where it is. With the public internet, it struck me that the whole system wasn't really designed to enable revocation. So you have these add-ons like CRLs and OCSP stapling. How do you revoke something in your enterprise CA?
Matthew Myers: So revocation in general is not a perfect system. Right? Which is why, for internet-based certificates, you have Google trying to push everybody away from traditional revocation checking and toward some other means of proving the validity of certificates, with varying effect. But internally, in environments where you have a little more control, the method for revoking something is still the same. If you want to revoke a certificate, you still take that serial number to the CA and say revoke it, and it creates a CRL. The CRL has to be posted on a web service somewhere, and ironically not using HTTPS, because if you put a certificate in the path of verification, then you have a very high potential to create this chicken-and-egg situation where you can't check the validity of a certificate because you can't validate the certificate on the way to the CRL.
Robert Blumen: That's hilarious.
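The chicken-and-egg problem suggests a simple lint for CA configuration: CRL distribution point URLs should not require TLS. A CRL is itself signed by the CA, so fetching it over plain HTTP does not weaken its integrity. The sketch below is illustrative; it only accepts `http` and `ldap` URLs, both of which RFC 5280 allows for distribution points, and the example hostname is made up.

```python
from urllib.parse import urlparse


def check_crl_distribution_points(urls: list) -> list:
    """Return a list of problems with the CRL distribution point URLs in a CA profile.

    Fetching the CRL must not require validating another TLS certificate first, so
    https:// is flagged. The CRL is signed by the CA, so plain HTTP does not weaken it.
    """
    problems = []
    for url in urls:
        scheme = urlparse(url).scheme.lower()
        if scheme == "https":
            problems.append(f"{url}: HTTPS puts a certificate in the revocation-check path")
        elif scheme not in ("http", "ldap"):
            problems.append(f"{url}: unexpected scheme {scheme!r}")
    return problems


print(check_crl_distribution_points(["http://pki.corp.example/issuing-ca-1.crl"]))  # []
```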
Matthew Myers: You try to avoid having certificates in your certificate validation path, at least when it comes to revocation anyway. But to get back to the point, you still have the same systems for revocation internally that you would in public. So you still have CRLs, and you still have OCSP responders. If you have OCSP responders, then you can enable service owners to do OCSP stapling. And there's this misconception that I keep running into: people think that OCSP stapling is the responsibility of the OCSP responder, when it really isn't. It's almost the opposite.
Matthew Myers: So a service owner would query the OCSP responder on behalf of its own certificate, get a signed response from the responder, and then provide it to whoever's checking it, which is a super helpful thing to do, by the way. It's awesome when people are actually able to do that, because it reduces the load on the servers that you're hosting to provide the verification checks and everything else, and it makes everything faster. Right?
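Here is a sketch of that service-owner-side flow: the server, not the responder, fetches the signed OCSP response for its own certificate, caches it, and refreshes it shortly before the responder's `nextUpdate` time. Real TLS servers implement stapling internally; `StapleCache`, the one-hour refresh margin, and the injected `fetch` callable are assumptions, and parsing or verifying the OCSP response itself is out of scope.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, Optional


@dataclass
class OcspResponse:
    der: bytes             # signed response from the responder, passed to clients untouched
    next_update: datetime  # responder's nextUpdate: when fresher status will be available


class StapleCache:
    """Holds the OCSP response a server staples into its own TLS handshakes."""

    def __init__(self, fetch: Callable[[], OcspResponse],
                 refresh_margin: timedelta = timedelta(hours=1)):
        self._fetch = fetch  # queries the OCSP responder about this server's own certificate
        self._margin = refresh_margin
        self._current: Optional[OcspResponse] = None

    def staple(self, now: datetime) -> bytes:
        """Return the response to staple, refreshing it shortly before it goes stale."""
        if self._current is None or now >= self._current.next_update - self._margin:
            self._current = self._fetch()
        return self._current.der
```

Because one server fetches once per validity window instead of every client querying the responder, the responder's load scales with the number of services rather than the number of handshakes.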
Matthew Myers: Because one of the things to keep in mind when you're managing the revocation infrastructure as part of a PKI implementation is that, if you're doing it correctly, you're inserting yourself in the middle of every single TLS handshake that happens. Right? Because everything needs to validate that the certificates they're talking to are genuine. So when you have a very real possibility of slowing down other people's connections in an attempt to validate that their connections are genuine, you want to make sure that the infrastructure that's actually supporting that is positioned in such a way that you are providing those responses as quickly as possible. Network latency becomes a very real thing at that point.
Matthew Myers: The whole five nines thing really comes into play. Right? Because if somebody is trying to check the revocation status of a certificate and they can't, then there's this whole cascading series of things that could happen, depending on how clients are configured and everything else. Connections can time out. If clients are designed to fail open, they will eventually carry on as if nothing happened, but there was a tremendous slowdown. And if they're designed to fail closed, then stuff just stops working. In the more extreme cases, if you have CRLs that are themselves expired because, for whatever reason, you didn't get a fresh CRL posted often enough, everything is treated as revoked and everything stops working.
Matthew Myers: It's a very precarious position to be in. But at the same time, if you don't provide revocation services, you have absolutely no way to verify that the certificate you are trusting in your connection is actually genuine and that you should trust it. For an internal PKI, it's really important to be able to validate that the certificates you're consuming are actually genuine, assuming that you're using the certificates for more than just encryption on TLS. If that's all you're after, then PKI has a lot of overhead just to do encryption for TLS.
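The fail-open versus fail-closed behaviour Matthew describes can be sketched as a small client-side decision function. It assumes a hypothetical client holding a parsed CRL (a set of revoked serial numbers plus its `nextUpdate` time) and a configuration flag for the failure mode. A CRL past its `nextUpdate` is treated as unusable, so a fail-closed client rejects everything, as in the expired-CRL case.

```python
from datetime import datetime
from enum import Enum
from typing import Optional, Set


class Decision(Enum):
    ACCEPT = "accept"
    REJECT = "reject"


def revocation_decision(serial: int, revoked: Optional[Set[int]],
                        next_update: Optional[datetime], now: datetime,
                        fail_open: bool) -> Decision:
    """Decide whether to trust a certificate given the result of a CRL lookup.

    revoked is None when the CRL could not be fetched (timeout, outage). A CRL past
    its nextUpdate time is treated as unusable, not as evidence that nothing is revoked.
    """
    usable = revoked is not None and next_update is not None and now <= next_update
    if not usable:
        # No trustworthy revocation data: the client's fail-open/fail-closed setting decides.
        return Decision.ACCEPT if fail_open else Decision.REJECT
    return Decision.REJECT if serial in revoked else Decision.ACCEPT
```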
Matthew Myers: But assuming that the identity of the thing you're talking to is important to you, then having revocation is also important. Going back to your question, if I wake up tomorrow morning and another person has put a private key in Slack again, for some reason, then we have to go revoke that certificate. But if there's no revocation mechanism in place, there's no way for us to tell people that they cannot trust the certificate, especially in a large deployment. Right? If you're only dealing with a couple of hundred computers, you can probably go blacklist the certificate and say, if you see this certificate, do not trust it.
Matthew Myers: It's not that simple, though, and on a larger scale it's impossible to do that, especially if you have a CA compromise. It's almost impossible to rip and replace a CA quickly, but you could do it if you had revocation in place.
Robert Blumen: We're hearing more about supply chain attacks. There was one that came out within the last couple of weeks; it was called something like dependency confusion. You mentioned code signing. Is that an important use case for PKI within the enterprise?
Matthew Myers: Very much so. Even if you don't take the internal PKI component into account, code signing by itself is really important to do correctly. The private key management is really critical when it comes to that. If you use the internal PKI for it, that gives you the flexibility to sign your own code and eat your own dog food, so to speak. But you don't have to worry about somebody outside of your organization trusting that code. Or maybe more specifically, you don't have to rely on somebody else's code signing for your own stuff.
Matthew Myers: I've seen situations where all of Microsoft's patches get re-signed by a company because they only want to trust one code signing key and don't want to have to extend that trust to Microsoft. Microsoft has a fairly good track record, but even Microsoft has had some fairly public mishaps. Right? HP had some. There's lots of stuff in the news about code signing keys being mishandled, and the consequences of that are really disastrous. Right? So whether you use an internal PKI to do code signing or you use a public code signing certificate, having a really rock-solid means of protecting those private keys is super important.
Matthew Myers: But like I said, using an internal PKI to support code signing gives you more granular control over who actually trusts that code. If you have point of sale systems, you can have code that gets deployed to those point of sale systems signed with one particular code signing key, and if it's signed by any other code signing key, it's not valid. Right. It gives you that kind of granular control. And you have direct control over the CAs that issued it. If it needs to be revoked, there's no haggling with a public CA. You don't have to reach out to your customers and say, "So the thing happened, and oops."
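The point-of-sale example amounts to pinning: code is accepted only when it carries a valid signature and the signer is the one certificate designated for that device class. In this sketch the cryptographic signature check is assumed to happen elsewhere and is passed in as a boolean; the pin table, device class name, and placeholder certificate bytes are hypothetical.

```python
import hashlib

# Hypothetical pin table: device class -> SHA-256 fingerprint of the only signer
# certificate (DER bytes) allowed to sign code for that class.
PINNED_SIGNERS = {
    "point_of_sale": hashlib.sha256(b"pos-signing-cert-der").hexdigest(),
}


def may_install(device_class: str, signer_cert_der: bytes, signature_valid: bool) -> bool:
    """Allow installation only if the signature verified AND the signer is the pinned one.

    signature_valid is assumed to come from a real signature check done elsewhere;
    this sketch only shows the extra pinning rule an internal PKI makes easy to enforce.
    """
    pinned = PINNED_SIGNERS.get(device_class)
    if pinned is None or not signature_valid:
        return False
    return hashlib.sha256(signer_cert_der).hexdigest() == pinned
```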
Matthew Myers: You can internalize some of that and manage the risk a little bit more than you could if you had a public code signing certificate. Public code signing certificates are more dangerous by their very existence, because they are trusted by everybody, especially if... Microsoft, for example, has a reputation system for code signing. The longer a certificate is in the environment, the more trust it builds and the more openly it's trusted by everybody else.
Matthew Myers: If you have a public code signing certificate that gets that level of assurance with everybody, and then it gets compromised, that's not good. But if it's an internal code signing certificate, that's completely within your wheelhouse to manage. You don't have to rely on file reputation systems. You don't have to rely on mitigating the fallout from that outside of your own corporate network.
Robert Blumen: For the last question, I want to talk about how you build this thing. A startup, two guys in a garage, they can take their credit card and go to GoDaddy. At some point, a company is big enough that it needs to start setting up its own. Are there existing open source servers, do the commercial CAs offer a hosted type of solution to run in the enterprise, or do you have to build it all from scratch?
Matthew Myers: There's a lot of choices there. You can pretty much pick and choose how you want to approach that one. You could absolutely write something yourself if you want to; OpenSSL is a pretty good baseline for just doing some very rudimentary certificate stuff. You can absolutely build a CA with OpenSSL if you choose to do so. There are fairly full-featured open source products out there that have a lot of capability to them. EJBCA is a good open source CA platform that has a lot of components to it.
Matthew Myers: And there's also an enterprise version of that which you can pay for support through, I believe it's PrimeKey, if you wish. If you're a Microsoft house, you probably have access to AD CS, which is a component of Windows Server that you can install to operate a certificate authority that way. I think Cloudflare has got CFSSL. There's choices between paid versus open source versus roll your own, if you really want to.
Matthew Myers: I don't really recommend the roll-your-own approach, because there's an awful lot of gotchas when it comes to certificates and RFCs and everything else about how certificates should be managed and operated, and all the things that you can and can't do with certificates. There are a lot of very clearly defined rules around that stuff. Personally, I'd much rather go with a solution where a lot of that's already been worked out than try to do it myself. When it comes to creating a CA, regardless of the software platform that you choose to run it on, a lot of the really critical functions come down to private key storage, which I realize I've said a few times already for different things, but it's really important for CAs.
Matthew Myers: You want to have an absolutely rock-solid means of saying this private key has not been tampered with, has not been compromised, has not been exposed in any way. The only thing that's allowed to interact with this private key is the certificate authority itself. And if you're doing an enterprise PKI for any sort of large-scale effort, you really want to invest in HSM systems at that point to make sure that you've got a good, auditable, solid way to do that. And you also want to do it in such a way that nobody can really question how the private keys are created or how the CAs are managed or anything else.
Matthew Myers: So you have this concept of crypto officers and quorums, where you have to have a bare minimum number of people present to do anything. You want to avoid any sort of collusion, or "I just went off by myself and made this thing, so you have to trust me." When it comes to creating certificate authorities, you don't want to trust something just because somebody has said you can trust them.
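A quorum rule of this kind (often called M-of-N) is normally enforced by the HSM itself; the sketch below only models the policy, assuming a hypothetical five-officer roster with a threshold of three. People who are present but not registered officers do not count.

```python
def quorum_met(present: set, officers: set, threshold: int) -> bool:
    """M-of-N check: at least `threshold` distinct registered crypto officers must be present.

    People who are present but not on the officer roster do not count toward the quorum.
    """
    if not 1 <= threshold <= len(officers):
        raise ValueError("threshold must be between 1 and the number of officers")
    return len(present & officers) >= threshold


OFFICERS = {"alice", "bob", "carol", "dave", "erin"}  # hypothetical roster, 3-of-5 rule
```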
Matthew Myers: If I walked into the street and said, "Trust me, because I asked you to," you're not going to trust me. And you shouldn't, and it's the same with a certificate authority. You should be able to ask for evidence of how this thing was created, and whoever maintains a PKI should be able to provide the evidence to show that this is how this thing was done, securely and auditably, and that there's a track record of everything that was ever done with it. And that can all be evaluated by an auditor or a third party, or whoever, to say that yes, there is a reasonable degree of assurance in this private key, or there isn't.
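One common way to keep a track record of everything done with a CA tamper-evident is a hash-chained log, where each entry's hash covers the previous entry's hash. This is a minimal sketch of that technique, not any particular CA product's audit format; the event fields are hypothetical. It detects edited, reordered, or deleted entries in the middle of the log, but deleting the newest entries goes unnoticed unless the latest hash is also recorded somewhere an auditor controls.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def append_entry(log: list, event: dict) -> None:
    """Append an event whose hash covers the previous entry's hash (a simple hash chain)."""
    prev = log[-1]["hash"] if log else GENESIS
    body = json.dumps(event, sort_keys=True)
    digest = hashlib.sha256((prev + body).encode()).hexdigest()
    log.append({"event": event, "prev": prev, "hash": digest})


def verify_chain(log: list) -> bool:
    """Recompute every hash; any edited, reordered, or removed middle entry breaks the chain."""
    prev = GENESIS
    for entry in log:
        body = json.dumps(entry["event"], sort_keys=True)
        if entry["prev"] != prev:
            return False
        if hashlib.sha256((prev + body).encode()).hexdigest() != entry["hash"]:
            return False
        prev = entry["hash"]
    return True
```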
Matthew Myers: So creating a root CA on a laptop without an HSM that just gets stuck into a desk drawer somewhere that anybody can get to is a very, very low bar of assurance. Right? Because there's not a lot of protections there. Something that was created in, for all intents and purposes, a clean room with no electronic devices aside from the laptop and HSM, and then locked up in a safe that takes three people to get into, is a much higher bar.
Matthew Myers: Yeah. There's a lot more assurance and trust that can be placed in the certificate authority. So at the end of the day, it really depends on what you care about. If just having a certificate authority to issue certificates is all you're really concerned about, then an HSM is probably going to be more money than it's really worth investing. But if you're trying to have high value certificates, certificates that are trustworthy and that can actually mean something, then you're going to have to invest a fair amount of time, energy, resources, and ultimately money in making sure that that exists and that it's provable.
Matthew Myers: It's like your publicly trusted certificate authorities: they have a very rigorous process that they have to go through before their root certificate ends up in your trust store for Chrome or Firefox or whatever. It has to be very explicit and very provable. They don't just take people's word for it.
Robert Blumen: Matthew, it's been a pleasure speaking with you. Thank you for speaking to Code[ish].
Matthew Myers: Thank you for having me.
Speaker 1: Thanks for joining us for this episode of the Code[ish] Podcast. Code[ish] is produced by Heroku, the easiest way to deploy, manage, and scale your applications in the cloud. If you'd like to learn more about Code[ish] or any of Heroku's podcasts, please visit heroku.com/podcasts.
About code[ish]
A podcast brought to you by the developer advocate team at Heroku, exploring code, technology, tools, tips, and the life of the developer.
Hosted by
Robert Blumen
Lead DevOps Engineer, Salesforce
Robert Blumen is a DevOps engineer at Salesforce and podcast host for Code[ish] and for Software Engineering Radio.