Written by Kristian Lesko
Identity management (IdM) lies at the core of security architecture and processes in most software companies. It is responsible for ensuring that only authorized actors are allowed access to valuable protected resources, including our customers' data.
For multiple years, our infrastructure relied on FreeIPA as the identity management system of choice. It governed user access to virtual machines through secure shell (SSH), facilitated third-party application logins, and served other use cases. However, several reasons led to its ultimate removal from our stack and replacement by other systems.
This article builds upon a previous blog post, which focused specifically on the rework of our SSH access management. Here, we document the broader motivations for our IdM changes, as well as the specific steps and approaches the infrastructure team at GoodData took during this complex migration endeavor.
The Old Approach
FreeIPA was introduced into GoodData in 2012 in response to the growing size of our infrastructure and the subsequent need for single sign-on, i.e., avoiding the need for users to authenticate separately against every internal service. With FreeIPA, engineers would simply need to authenticate once at the start of their day, obtaining a Kerberos ticket. All subsequent access to internal web applications would then reuse this ticket and be transparent to the user.
One of the primary use cases for FreeIPA in our stack was governing SSH logins to virtual machines; we covered this in more detail in the previous post. FreeIPA also provided components covering several other use cases. Notably, its NTP server gave the enrolled virtual machines an easy way to synchronize time, while the bundled certificate infrastructure enabled SSL certificate management for our internal services.
However, as the years went on, multiple aspects of the IdM setup based on FreeIPA proved to be insufficient for effective operations.
What we lacked the most was integration between FreeIPA and many of the third-party systems we used. While internally deployed web applications using httpd as a frontend were easily extended with Kerberos-based authentication via FreeIPA, this did not hold true for many external services, which typically support only SAML or OIDC for authenticating against an identity provider.
The missing link between FreeIPA and many other applications we used meant that a unified approach to user management could not be adopted. This complicated user access management, as well as onboarding and offboarding. For instance, the team responsible for user management had to check by hand whether a departing employee had been provisioned in each of numerous applications, and then remove every such account manually.
Operating the FreeIPA service itself also caused headaches for our infrastructure team on multiple occasions. We had replication set up between FreeIPA servers in several locations, but it proved fragile whenever a network issue occurred. Additionally, the FreeIPA deployment itself was a single point of failure; if one of our operational changes introduced a regression, most of the engineers would be unable to log in to the services they needed.
All the aforementioned pain points ultimately led to a company-wide decision to migrate to Okta as the centralized IdM solution. This spelled an imminent end to our usage of FreeIPA, but at the same time, created many new challenges for the infrastructure team to resolve.
Most importantly, a direct one-to-one replacement of FreeIPA components by Okta would not be possible; for example, Okta doesn’t provide governance of SSH logins or SSL certificate management out of the box. Therefore, while we would transition to using Okta as the central directory of user accounts, we would have to figure out how exactly to employ various open-source tools and adopt different approaches for each specific use case we needed to replace.
We already covered the replacement for SSH login management in the previous post. Now, let's delve into how we approached the replacement of the other necessary use cases.
The one aspect of our IdM ecosystem that can be considered migrated in a “one-to-one” fashion was the authentication for internally deployed web applications. Where we previously authenticated users through httpd’s LDAP or Kerberos modules, we switched to the mod_auth_openidc module instead, which required no large architectural changes.
Additionally, as implied above, a massive advantage of migrating to Okta was also the support for single sign-on into multiple third-party applications. No more manual management of user accounts in each app!
Replacing the infrastructure of SSL certificates used by our virtual machines, on the other hand, required more attention. With FreeIPA, every enrolled server could obtain an SSL certificate fairly easily using certmonger. However, this capability would no longer be available with Okta.
Before starting to seek an equivalent replacement, we took a step back and considered the exact use cases we had for certificates. We identified two distinct ways in which we utilized SSL certificates:
- protecting user-facing endpoints of internal web applications;
- machine-to-machine authentication between internal services.
For the former case, the solution was to use publicly trusted certificates issued by Let's Encrypt. We only needed to figure out the lifecycle and distribution of the certificates. In the end, we selected cert-manager running in our service Kubernetes cluster to handle obtaining the certificates (including DNS validation), and a simple CronJob to store the certificates in our HashiCorp Vault instance, where all the consuming machines can access them.
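The CronJob's copy step can be sketched roughly as follows. This is a minimal illustration, not our actual job: it assumes the certificate lands in a standard Kubernetes TLS Secret (base64-encoded `tls.crt` and `tls.key` entries) and that Vault exposes a KV version 2 secrets engine. The function names, the field names `certificate` and `private_key`, the mount, and the path are all hypothetical.

```python
import base64


def vault_payload_from_tls_secret(secret: dict) -> dict:
    """Build a KV v2 write payload from a Kubernetes TLS Secret object.

    Assumes `secret` is the Secret as returned by the Kubernetes API,
    i.e. its `data` values are base64-encoded PEM strings.
    """
    data = secret["data"]
    return {
        # KV v2 expects the stored key/value pairs under a top-level "data" key.
        "data": {
            # Hypothetical field names; consumers would read these from Vault.
            "certificate": base64.b64decode(data["tls.crt"]).decode("ascii"),
            "private_key": base64.b64decode(data["tls.key"]).decode("ascii"),
        }
    }


def vault_kv2_url(vault_addr: str, mount: str, path: str) -> str:
    """Return the HTTP API URL for writing a secret to a KV v2 mount."""
    return f"{vault_addr.rstrip('/')}/v1/{mount}/data/{path}"
```

A job built this way would send the payload as JSON in a POST request to that URL, authenticating with a Vault token. Keeping the transformation separate from the HTTP call makes it easy to test without a running cluster or Vault instance.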
The Vault service also played a key role in replacing our machine-to-machine authentication mechanism; we opted for creating a private certificate authority (CA) to cover this use case. Since all of our virtual machines had already been integrated with Vault, it was then relatively simple for some servers to obtain a client SSL certificate and for other servers to verify it against Vault’s CA.
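On the verifying side, requiring a client certificate signed by the private CA could look something like the sketch below. It uses only Python's standard `ssl` module and is an assumption about how a service might be configured, not a description of our services. The function name and its parameters are hypothetical, and the file arguments are optional only so that the context can be built and inspected without real certificate files.

```python
import ssl


def build_mtls_server_context(ca_file=None, cert_file=None, key_file=None):
    """Create a server-side TLS context that demands a client certificate."""
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    # Reject any client that does not present a certificate we can verify.
    context.verify_mode = ssl.CERT_REQUIRED
    if ca_file:
        # Trust only the private CA (for example, a CA bundle fetched from Vault).
        context.load_verify_locations(cafile=ca_file)
    if cert_file:
        # The server's own certificate and key, presented to connecting clients.
        context.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return context
```

With no CA file loaded, such a context trusts no issuer at all, so every client handshake fails rather than silently succeeding.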
Replacing the FreeIPA-provided time server proved to be straightforward. Since most cloud providers provide their own time servers nowadays (for example, Amazon has the Time Sync Service), we simply redirected our NTP configuration to use these instead of FreeIPA.
With a suitable replacement identified for each of the FreeIPA components we relied on, we focused on designing a rollout plan for the migration to Okta. We recognized that the transition would need a longer time window; we simply could not migrate all users and application integrations from FreeIPA to Okta at once without imposing an unacceptably long downtime on employees.
To bridge Okta and FreeIPA and to allow switching one use case after another smoothly, we introduced a synchronization mechanism between the two systems. For this purpose, we used our pre-existing internal tool called freeipa-manager, which manages FreeIPA entities through their YAML representations stored in a Git repository. We first created accounts for all users in Okta and then extended the tool with support for creating users based on responses from the Okta API.
This transition period was not ideal for the employees, since they had to remember two separate passwords, one for each IdM system, and keep track of which applications were authenticated by which system. We focused heavily on cross-team communication to make this time as painless as possible for all the users involved.
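The core of such a synchronization step is a mapping from an Okta user object to an entity definition that a tool like freeipa-manager could consume. The sketch below is illustrative only. It assumes the Okta user has already been fetched and parsed from JSON, relies on the `status` field and the `login`, `firstName`, `lastName`, and `email` profile attributes of Okta users, and invents both the output layout and the rule for deriving the FreeIPA user ID. The real freeipa-manager schema may differ.

```python
def active_users(okta_users: list) -> list:
    """Keep only users whose Okta status is ACTIVE."""
    return [user for user in okta_users if user.get("status") == "ACTIVE"]


def okta_user_to_freeipa_entity(okta_user: dict) -> dict:
    """Map one Okta user object to a {uid: attributes} mapping.

    The output layout is a hypothetical stand-in for a YAML entity file.
    """
    profile = okta_user["profile"]
    # Assumption: the FreeIPA uid is the local part of the Okta login.
    uid = profile["login"].split("@", 1)[0]
    return {
        uid: {
            "firstName": profile["firstName"],
            "lastName": profile["lastName"],
            "email": [profile["email"]],
        }
    }
```

Each resulting mapping could then be serialized to YAML and committed to the Git repository, so that the existing review and apply workflow stays unchanged.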
All in all, our migration from FreeIPA to Okta took slightly less than two years, starting in early 2021 and ending with removing the FreeIPA servers themselves from our environment in the second half of 2022.
It was an immense learning experience that required wide cooperation between the company’s departments, as well as a deeper understanding of the internals of the authentication technologies involved. In retrospect, we can confidently conclude that this massive change was well worth the effort, bringing our infrastructure security and the related user experience to a higher level.