
The Platformers


Kubernetes Fleet Management At Adobe

Guy Menahem

Updated: Feb 1



Managing Kubernetes across many clusters and environments is a hard but essential task for any organization that deploys its applications on Kubernetes. At Adobe, a large part of the answer is Argo CD, a GitOps continuous delivery tool that keeps the applications in each cluster in sync with what is declared in Git. In this post, we'll explore how Adobe manages its Kubernetes fleet, covering its deployment, scaling, and monitoring practices and drawing on the work of Mike Tougeron, Lead Cloud Engineer at Adobe.


Why Application Sets and Webhooks Matter at Adobe

At Adobe, Kubernetes is not confined to one environment; it spans multiple clusters, each serving different regions, teams, or environments. To manage such a vast landscape, Application Sets (the ApplicationSet resource in Argo CD) have proven invaluable. An ApplicationSet pairs a template with a generator, and Argo CD creates one Application per target from it, so Adobe's teams can define an application once and deploy it to every cluster that needs it.
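
To make the idea concrete, here is a minimal sketch of an ApplicationSet that uses Argo CD's cluster generator. It is built as a Python dictionary and can be dumped to JSON, which `kubectl apply -f` accepts. The repository URL, chart path, `env` cluster label, `argocd` and `logging` namespaces, and the choice of Fluent Bit (borrowed from the example later in this post) are all illustrative assumptions, not Adobe's actual configuration.

```python
import json


def build_appset(repo_url: str, chart_path: str, env_label: str) -> dict:
    """Return an ApplicationSet manifest that yields one Application
    per registered cluster carrying the label env=<env_label>."""
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "ApplicationSet",
        "metadata": {"name": "fluent-bit", "namespace": "argocd"},
        "spec": {
            # The cluster generator emits one parameter set per matching
            # cluster; {{name}} and {{server}} are filled in from it.
            "generators": [
                {"clusters": {"selector": {"matchLabels": {"env": env_label}}}}
            ],
            "template": {
                "metadata": {"name": "{{name}}-fluent-bit"},
                "spec": {
                    "project": "default",
                    "source": {
                        "repoURL": repo_url,        # hypothetical repository
                        "targetRevision": "main",
                        "path": chart_path,         # hypothetical chart path
                    },
                    "destination": {
                        "server": "{{server}}",
                        "namespace": "logging",     # assumed namespace
                    },
                },
            },
        },
    }


def to_json(manifest: dict) -> str:
    """Serialize the manifest as indented JSON."""
    return json.dumps(manifest, indent=2)
```

Calling `to_json(build_appset("https://example.com/org/repo.git", "charts/fluent-bit", "staging"))` would produce a manifest that targets every cluster labeled `env=staging`; adding a cluster with that label is then enough for it to receive the application.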

Webhooks make these deployments event-driven. Instead of Argo CD constantly polling repositories or reconciling every object, a webhook notifies it when a change has landed, so deployments and updates are triggered only when necessary. This helps Adobe maintain system performance while ensuring timely and controlled application rollouts.
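
As an illustration of what happens on the receiving end of such a webhook, the sketch below verifies a GitHub-style delivery signature. GitHub signs each webhook body with HMAC-SHA256 and sends the result in the `X-Hub-Signature-256` header. The post does not say which Git provider Adobe uses, so the provider, the secret, and the function names here are assumptions for illustration only.

```python
import hashlib
import hmac


def sign_payload(secret: str, body: bytes) -> str:
    """Compute the signature value a GitHub-style sender would attach."""
    digest = hmac.new(secret.encode("utf-8"), body, hashlib.sha256).hexdigest()
    return f"sha256={digest}"


def is_valid_signature(secret: str, body: bytes, header_value: str) -> bool:
    """Check a received X-Hub-Signature-256 header against the raw body.

    compare_digest performs a constant-time comparison, which avoids
    leaking how many leading characters matched.
    """
    expected = sign_payload(secret, body)
    return hmac.compare_digest(expected, header_value)
```

A receiver would call `is_valid_signature` with the raw request body before acting on the event, and ignore deliveries that fail the check.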

Post-sync hooks are another essential component in Adobe's Kubernetes fleet management. These hooks ensure that applications are validated after deployment, providing developers with immediate feedback on their changes. If something goes wrong during deployment, post-sync hooks allow for rapid identification of issues, enabling quick fixes before the changes impact the wider system.
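
In Argo CD, a post-sync hook is an ordinary Kubernetes resource, often a Job, annotated with `argocd.argoproj.io/hook: PostSync`. The sketch below builds such a Job as a Python dictionary. The Job name, image, and command are hypothetical placeholders, and the delete policy is one reasonable choice rather than a description of Adobe's setup.

```python
def build_postsync_job(name: str, image: str, command: list[str]) -> dict:
    """Return a Job manifest that Argo CD runs after a successful sync."""
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {
            "name": name,
            "annotations": {
                # Run this Job once the sync has completed and resources are healthy.
                "argocd.argoproj.io/hook": "PostSync",
                # Remove the Job after it succeeds; failed Jobs stay for debugging.
                "argocd.argoproj.io/hook-delete-policy": "HookSucceeded",
            },
        },
        "spec": {
            "backoffLimit": 0,  # fail fast so the sync is marked failed promptly
            "template": {
                "spec": {
                    "restartPolicy": "Never",
                    "containers": [
                        {"name": "validate", "image": image, "command": command}
                    ],
                }
            },
        },
    }
```

If the Job exits with a non-zero status, the hook fails and the sync is reported as failed, which is what gives developers the immediate feedback described above.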


Real-world Example: Managing Fluent Bit Deployments at Adobe

To see how these tools come into play, let’s look at how Adobe deploys a critical component, Fluent Bit, using Argo CD.

  1. PR Approval & Merging: Adobe’s CI/CD pipeline is fully integrated with Argo CD, allowing developers to initiate pull requests (PRs) for application updates. When a change request is made, such as updating the Fluent Bit Helm chart, it goes through a review process and is automatically merged once approved.

  2. Application Set Propagation: After the PR is merged, the application update is propagated across all clusters through an Application Set. This ensures that Fluent Bit is updated not just in one cluster but across multiple environments that rely on this service.

  3. Webhook Trigger: A webhook notifies Argo CD that the change is ready to be synchronized. This webhook triggers the synchronization process, so the deployment is processed only when necessary and repositories and artifacts are spared unnecessary load.

  4. Post-sync Hooks & Monitoring: Once the Fluent Bit update is deployed, Argo CD runs post-sync hooks to validate the deployment. If anything fails, these hooks provide detailed insights into the failure, allowing the team to troubleshoot the issue quickly. Monitoring tools integrated with Argo CD then provide visibility into the health of the deployment.
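
The validation in step 4 could look something like the following sketch. It assumes Fluent Bit runs as a DaemonSet (a common setup, but not stated in the post) and inspects the standard DaemonSet status fields. How the status is fetched, for example from `kubectl get daemonset -o json`, is left out; the function name and messages are illustrative.

```python
def daemonset_rollout_ok(status: dict) -> tuple[bool, str]:
    """Decide whether a DaemonSet rollout looks complete.

    `status` is the .status object of a DaemonSet, using the standard
    fields desiredNumberScheduled, updatedNumberScheduled, and numberReady.
    """
    desired = status.get("desiredNumberScheduled", 0)
    updated = status.get("updatedNumberScheduled", 0)
    ready = status.get("numberReady", 0)

    if desired == 0:
        return False, "no pods scheduled"
    if updated < desired:
        return False, f"{updated}/{desired} pods updated"
    if ready < desired:
        return False, f"{ready}/{desired} pods ready"
    return True, f"{ready}/{desired} pods ready"
```

A post-sync hook built around a check like this would exit non-zero when the first element is `False`, and log the message so the team can see how far the rollout got.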


Managing Large-Scale Deployments Across Multiple Regions

As Adobe's infrastructure grows, so does the complexity of managing Kubernetes clusters across regions and cloud environments. One of the main challenges is ensuring that different environments, such as production and staging, remain isolated from one another while allowing for easy updates and scalability.

Adobe addresses this by running Argo CD from multiple management clusters, so that staging and production are managed separately and a problem in one environment does not spill over into the other. This level of isolation is crucial for large organizations like Adobe, where downtime in one region can impact customers worldwide.

By leveraging this multi-cluster architecture, Adobe ensures that updates are applied safely, and teams can easily test changes without risking disruptions in production.
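
One way to picture this layout is as a mapping from each workload cluster to the management cluster responsible for it. The sketch below assumes one management cluster per environment and region; that scheme, the `mgmt-<environment>-<region>` naming, and the cluster names are hypothetical and not taken from Adobe's setup.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkloadCluster:
    name: str
    environment: str  # e.g. "prod" or "stage"
    region: str       # e.g. "us-east"


def management_cluster_for(cluster: WorkloadCluster) -> str:
    """Pick the management cluster for a workload cluster.

    Keying on environment first means prod and stage can never
    end up under the same management cluster.
    """
    return f"mgmt-{cluster.environment}-{cluster.region}"


def group_by_management_cluster(clusters: list[WorkloadCluster]) -> dict[str, list[str]]:
    """Group workload cluster names under their management cluster."""
    groups: dict[str, list[str]] = {}
    for cluster in clusters:
        groups.setdefault(management_cluster_for(cluster), []).append(cluster.name)
    return groups
```

With a mapping like this, an outage or misconfiguration in a staging management cluster affects only the staging clusters grouped under it.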


Overcoming API Rate Limits and Controller Load

Another challenge in managing Kubernetes at scale is handling API rate limits and preventing excessive load on controllers. Adobe tackles this by isolating the resources that Argo CD monitors, using namespaces and management clusters to reduce unnecessary API calls and optimize system performance.

Additionally, Argo CD's efficient deployment strategies, along with regional redundancy for management clusters, help Adobe avoid controller overloads. This ensures that the application synchronization process runs smoothly, even under heavy load.


Metrics and Dashboards for Post-deployment Monitoring

At Adobe, real-time monitoring and metrics are essential for ensuring the health of applications across clusters. After a deployment, metrics from post-sync hooks are collected and displayed on dashboards. These provide visibility into the status of each application, allowing teams to quickly identify failures and address them before they spread.

Dashboards also enable Adobe’s teams to visualize long-term trends, such as the success rate of deployments, performance metrics, and resource usage. This data is invaluable for maintaining high availability and ensuring the smooth operation of their Kubernetes fleet.
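
A deployment success rate of the kind mentioned above can be derived from hook outcomes with very little code. The sketch below aggregates per-cluster results; the input shape, a list of `(cluster_name, succeeded)` pairs, and the cluster names are assumptions for illustration, since the post does not describe how Adobe stores these metrics.

```python
from collections import defaultdict


def success_rate_by_cluster(results: list[tuple[str, bool]]) -> dict[str, float]:
    """Return the fraction of successful post-sync runs per cluster."""
    totals = defaultdict(lambda: [0, 0])  # cluster -> [successes, runs]
    for cluster, succeeded in results:
        totals[cluster][1] += 1
        if succeeded:
            totals[cluster][0] += 1
    return {cluster: ok / runs for cluster, (ok, runs) in totals.items()}


def clusters_below(rates: dict[str, float], threshold: float) -> list[str]:
    """List clusters whose success rate is under the threshold, sorted by name."""
    return sorted(c for c, rate in rates.items() if rate < threshold)
```

Feeding the output of `success_rate_by_cluster` into `clusters_below` with a chosen threshold gives a short list of clusters worth looking at first on a dashboard.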


Key Takeaways from Adobe's Kubernetes Fleet Management Approach

  1. Application Sets enable Adobe to deploy applications consistently across multiple clusters, ensuring efficient scaling.

  2. Webhooks trigger deployments only when necessary, reducing unnecessary load and enhancing performance.

  3. Post-sync hooks validate deployments after they are synced, providing rapid feedback and reducing the time it takes to resolve issues.

  4. Multiple management clusters help Adobe isolate production and non-production environments, enabling safer testing and more controlled rollouts.

  5. API rate limits and controller load are kept in check by isolating the resources Argo CD monitors and using regional redundancy for management clusters.

  6. Metrics and dashboards provide real-time visibility into deployment health, making it easier to spot and resolve issues before they affect users.


By combining Argo CD with best practices for scaling, automation, and monitoring, Adobe ensures that their Kubernetes fleet remains reliable, secure, and efficient. Their approach to fleet management is a model for other organizations looking to scale Kubernetes infrastructure without sacrificing performance or operational agility.

 
 
 
