9 DevOps Interview Questions

Building on the interview questions for Engineering Manager, Senior Software Engineer, and Ruby on Rails Software Engineer positions, I have developed additional interview questions that are useful when interviewing DevOps and DevSecOps engineers.

Since DevOps is an individual contributor (IC) role that works with many teams across an organization, and its responsibilities vary from one organization to another, the questions focus on those areas. My assumption is that you, as a DevOps manager or hiring manager, will already be asking about core DevOps skills, such as whether the candidate knows what CI/CD means, knows the main services of the major cloud service providers, and knows how to use IaC and configuration management tools. In each question, you can also ask about observability and security: who should have access to what, and what kinds of preventative alerts are needed.

We’re setting up a prototype sandbox environment to test a product feature with a prospective client. How should we do this?

For a B2B SaaS company, or any company with enterprise customers, the sales team will often need a sandbox environment to test a new feature for a prospective client. Typically this is a manual process at first and is automated later.

An experienced DevOps candidate will ask about automation and will suggest using Terraform together with Chef, Ansible, or another configuration management tool to set up the environment. They should be able to outline the steps, such as building a new machine image or writing an Ansible playbook that configures the machine.

Candidates should also ask about supporting infrastructure, such as the cache and database that the test environment needs.
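
As a rough illustration of the automation a candidate might describe, the sketch below generates per-client input variables for a Terraform configuration. It assumes a hypothetical Terraform module that accepts variables named `environment_name`, `app_instance_count`, `enable_cache`, `database_engine`, and `ttl_days`; none of these names come from a real module, and the defaults are placeholders.

```python
import json
import re


def sandbox_name(client: str, feature: str) -> str:
    """Build a lowercase, hyphenated environment name from free-form input."""
    slug = re.sub(r"[^a-z0-9]+", "-", f"{client}-{feature}".lower()).strip("-")
    return f"sandbox-{slug}"


def build_sandbox_vars(client: str, feature: str, ttl_days: int = 14) -> dict:
    """Return Terraform input variables for one prospect's sandbox.

    All variable names here are hypothetical; match them to your own module.
    """
    if ttl_days <= 0:
        raise ValueError("ttl_days must be positive so the sandbox gets torn down")
    return {
        "environment_name": sandbox_name(client, feature),
        "app_instance_count": 1,        # assumption: one node is enough for a demo
        "enable_cache": True,           # supporting infrastructure: cache
        "database_engine": "postgres",  # supporting infrastructure: database
        "ttl_days": ttl_days,           # reminder to tear the sandbox down
    }


if __name__ == "__main__":
    variables = build_sandbox_vars("Acme Corp", "Bulk Export")
    # Save the output as e.g. acme.tfvars.json, then run:
    #   terraform apply -var-file=acme.tfvars.json
    print(json.dumps(variables, indent=2))
```

Keeping the per-client values in a generated variables file means the same Terraform code can be reused for every prospect, which is the step that turns the manual process into an automated one.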

How can we ensure our web app or site is working?

This question shows whether a DevOps engineer candidate knows what an automated canary is, understands how the QA process for a deployment works, and asks questions about that process. At a minimum, DevOps engineers need to know how to test manually, using tools such as curl or Yaak to hit API endpoints directly, or a browser inspector to check for JavaScript errors. Experienced DevOps candidates will also mention uptime monitoring services.
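
To make the idea of an automated check concrete, here is a minimal sketch of a scripted version of the manual curl test, using only the Python standard library. The function names, the `CheckResult` type, and the URL in the usage comment are all hypothetical; a real canary or uptime monitor would add scheduling, retries, and alerting on top of something like this.

```python
import time
import urllib.error
import urllib.request
from dataclasses import dataclass
from typing import Optional


@dataclass
class CheckResult:
    ok: bool
    status: Optional[int]  # None when no HTTP response was received
    latency_ms: float
    detail: str


def evaluate_response(status, body, expected_status=200, must_contain=None):
    """Return (ok, detail) for a response that was actually received."""
    if status != expected_status:
        return False, f"expected HTTP {expected_status}, got {status}"
    if must_contain is not None and must_contain not in body:
        return False, f"body is missing {must_contain!r}"
    return True, "ok"


def check_endpoint(url, expected_status=200, must_contain=None, timeout=5.0):
    """Fetch one URL and report whether it looks healthy."""
    started = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            status = resp.status
            body = resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as exc:
        # 4xx/5xx responses arrive as exceptions but still carry a status code.
        status, body = exc.code, ""
    except (urllib.error.URLError, OSError) as exc:
        latency_ms = (time.monotonic() - started) * 1000
        return CheckResult(False, None, latency_ms, f"unreachable: {exc}")
    latency_ms = (time.monotonic() - started) * 1000
    ok, detail = evaluate_response(status, body, expected_status, must_contain)
    return CheckResult(ok, status, latency_ms, detail)


# Example usage (hypothetical URL):
#   result = check_endpoint("https://app.example.com/healthz", must_contain="ok")
#   print(result)
```

Separating `evaluate_response` from the network call keeps the pass/fail rules easy to unit test without hitting a live service.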

We’re setting up a new micro-service. How should we set up CI/CD?

Candidates should demonstrate that they understand continuous integration and continuous deployment by suggesting GitHub Actions, CircleCI, or some other service provider. They should ask whether container images need to be built, or whether the codebase is function-based, such as AWS Lambda (or an equivalent), and is deployed as functions.

The candidate should demonstrate expertise in the particular cloud stack used for the deployment, such as its container registry and observability tools. The candidate should also mention Grafana, Prometheus, and other industry-standard tools.
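
The container-versus-function question changes which pipeline stages are needed. The sketch below shows that branching in a provider-neutral way; the stage names and the `DeployTarget` enum are illustrative assumptions, not the vocabulary of GitHub Actions, CircleCI, or any other CI service.

```python
from enum import Enum
from typing import List


class DeployTarget(Enum):
    CONTAINER = "container"  # e.g. an image pushed to a container registry
    FUNCTION = "function"    # e.g. AWS Lambda or an equivalent


# Continuous integration: the same for both kinds of codebase.
CI_STAGES = ["checkout", "install dependencies", "lint", "unit tests"]

# Packaging differs depending on what is being deployed.
PACKAGE_STAGES = {
    DeployTarget.CONTAINER: ["build container image", "push image to registry"],
    DeployTarget.FUNCTION: ["package function archive", "upload function archive"],
}

# Continuous deployment: promote through staging before production.
CD_STAGES = ["deploy to staging", "smoke test staging", "deploy to production"]


def plan_pipeline(target: DeployTarget) -> List[str]:
    """Return the ordered stage names for the given deployment target."""
    return CI_STAGES + PACKAGE_STAGES[target] + CD_STAGES


if __name__ == "__main__":
    for target in DeployTarget:
        print(target.value, "->", ", ".join(plan_pipeline(target)))
```

A strong candidate would map each of these stages onto concrete jobs in whichever CI service the organization uses.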

We are using a specific AWS/GCP/Azure technology, and we want to build a resilient architecture. How would you go about doing that?

Many organizations have a primary cloud service provider, such as AWS. For high availability, redundancy, and resilience, it can be advantageous to load balance requests across cloud providers, or to set up backups in case the primary provider has downtime.

Asking this question allows the candidate to show whether they think about multiple regions within one service provider or across service providers.

A candidate will typically be well-versed in one major cloud service provider and will show that they understand what options exist within that provider for high availability. So even if you are not considering moving from AWS to Azure, or balancing a CDN across Cloudflare and AWS, the question gives the candidate a chance to show they know what it takes to run a high-availability service.
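
One way to ground the discussion is the basic arithmetic behind redundancy. The sketch below assumes that each region or provider fails independently, which is optimistic in practice because shared dependencies can take several down at once. The 0.999 figure is purely illustrative and is not any provider's published SLA.

```python
def combined_availability(availabilities):
    """Availability of a service that stays up while at least one replica is up.

    Assumes replicas fail independently of each other.
    """
    p_all_down = 1.0
    for a in availabilities:
        if not 0.0 <= a <= 1.0:
            raise ValueError(f"availability must be between 0 and 1, got {a}")
        p_all_down *= 1.0 - a
    return 1.0 - p_all_down


def downtime_minutes_per_year(availability):
    """Expected downtime per 365-day year, in minutes."""
    return (1.0 - availability) * 365 * 24 * 60


if __name__ == "__main__":
    single = combined_availability([0.999])
    dual = combined_availability([0.999, 0.999])
    print(f"one region:  {downtime_minutes_per_year(single):.1f} min/year")
    print(f"two regions: {downtime_minutes_per_year(dual):.2f} min/year")
```

The gap between the two numbers is the theoretical upside of a second region or provider; a candidate should be able to explain which real-world factors, such as failover time and shared dependencies, erode it.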

A team needs a new database set up. How would you plan out this new infrastructure?

Every cloud-based software product starts with a database, whether it’s Postgres, MySQL, MongoDB, Firebase, or something else. Sometimes a new product or new micro-service is being built that will need its own database.

The DevOps engineer candidate should know what questions to ask before they dive into infrastructure planning. They should ask about read replicas, database instance types and database size, and backups and snapshots. They should also ask about tear-down and upgrade situations. For example, if the new micro-service moves into production, do we need to upgrade the size and instance type of the database? How should we tear it down to re-allocate spend and technical resources?
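
The questions above can be captured as a simple planning checklist. This is a sketch only: the `DatabasePlan` fields and the question wording are assumptions made for illustration, and a real plan would likely track more, such as network access and encryption.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class DatabasePlan:
    """Answers gathered from the requesting team; None means 'not asked yet'."""
    engine: Optional[str] = None
    instance_type: Optional[str] = None
    storage_gb: Optional[int] = None
    read_replicas: Optional[int] = None  # 0 is a valid answer
    backup_retention_days: Optional[int] = None
    upgrade_path: Optional[str] = None
    teardown_plan: Optional[str] = None


QUESTIONS = {
    "engine": "Which database engine does the service need?",
    "instance_type": "What instance type fits the expected load?",
    "storage_gb": "How large will the database be?",
    "read_replicas": "Are read replicas needed, and how many?",
    "backup_retention_days": "How long should backups and snapshots be kept?",
    "upgrade_path": "How do we resize when the service moves into production?",
    "teardown_plan": "How do we tear it down and re-allocate the spend?",
}


def open_questions(plan: DatabasePlan) -> List[str]:
    """List the questions that still have no answer."""
    return [QUESTIONS[f.name] for f in fields(plan) if getattr(plan, f.name) is None]


if __name__ == "__main__":
    plan = DatabasePlan(engine="postgres", read_replicas=0)
    for question in open_questions(plan):
        print("-", question)
```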

What is your process for working with other teams?

Start with the initial question, “What is your process for working with other teams?”, and then have the DevOps engineer candidate apply it within a scenario:

  • A team needs urgent infrastructure work completed. Apply your process to it.
  • The web app is suffering an outage, and a product manager and staff engineer have started a meeting. What steps would you take to resolve the issue, according to your process?
  • A team is planning out a project that is expected to start next quarter. You have already met with the tech lead who architected it. However, another team with similar infrastructure requirements is starting a project sooner than that. How does your process handle both?

The “process” is important because there will always be something coming up that may need the urgent attention of the DevOps team and there will be many tickets coming from internal teams. How will the candidate respond in those situations? How will they pause work they’re currently doing? Who will they communicate with to inform them of issues or of status updates?

You are on-call, and there is an urgent issue. What steps do you take first?

In many organizations, the DevOps team’s responsibilities include Site Reliability Engineering (SRE). The team has an on-call rotation and is part of the first line of defense when issues show up in production. The urgent issue can be specific to your product, or it can be general, such as email notifications or authentication systems not working, or a particular external API failing.

The goal of this question is to understand how the candidate will handle the pressure of on-call. One of the first steps could be looping in managers with a status update, or posting a public status update for customers, rather than diving right away into debugging. Another first step could be diagnosing, triaging, and prioritizing the issue, and asking whether there’s a way to patch it or roll back a code or infrastructure change to buy time for engineers to find a more permanent fix. In smaller organizations and startups, the first step could be debugging and posting updates as they go along.

After an urgent issue is resolved, what steps do you take next?

A follow-up to the question above is: what happens the next day? The goal is to understand whether the DevOps engineer will write up notes, research ways to prevent the issue, create alerts to catch it early, and talk to other engineers to create a better process for the future.

The question is a bit free-form because the answer depends on the DevOps engineer. Each one knows what steps they would take based on their experience, and keeping the question free-form lets you see how well their approach aligns with your current processes. One DevOps engineer may focus mainly on the urgent issue and its technical details, while another may focus more on preventative measures.

Within this question you can also ask about observability and systems monitoring.
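
If the conversation turns to preventative alerts, it can help to have a concrete rule in mind. The sketch below is one simple possibility: fire when the error rate over the most recent requests exceeds a threshold. The class name, window size, threshold, and minimum sample count are all illustrative assumptions; in practice a rule like this would usually live in a monitoring tool rather than in application code.

```python
from collections import deque


class ErrorRateAlert:
    """Fires when the error rate over the last `window_size` requests is too high."""

    def __init__(self, window_size=100, threshold=0.05, min_samples=20):
        self.threshold = threshold
        self.min_samples = min_samples
        # A bounded deque drops the oldest outcome automatically.
        self._outcomes = deque(maxlen=window_size)

    def record(self, is_error: bool) -> bool:
        """Record one request outcome and return whether the alert is firing."""
        self._outcomes.append(bool(is_error))
        return self.firing()

    def error_rate(self) -> float:
        if not self._outcomes:
            return 0.0
        return sum(self._outcomes) / len(self._outcomes)

    def firing(self) -> bool:
        # Require a minimum number of samples so one early failure is not noise.
        enough_data = len(self._outcomes) >= self.min_samples
        return enough_data and self.error_rate() > self.threshold


if __name__ == "__main__":
    alert = ErrorRateAlert(window_size=10, threshold=0.2, min_samples=5)
    outcomes = [False, False, False, False, True, True]
    for outcome in outcomes:
        print(outcome, "->", alert.record(outcome))
```

A good follow-up is to ask the candidate how they would tune the window and threshold to avoid both missed incidents and alert fatigue.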

We have a new build pipeline being set up. How do we handle secrets management?

This question digs into the security side of DevOps. A candidate should know that API tokens and secret keys should be kept out of the Git repository, should never be stored in plain text, and belong in AWS Secrets Manager, 1Password, or another secrets management tool. They should be able to describe how those secrets are accessed in the build pipeline and how they’re accessed by the deployed application. Candidates should also ask about key rotation and about using different keys for different environments.
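
As one possible shape for the application side, the sketch below assumes the build pipeline or runtime injects secrets as environment variables after fetching them from a secrets manager. The `<ENVIRONMENT>_<NAME>` naming convention, the function names, and `MissingSecretError` are hypothetical choices made for this example.

```python
import os


class MissingSecretError(RuntimeError):
    """Raised when an expected secret was not injected into the environment."""


def secret_name(base: str, environment: str) -> str:
    """Build the variable name, e.g. ('api_token', 'staging') -> 'STAGING_API_TOKEN'."""
    return f"{environment}_{base}".upper()


def load_secret(base: str, environment: str, env=None) -> str:
    """Read a per-environment secret, failing loudly if it is absent."""
    source = os.environ if env is None else env
    key = secret_name(base, environment)
    value = source.get(key)
    if not value:
        # Report only the variable name, never the value.
        raise MissingSecretError(f"secret {key} is not set")
    return value


def redact(value: str, visible: int = 4) -> str:
    """Mask a secret for logs, leaving only the last few characters."""
    if len(value) <= visible:
        return "*" * len(value)
    return "*" * (len(value) - visible) + value[-visible:]


if __name__ == "__main__":
    fake_env = {"STAGING_API_TOKEN": "abcd1234"}  # stand-in for injected variables
    token = load_secret("api_token", "staging", env=fake_env)
    print("loaded token", redact(token))
```

Using a separate variable per environment keeps a staging key from being picked up in production by accident, and failing at startup makes a missing or rotated-away key obvious immediately.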