Pipelines Patterns and Practices - Agents and Agent Pools

This Patterns and Practices document describes the Azure DevOps universal agent and how it is organized into pools for use.

Overview

The pipelines feature of Azure DevOps is a process engine that can be used for many things. The most common use of pipelines is to compile, test and deploy code. To run this work, the system needs another computer to execute it on (usually a virtual machine). Azure DevOps coordinates all of this work and uses "agents" installed on other computers (VMs, containers, or physical machines) to carry it out.

Historically, Team Foundation Server had different kinds of agents: one for builds/deploys, one for automated testing, and one for automated load testing. In Azure DevOps these are all the same agent running tasks for a pipeline, which is why we call these agents universal.

Because Azure DevOps is a cloud-based system, it provides agents hosted by Microsoft. You can also create your own agents, and for many use cases this is preferred. Agents are grouped into pools that pipelines can use, and pools let pipelines scale by running jobs in parallel. This document discusses when to use which types of agents, how to build them, and how to set up Azure DevOps pools for them.

Agent Types

Microsoft defines two types of agents: Microsoft-hosted and self-hosted. Its documentation also discusses options for self-hosted agents using VMs, VM Scale Sets in Azure, Docker containers, and more. The following are our recommendations for which agent types to use.

In order of priority to consider:

Microsoft Hosted - Windows, Linux and macOS agents.

A Microsoft-hosted agent is a VM that has an inventory of software tools pre-installed on the machine image. When you select one, Microsoft dynamically creates a new VM from the image and deletes the VM when your work is done. Microsoft's documentation lists the available images and which tools and tool versions are pre-installed on them.

The following are reasons you might not choose a Microsoft-hosted agent:
- You need to access private resources in a cloud or on-premises environment, and no connector exists for private access.
- You need more than 10GB of storage for source and build outputs.
- Your pipeline tasks take more than 60 minutes to complete.
- You need a faster or specialized build machine; Microsoft uses general-purpose Standard_DS2_v2 machines.
- You need to send artifacts to a UNC file share.
- You have specialized memory requirements for the agent.

Self-Hosted via Azure VM Scale Sets - Windows and Linux agents.

When you self-host an agent, you can run it on a VM. We recommend doing this with a VM scale set based agent. The benefits are that you control the software on the virtual machine, and the scale set can scale down to zero instances, removing costs. To set this up, you create a scale set, specifying details about the VM such as the machine size, the image (which determines the installed software), and the virtual network it belongs to. One especially useful feature is that when the scale set is exposed as a pool, Azure DevOps automatically installs the agent on each VM in the scale set and connects it to Azure DevOps. When you register the scale set as a pool in Azure DevOps, you specify these important configuration items:
- Maximum number of agents in the scale set. This manages costs while still allowing for periodic large-scale needs such as testing.
- Number of agents to keep on standby. If you set this to zero, you will not pay any costs when the agent pool is not in use. It will take up to 3 minutes to get the first agent in the pool going if it has been scaled to zero.
- Delay in deleting idle agents. This is how long the system waits before deleting a scaled-up agent. It allows for quicker reuse during busy times.
- Automatic tear down. This deletes the VM instance and recreates it after every job. It gives a cleaner agent for each job at the cost of slower reuse.

At this time there are no macOS Azure VMs that can be used for self-hosted agents. If your build needs macOS, your only options are the Microsoft-hosted macOS agent VM or an agent installed on an on-premises macOS machine.

The following are reasons you might not choose a VM scale set self-hosted agent after evaluating Microsoft-hosted agents:
- You need a self-hosted agent that is running macOS.
- You need access to an on-premises resource that is not exposed to the Internet.
- You need access to physical hardware like a phone or a cardiovascular implantable electronic device (CIED).
- Your test needs to use a non-emulated protocol like Bluetooth to access a real piece of hardware.
- You are running in a different cloud (e.g. AWS) and need access for deployment or test execution behind a firewall without exposing it to the Internet.

Note: AWS and other cloud access is possible, and preferred, with VM scale set based agents if the environment is not completely locked down from the Internet. For example, using AWS shell scripts and IAM credentials, it is possible to interact with AWS. You can use the 'AWS Shell Script' task from the AWS Toolkit for Microsoft Azure DevOps to deploy to a Kubernetes cluster in AWS.
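
A minimal sketch of such a deployment step follows, assuming the Kubernetes cluster is an Amazon EKS cluster. The `AWSShellScript@1` task comes from the AWS Toolkit; the pool name, service connection name, region, cluster name and `k8s/` manifest folder are all hypothetical, and the agent is assumed to have the AWS CLI and kubectl installed. Check the task's input names against the toolkit documentation for the version you install.

```yaml
pool: MyTeams-VMScaleSet-Pool-1   # hypothetical scale set pool whose image includes the AWS CLI and kubectl

steps:
- task: AWSShellScript@1
  displayName: 'Deploy to EKS (hypothetical cluster)'
  inputs:
    awsCredentials: 'aws-dev-connection'   # hypothetical AWS service connection holding IAM credentials
    regionName: 'us-east-1'                # assumed region
    scriptType: 'inline'
    inlineScript: |
      # Point kubectl at the (hypothetical) EKS cluster, then apply the manifests
      aws eks update-kubeconfig --name my-eks-cluster --region us-east-1
      kubectl apply -f k8s/
```
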
Self-Hosted other VM/Hardware - Anything that can be accessed by a Windows, Linux or macOS based computer or VM.

If neither the Azure VM scale set nor the Microsoft-hosted solution meets your needs, you can install an agent on a VM or computer yourself, point it at the Azure DevOps APIs, and join the agent to a pool. This model has the following requirements:
- You need network access to the Azure Pipelines URL for the organization (Medtronic Azure DevOps Org for Patient Management)
- You need to be running on a supported operating system
  - Windows: Windows Agent setup and needs
  - Linux: Linux Agent setup and needs
  - Mac: Mac Agent setup and needs
- You need an account with a Personal Access Token that has, at minimum, Agent Pools (read, manage) and Deployment group (read, manage) access rights. This should be a service account. See the Patterns and Practices document on Credentialed access and security.

Agent Pools and their Relationship to Agents

In a project's Project Settings, under Pipelines, is a section named Agent Pools. Each project has a built-in pool named Azure Pipelines, which contains the Hosted Agent. In this section you can add additional pools. When you add a pool, you have two options:

Azure Virtual Machine Scale Set
Self Hosted

Note: this naming confuses some people because Azure VM scale set agents are also self-hosted, but they are self-hosted within an Azure subscription using a scale set. The difference is that with a Self Hosted pool, you install the agent on a machine yourself and point it at the Azure DevOps pool. With a scale set, Azure DevOps manages the agent, connectivity and security.

As you build a pipeline, you specify which pool it uses; this determines where the pipeline runs its tasks. In pipeline YAML, you specify the pool with the following notation:

Microsoft Hosted Pool - Using a predefined image name.

pool:
  vmImage: ubuntu-latest

Self Hosted Pool - Using the name you gave the pool when you set it up.

pool: MyPoolName

If you use Microsoft-hosted pools, we recommend the following images. This list will change over time.

vmImage Name	Image
windows-latest	Windows Server 2019 with Visual Studio 2019
ubuntu-latest	Ubuntu 18.04
macOS-latest	macOS Catalina 10.15

Microsoft maintains a list of all images for Microsoft-hosted pools and their included software.

As a best practice, use variables for the Microsoft-hosted image name so it can be easily updated and changed.
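A minimal sketch of that practice, assuming a shared variable template at the hypothetical path `templates/agent-images.yml` with hypothetical variable names. Each pipeline reads the image name through a template expression, so changing the template file updates every pipeline that references it.

```yaml
# File (hypothetical): templates/agent-images.yml, shared by many pipelines
variables:
  windowsImage: 'windows-latest'
  linuxImage: 'ubuntu-latest'
```

```yaml
# File: a pipeline that consumes the shared image names
variables:
- template: templates/agent-images.yml

pool:
  vmImage: ${{ variables.linuxImage }}   # resolved when the pipeline is compiled

steps:
- script: echo Running on ${{ variables.linuxImage }}
```

Template expressions such as `${{ variables.linuxImage }}` are resolved at compile time, which is why the image name is defined in YAML here rather than set while the pipeline runs.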

Parallel Jobs

To make all of this work, a pipeline selects an agent pool and then starts a job. Azure DevOps uses these "parallel jobs" to bill for pipeline execution. This doesn't limit how many pipelines exist or run, only how many can run in parallel. If the parallel job limit for that type of agent pool is reached, the job request is queued. Because we use private projects in Azure DevOps, we have the following limits without purchasing more capacity:

1 free Microsoft-hosted concurrent job with 1,800 minutes (30 hours) per month, and a maximum run time of 60 minutes per job.
1 free concurrent job on a Self-Hosted or VMSS-hosted agent pool, with unlimited minutes.

Parallel jobs can be purchased at the organization level (billed to the organization's associated Azure account) and are shared across all the team projects in the organization. The costs of these parallel jobs are as follows:

Microsoft Hosted Parallel Jobs - $40/month
Self-Hosted Parallel Jobs - $15/month

A few other rules affect the parallel jobs available to the organization:

The first Microsoft-hosted parallel job purchased doesn't allow you to run two jobs at the same time; instead, it removes the 30-hours-per-month time limit. To run 2 jobs at a time, you need to purchase at least two Microsoft-hosted parallel jobs.
You receive 1 free self-hosted parallel job with unlimited minutes.
Every Visual Studio Enterprise subscriber added to the organization contributes 1 additional free self-hosted parallel job to the pool of available concurrent jobs.

Recommended Design of Parallel Jobs and Pools

This design optimizes for the following goals:
* Prefer Microsoft-hosted and VM scale set pools over manually built self-hosted pools when possible.
* Create separate pools for system automation or load testing, apart from the normal build, unit test and deploy pool, to provide capacity for large-scale tests.
* Allocate a minimal number of self-hosted parallel jobs per project, because the Visual Studio Enterprise subscribers added to the org already contribute free self-hosted parallel jobs. If this changes, increase the number of self-hosted parallel jobs per project.
* Design pools so that when a pool is not in use, it does not consume VM hosting or storage costs.

The following are the recommendations for organizational pool design.

Allocate 2 Microsoft-hosted parallel jobs per project added to the organization. So if we have 10 projects, we would allocate 20 concurrent Microsoft-hosted parallel jobs. This is the organization administrator's responsibility when a new project is created, which is rare.
Allocate 1 additional self-hosted parallel job per project added to the organization. So if we have 10 projects, we would allocate 10 additional self-hosted parallel jobs. This is also the organization administrator's responsibility when a new project is created.
By default, each project gets the following pools:
Microsoft Managed Pool: This is added by default and can be used to build, test and deploy code on Windows, Linux or macOS.
A Self-Hosted project specific VM Scale Set: This pool will by default be based on a custom image. Initially it will be either a Windows or Linux based image with no additional software. The project team along with the DevOps team will customize the agent image as needed for the project.

The scale set will be configured with zero standby agents, a maximum of 4 agents, a 30-minute delay before idle agents are deleted, and automatic tear down enabled. Each team project can customize this, but we will try to keep the minimum at zero to minimize VM billing costs.
- Optionally, a self-hosted, project-specific VM Scale Set for automated testing: This pool will by default be based on a custom image. Initially it will be a Windows based image with no additional software. The project team, along with the DevOps team, will customize the agent image as needed for the project.
The scale set will be configured with zero standby agents, a maximum of 16 agents, a 30-minute delay before idle agents are deleted, and automatic tear down enabled. Each team project can customize this, but we will try to keep the minimum at zero to minimize VM billing costs.
- Optionally, self-hosted, deployment-specific pools as needed: These pools manage agents that run on-premises or in other clouds, using manually installed agents. An agent can be on an on-premises machine or on a VM in a cloud.
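
As a rough budgeting aid, the allocation rule above can be turned into a monthly list-price estimate for N projects, using the $40 and $15 per-job prices from the Parallel Jobs section. This is an upper bound that ignores the free jobs described there (the free self-hosted job and the jobs contributed by Visual Studio Enterprise subscribers), so actual spend on self-hosted jobs will usually be lower:

$$
\begin{aligned}
\text{Monthly cost}(N) &= \underbrace{2N \times \$40}_{\text{Microsoft-hosted}} + \underbrace{N \times \$15}_{\text{self-hosted}} = \$95N \\
\text{Monthly cost}(10) &= 20 \times \$40 + 10 \times \$15 = \$800 + \$150 = \$950
\end{aligned}
$$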

The purpose of these pools is the following:

Build, unit test, deploy and interact with Static Analysis as much as possible in Microsoft Managed agent jobs.
Provide a self-hosted capacity as needed as a scale set for teams with a dedicated pool for build, unit test, deploy and static analysis.
Optionally, provide a large capacity pool as a scale set for teams with a dedicated pool for automated system testing. Note that in some designs this will not be a Scale-Set pool because it needs to reside inside a closed external cloud and will need to be a normal self-hosted pool.
Optionally, provide for task specific pools for deployment into other clouds and on premise access.

As an example of pools, CareLink might need the following pools:

Default:
* Default Pool - Microsoft Managed Windows and Linux Agents

Scale Set Based:
* Build and Deploy Pool - VM Scale Set based Windows Agents with a custom image of tools

Test Specific Self-Hosted:
* CareLink AWS Shared Dev Test Pool - A VM based pool where the agents are running in AWS Shared CareLink Dev for Automated Testing.
* CareLink AWS Load Test Pool - A VM based pool where the agents are running in the AWS US load testing environment.

Deployment Specific Self-Hosted:
* CareLink AWS Shared Dev Pool - A VM based pool where the agents are running in AWS Shared CareLink Dev for Deployment.
* CareLink AWS US Prod Pool - A VM based pool where the agents are running in AWS US Production Deployment.
* CareLink AWS EU Prod Pool - A VM based pool where the agents are running in AWS EU Production Deployment.

Because the CareLink team will bring more than 100 Visual Studio licenses into the organization, they will not run into any parallel job limits on self-hosted pools. Because parallel jobs are shared across the organization, their allocation of 2 Microsoft-hosted parallel jobs can flex into the allocations of other projects as needed.

Pipeline capabilities and demands and their impact on agents and agent pools

When creating an agent pool for self-hosted agents, you can specify user-defined capabilities for its agents. This lets jobs be matched with an agent that meets the job's requirements. Today this is only available on agents in a self-hosted pool; VM scale set based pools do not support it, since the pool can scale to zero and all agents are identical.

In the Agents tab of the pool, each agent has a Capabilities section where you can add user-defined capabilities as key/value pairs. Good examples of such capabilities are:

Operating System Version
Various Tooling Versions
Various Runtime Versions
Indicators of Environment Access

Then when creating a pipeline and specifying a pool you can add demands on the pool. The following is an example:

pool:
  name: MyTeams-OnPremise-Pool-1
  demands:
  - Agent.OS -equals Linux  # Check to see if Agent.OS == Linux
  - VisualStudio2019        # Check to see if Visual Studio 2019 is installed
  - AWSDevConnection        # Check to see if environment can access AWS dev environments

The benefit of this design is that you can create one self-hosted pool whose agents have different capabilities, and the pipeline will only run jobs on agents with matching capabilities. This reduces the number of self-hosted agent pools you need to define. For testing pools this is especially powerful, because you can run tests on different operating systems with different client requirements, such as browser versions, within the same pool.
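As a sketch of that testing scenario, the pipeline below runs the same UI tests twice against one self-hosted pool, routing each job to agents with a different browser. The pool name and the `ChromeBrowser` and `FirefoxBrowser` capabilities are hypothetical user-defined capabilities that an administrator would add on the agents; `Agent.OS` is a system capability reported by the agent itself.

```yaml
jobs:
- job: Test_Chrome
  pool:
    name: MyTeams-Test-Pool-1          # hypothetical self-hosted test pool
    demands:
    - ChromeBrowser                    # hypothetical user capability: agent has Chrome installed
    - Agent.OS -equals Windows_NT      # system capability: Windows agents only
  steps:
  - script: echo Run the UI test suite against Chrome here

- job: Test_Firefox
  pool:
    name: MyTeams-Test-Pool-1
    demands:
    - FirefoxBrowser                   # hypothetical user capability: agent has Firefox installed
    - Agent.OS -equals Linux           # system capability: Linux agents only
  steps:
  - script: echo Run the UI test suite against Firefox here
```

Because the two jobs do not depend on each other, they can run at the same time when enough parallel jobs and matching agents are available.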

Scaling strategy for pools

Scaling is controlled by two factors: first, the number of parallel jobs available at any time for the type of pool; and second, for a VM scale set or hand-built pool, the number of agents available. If the team runs into parallel job limits, it can request a larger allocation from the organization administrator team. This should not happen with self-hosted parallel jobs, due to the number of Visual Studio Enterprise subscribers.

If a VM scale set is used, the team can increase the scale set's maximum size. If a private pool is used, the team should add more VMs or hardware to the pool; each new agent adds capacity as soon as it registers with the pool.

Pool based security

By default, members of a team project are granted access to a pool through their DL-based groups, and the project's service account groups are also given access. If a pool has access to restricted production resources, use of the pool is granted only to the project administrators and build administrators groups for the project.

DL Based Group	Pool Type	Pool Based Role	Purpose
Project DL Members	Any	User	Use and access pools
Project DL Members	Restricted	Reader	View agent pools, not queue on them.
Project DL Administrators	Any	Administrator	Use, access, create and customize pools
Project Valid Users	Any	Reader	View agent pools, not queue on them.
Project Service Accounts	Any	User	Use and access pools

Pipeline approvals and checks for agent pools

An additional layer of security can be applied to agent pools to control how they are used. This is configured in the agent pool's approvals and checks. Before a stage can begin, all checks on all the resources used in that stage must be satisfied. Azure Pipelines pauses the pipeline before each stage and waits for all pending checks to complete. Checks are re-evaluated based on the retry interval specified in each check. If not all checks succeed before the specified timeout, the stage is not executed. If any check fails terminally (for example, if you reject an approval on one of the resources), the stage is not executed.

To enable checks, select Approvals and checks from the ellipsis menu on the selected agent pool.

Checks can be configured on environments, service connections and agent pools.

Approval check

An approval check can require one or more people to approve use of an agent pool before a stage can use it. You can also use approval checks to prevent the user or group who requested (initiated or created) the run from completing the approval, which is useful for separation of roles.

Branch control check

The branch control check allows you to verify that all resources in a pipeline come from allowed branches before the agent pool can be used. This helps control release readiness and the quality of deployments. In this check you specify a list of allowed Git branches or branch masks using refs notation (the main branch is refs/heads/main).

Business hours check

The business hours check allows a pipeline to use the selected agent pool only during certain time frames. The run is held until the business hours window begins; if the check's timeout is exceeded first, the check fails the run.

Other checks

You can use many other checks for programmatic control over when and if an agent pool can be used. These include:

Invoking an Azure function as a check
Invoking a REST API as a check
Evaluating that a container artifact comes from a known, whitelisted registry
Evaluating, with an exclusive lock check, that no other run is using the agent pool at the same time. This is useful for agent pools that use shared test resources that need to be reset after use.

Method validation of tooling

To method validate the tooling we are taking a two-part approach:

First, the tools and their major versions are method validated for each tool we use. If an agent has additional tools installed that are not method validated (as on a Microsoft-hosted agent) and we do not use them, they do not need validation as long as we can detect no interactions with them.

Second, the Simplified DevOps Architecture teams will maintain an updated list of which Microsoft-hosted agents are using which versions of method validated tooling. We will use template-based patterns to help teams stay within method validated versions on these Microsoft agents. Teams will be notified when the versions on this list are about to change so they can react. This list serves as the main list of tools and versions that have been method validated by all teams.

Any team can method validate a tool and then use it on a self-hosted agent, but if the self-hosted agents are VM scale sets, we have the following recommendations:

Create a custom image using Packer and provide an Azure DevOps repository and pipeline in the DevOps shared team project for that image.
Provide a component owner for the image who approves updates to the image's pipeline and repository. The component owner is responsible for ensuring that each tool in the image, as defined in the repository, has a method validation for the major version used. The Simplified DevOps teams can provide examples and guidance on using Packer to install products on images via a pipeline.
Mark the image with a tag of "production" or "gold" and use only that image in the scale set until tests have proven that an updated image is correct; then move the tag to the newer image version and update the scale set definition. Consider this the promotion process to main. This promotion can be automated through a "deployment pipeline" for the Azure VM scale set.
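
One possible shape for the image pipeline recommended above is sketched below. It assumes a hypothetical repository layout with a Packer HCL template under `images/build-agent/` that declares `client_id` and `client_secret` variables for Azure authentication, and hypothetical secret pipeline variables `packerClientId` and `packerClientSecret`. The `packer init`, `validate` and `build` commands are standard Packer CLI commands; everything else is an assumption to adapt.

```yaml
# Rebuild the agent image only when its (hypothetical) template folder changes on main
trigger:
  branches:
    include:
    - main
  paths:
    include:
    - images/build-agent

pool:
  vmImage: ubuntu-latest   # assumes Packer is pre-installed on this image; otherwise add an install step

steps:
- script: packer init images/build-agent
  displayName: 'Install required Packer plugins'

- script: packer validate images/build-agent
  displayName: 'Validate the image template'

- script: packer build images/build-agent
  displayName: 'Build the agent image'
  env:
    # Secret variables must be mapped explicitly; PKR_VAR_<name> sets the HCL variable <name>
    PKR_VAR_client_id: $(packerClientId)          # hypothetical secret variable
    PKR_VAR_client_secret: $(packerClientSecret)  # hypothetical secret variable
```

Because pull requests into this repository require the component owner's approval, every image built from `main` has passed that review before it can be tagged for promotion.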

For any on-premises or external VM or computer-based self-hosted agent, the team is responsible for managing the versions of software installed on these computers. The team can choose to document that they are using method validated versions from the main list maintained by the Simplified DevOps team.

Other good practices

1. Teams creating pipelines should specify in the pipeline the version of the tooling they want to use. They can do this with the various tool installer and tool version selection tasks. Never assume the pipeline is using the version of the tool you want. This matters most with .NET and .NET Core versions on Microsoft-hosted agents, but it is also important in many other areas. For example, the .NET Core SDK tool installer task:

steps:
- task: UseDotNet@2
  displayName: 'Use .NET Core SDK Version 3.1'
  inputs:
    packageType: sdk
    version: 3.1.x   # latest 3.1 patch; a bare "3.1" is not an accepted version format
    installationPath: $(Agent.ToolsDirectory)/dotnet

2. Use templates for common practices so that when the process or tooling version needs to change, you don't need to update all pipelines by hand. Teams can also benefit from using the shared templates provided and co-authored in the shared DevOps repositories.
3. Pools are not expensive to make or maintain, but their agents can be expensive to run if self-hosted. Look for opportunities to use Microsoft-hosted agents or VM scale sets that scale to zero instances to manage costs.
4. Pipelines can be broken up into stages, and the stages can run on different pools. This allows you to do common tasks on a Microsoft-hosted agent and special tasks on self-hosted agents, reducing run time on expensive self-hosted agents. It also reduces the work needed to maintain agents. This can be done with stage-specific pool syntax, as in the following pipeline:

```yaml
stages:
- stage: Build
  displayName: 'Compile, Unit Test, Static Analysis'
  pool:
    vmImage: windows-latest                      # Microsoft-hosted agent pool
  jobs:
  - job: Build
    steps:
    - script: echo Compile, unit test and run static analysis here

- stage: Signing
  displayName: 'Specialized Code Signing'
  dependsOn: Build
  pool: MyTeams-VMScaleSet-Pool-1                # A team-specific VM scale set agent pool
  jobs:
  - job: Sign
    steps:
    - script: echo Sign the build outputs here

- stage: Deployment
  displayName: 'Specialized Physical Deployment'
  dependsOn: Signing
  pool: MyTeams-OnPremise-Pool-1                 # A team-specific on-premises self-hosted agent pool
  jobs:
  - job: Deploy
    steps:
    - script: echo Deploy to the physical environment here

- stage: SmokeTest
  displayName: 'End To End Smoke Tests of Deployed System'
  dependsOn: Deployment
  pool: MyTeams-VMScaleSet-AutomatedTest-Pool-1  # A team-specific VM scale set pool for automated testing
  jobs:
  - job: SmokeTest
    steps:
    - script: echo Run end-to-end smoke tests here
```
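
The template practice from item 2 can be sketched with a step template that pins the .NET SDK version in one place. The file path `templates/steps/use-dotnet.yml` and the parameter name `sdkVersion` are hypothetical.

```yaml
# File (hypothetical): templates/steps/use-dotnet.yml
parameters:
- name: sdkVersion
  type: string
  default: '3.1.x'   # the single place to change the pinned SDK line

steps:
- task: UseDotNet@2
  displayName: 'Use .NET Core SDK ${{ parameters.sdkVersion }}'
  inputs:
    packageType: sdk
    version: ${{ parameters.sdkVersion }}
    installationPath: $(Agent.ToolsDirectory)/dotnet
```

```yaml
# File: a pipeline that consumes the template
steps:
# No parameters passed, so the template's default sdkVersion is used
- template: templates/steps/use-dotnet.yml
- script: dotnet --version
  displayName: 'Show the selected SDK version'
```

For templates kept in the shared DevOps repositories, the pipeline would also declare that repository under `resources: repositories:` and reference the file with an `@alias` suffix, for example `templates/steps/use-dotnet.yml@shared`, where `shared` is a hypothetical alias.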

References

Azure DevOps agents on docs.microsoft.com.