Version: 0.94

Azure Infrastructure Setup - Manual

Overview

This document covers the steps required to deploy a new Kubernetes cluster to your existing Azure account, including the creation of all required resources. This process will be completed through the Azure CLI as well as the web interface.

Prerequisites and Azure Features

In this step, we will go over the prerequisites for creating a new cluster in Azure, and enable any Azure features that Snorkel Flow requires to run.

To begin, you will need a few command line tools. Install the current versions of these tools if they are not already installed.

  1. Instructions to install az

  2. Instructions to install helm

  3. Instructions to install kubectl

  4. Run

    az login

    to log in with an Azure account or service principal that has appropriate admin permissions (be sure to specify the --tenant flag)

  5. Snorkel Flow requires Azure features that are currently available only in Preview. Make sure they are activated and registered for your account. You can register the required Azure Preview features with the following commands.

    • Azure Preview
      • az extension add --name aks-preview
      • az extension update --name aks-preview
    • Azure Files NFS mounting in AKS
      • az feature register --name AllowNfsFileShares --namespace Microsoft.Storage
      • az provider register --namespace Microsoft.Storage
      • Wait around 15 minutes, and ensure

        az feature show --name AllowNfsFileShares --namespace Microsoft.Storage --query properties.state

        outputs “Registered”
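
Rather than waiting a fixed 15 minutes, the registration check above can be polled. The following is a minimal sketch, not part of the official instructions: the helper name wait_for_feature, the default of 30 attempts, and the 60-second interval are all illustrative assumptions.

```shell
# Sketch: poll until an Azure preview feature reports "Registered".
# wait_for_feature and its default attempt count and interval are
# hypothetical choices made for this example.
wait_for_feature() {
  local name="$1" namespace="$2"
  local attempts="${3:-30}" interval="${4:-60}"
  local state="" i=1
  while [ "$i" -le "$attempts" ]; do
    # Same query as in the step above; -o tsv strips the JSON quotes.
    state="$(az feature show --name "$name" --namespace "$namespace" \
      --query properties.state -o tsv)"
    echo "attempt $i/$attempts: $state"
    if [ "$state" = "Registered" ]; then
      return 0
    fi
    sleep "$interval"
    i=$((i + 1))
  done
  echo "feature $name is still '$state' after $attempts attempts" >&2
  return 1
}
```

For example, `wait_for_feature AllowNfsFileShares Microsoft.Storage` returns once the feature is registered, or fails after the attempts run out.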

Manually provision required infrastructure

In this step, we will be creating the cluster that Snorkel Flow runs in, alongside any required resources.

  1. Create a new resource group in which the account or service principal you previously logged in with has the “Owner” role.

  2. Make sure that an Azure AD group exists whose members will have admin access to the newly created cluster (where your account / service principal is an owner and/or member)

  3. Create a new virtual network with a subnet (we suggest at least a /18 address space) in the resource group created above.

  4. After creating the subnet, go to the Azure UI, find the subnet under the Virtual Networks page, and ensure Microsoft.Storage is checked under Service Endpoints in the subnet configuration.

  5. Select both options under Network policy for private endpoints (Network security groups and Route tables).

  6. You can use your existing Azure DNS zone, or create a new one for Snorkel Flow.

    1. Azure DNS with a delegated domain -> https://learn.microsoft.com/en-us/azure/dns/dns-delegate-domain-azure-dns
  7. [OPTIONAL] If you would like to configure TLS, then create a new Azure Key Vault

    1. az keyvault create -g snorkel-flow-rg -l <Location> -n <KeyVaultName> --enable-rbac-authorization
    2. Generate and/or import certificates (https://medium.com/@jibinpb/lets-encrypt-certificate-with-azure-dns-b9ed32ae5aee)
  8. Gather required variables

    1. snorkel_rg_name
      1. from step 1
    2. vm_size_node
      1. we recommend D32ds_v5
    3. admin_group_name
      1. from step 2
    4. vnet_name and subnet_name
      1. from step 3
  9. Go to the Azure web portal, and click through to provision the infrastructure (replace the variables in {{ }} with their appropriate values from step 8)

    1. Cluster
      1. Navigate to the Kubernetes services page from the top-level search and click Create then Create a Kubernetes cluster
      2. Select the resource group as {{ snorkel_rg_name }}
      3. For Kubernetes cluster name, put snorkel-flow-cluster
      4. For Primary node pool, set Scale method to Manual and Node count to 1
      5. Click Next: Node pools
      6. Under Node pools, click Add node pool
      7. For Node pool name, put cpupool
      8. Check every availability zone box that is available
      9. For Node size, search for {{ vm_size_node }} , click the option displayed, and click Select
      10. Set Scale method to Manual and Node count to 2
      11. Click Next: Access
      12. For Authentication and Authorization, select Azure AD authentication with Azure RBAC
      13. Click Next: Networking
      14. For Network configuration, select Azure CNI
      15. Select {{ vnet_name }} and {{ subnet_name }}
      16. Set DNS name prefix to be snorkel-flow-cluster
      17. Click Next: Integrations
      18. Click Next: Advanced
      19. Set Infrastructure resource group to be snorkel-flow-cluster-nodes
      20. Click Review + create
      21. Click Create to finish
    2. Storage Account
      1. Navigate to the Storage accounts page from the top-level search and click Create
      2. For the resource group, select the one entered as Infrastructure resource group in the cluster creation step (snorkel-flow-cluster-nodes)
      3. Type in a unique storage account name (can only have lowercase letters and numbers)
      4. Select the region to be the same as the {{ snorkel_rg_name }} resource group
      5. Select the performance to be Premium
      6. Select the premium account type to be File shares
      7. Select the redundancy to be ZRS
      8. Click Next: Advanced
      9. Uncheck Require secure transfer for REST API operations
      10. Click Next: Networking
      11. Under Network access, check Enabled from selected virtual networks and IP addresses
      12. Select the virtual network name and subnet name
      13. Click Review
      14. Click Create to finish
    3. Cluster Role Assignment
      1. Go to the overview page of the newly created cluster, and click Access control (IAM).
      2. Click Add, then click Add role assignment
      3. Search for Azure Kubernetes Service Cluster Admin Role, select it, then click Next
      4. Click Select members, then search by {{ admin_group_name }} and select it then click Next
      5. Click Review and assign to finish
    4. Virtual Network Role Assignment
      1. Go to the overview page of the virtual network you created manually in step 3, and click Access control (IAM).
      2. Click Add, then click Add role assignment
      3. Search for Network Contributor, select it, then click Next
      4. Check Managed identity next to Assign access to
      5. Click Select members, select Kubernetes service under Managed identity, then select the snorkel-flow-cluster (ensure it is the right cluster in the previously created resource group), and click Select
      6. Click Review and assign to finish
  10. Set up kubectl access

    az aks get-credentials --resource-group {{ snorkel_rg_name }} --name snorkel-flow-cluster --admin

If this command fails, go to the cluster overview page, open the Cluster configuration tab in the left-hand menu, and make sure the Kubernetes local accounts checkbox is checked.
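
Once the credentials are fetched, a quick check can confirm that kubectl actually reaches the cluster. This is a sketch only: check_cluster_access is a hypothetical helper, and its default of 3 expected nodes assumes the node counts entered during cluster creation (1 in the primary pool plus 2 in cpupool).

```shell
# Sketch: confirm kubectl can reach the cluster after `az aks get-credentials`.
# check_cluster_access is a hypothetical helper name. The default of 3 expected
# nodes assumes the node counts from the cluster creation step (1 + 2).
check_cluster_access() {
  local expected="${1:-3}" nodes count
  # kubectl exits non-zero when the kubeconfig or API server is unusable.
  if ! nodes="$(kubectl get nodes --no-headers)"; then
    echo "kubectl could not list nodes; check your credentials" >&2
    return 1
  fi
  # Count non-empty lines, one per node.
  count="$(printf '%s\n' "$nodes" | grep -c . || true)"
  echo "nodes visible: $count"
  [ "$count" -ge "$expected" ]
}
```

Running `check_cluster_access` prints the number of visible nodes and returns non-zero if fewer than expected are listed.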

  1. Complete cluster setup by creating the following YAML files and applying them to the cluster.
    Replace the variables in {{ }} with their appropriate values: the storage account name is shown on the Storage Account Overview tab, and the storage account access key is shown on the Storage Account Access keys tab.
  2. Create and apply namespace.yaml to create the namespace for Snorkel Flow
    1. kubectl apply -f namespace.yaml
apiVersion: v1
kind: Namespace
metadata:
 annotations:
   meta.helm.sh/release-name: snorkel-flow
   meta.helm.sh/release-namespace: snorkel-flow
 labels:
   app.kubernetes.io/managed-by: Helm
 name: snorkel-flow
  1. Create and apply storageaccountsecret.yaml
    1. kubectl apply -f storageaccountsecret.yaml
apiVersion: v1
stringData:
 azurestorageaccountkey: {{ STORAGE_ACCOUNT_KEY }}
 azurestorageaccountname: {{ STORAGE_ACCOUNT_NAME }}
kind: Secret
metadata:
 annotations:
   meta.helm.sh/release-name: snorkel-flow
   meta.helm.sh/release-namespace: snorkel-flow
 labels:
   app.kubernetes.io/managed-by: Helm
 name: secret-storage-account
 namespace: snorkel-flow
type: Opaque
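
As an alternative to copying the access key from the portal, the key can be read with the Azure CLI and substituted into the manifest. The sketch below is an assumption-laden example, not the documented procedure: render_storage_secret and fetch_storage_key are hypothetical helper names, and the values are quoted so that characters in the key cannot be misread as YAML syntax.

```shell
# Sketch: fill in the storage account secret without editing the file by hand.
# Both function names are hypothetical helpers for this example.

# Prints the first access key of a storage account.
fetch_storage_key() {
  az storage account keys list --resource-group "$1" --account-name "$2" \
    --query "[0].value" -o tsv
}

# Prints the Secret manifest shown above with the two values substituted.
render_storage_secret() {
  local account_name="$1" account_key="$2"
  cat <<EOF
apiVersion: v1
kind: Secret
metadata:
  annotations:
    meta.helm.sh/release-name: snorkel-flow
    meta.helm.sh/release-namespace: snorkel-flow
  labels:
    app.kubernetes.io/managed-by: Helm
  name: secret-storage-account
  namespace: snorkel-flow
type: Opaque
stringData:
  azurestorageaccountkey: "$account_key"
  azurestorageaccountname: "$account_name"
EOF
}
```

A possible usage, assuming the storage account lives in the snorkel-flow-cluster-nodes resource group, is `render_storage_secret "$NAME" "$(fetch_storage_key snorkel-flow-cluster-nodes "$NAME")" | kubectl apply -f -`, which keeps the key out of a file on disk.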
  1. Create and apply storageclass.yaml
    1. kubectl apply -f storageclass.yaml
allowVolumeExpansion: true
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
 name: snorkel-flow-sc
parameters:
 protocol: nfs
 secretName: secret-storage-account
 secretNamespace: snorkel-flow
 storageAccount: {{ STORAGE_ACCOUNT_NAME }}
provisioner: file.csi.azure.com
reclaimPolicy: Retain
volumeBindingMode: Immediate
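
After the three manifests are applied, it can help to confirm that each object exists before moving on. The following sketch uses only `kubectl get`; the helper names check_one and check_snorkel_objects are hypothetical.

```shell
# Sketch: verify that the namespace, secret, and storage class were created.
# check_one and check_snorkel_objects are hypothetical helper names.

# Passes its arguments straight to `kubectl get` and reports the result.
check_one() {
  if kubectl get "$@" >/dev/null 2>&1; then
    echo "ok: $*"
  else
    echo "missing: $*"
    return 1
  fi
}

# Checks all three objects and returns non-zero if any is missing.
check_snorkel_objects() {
  local failed=0
  check_one namespace snorkel-flow || failed=1
  check_one secret secret-storage-account --namespace snorkel-flow || failed=1
  check_one storageclass snorkel-flow-sc || failed=1
  return "$failed"
}
```

Calling `check_snorkel_objects` prints one line per object, so a missing manifest is easy to spot and reapply.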
  1. Enable the ingress controller add-on (https://learn.microsoft.com/en-us/azure/aks/web-app-routing?tabs=without-osm)

    az aks enable-addons -g snorkel-flow-rg -n snorkel-flow-cluster --addons azure-keyvault-secrets-provider,web_application_routing --enable-secret-rotation

  2. Get the object (principal) ID of the managed identity of the web-app-routing add-on from the Azure UI and save it under the name MANAGEDIDENTITY_OBJECTID

    1. From the Azure portal, go to Resource Groups, select the cluster's infrastructure resource group (not {{ snorkel_rg_name }}, but the one entered as Infrastructure resource group during cluster creation, whose name ends in “-nodes”, for example “snorkel-flow-cluster-nodes”), and search for the managed identity whose name starts with “webapprouting-” (for example “webapprouting-snorkel-flow-cluster”).
  3. Next, connect the ingress controller add-on to Azure DNS so that DNS records are managed automatically. Get the resource ID of the Azure DNS zone you chose or created for Snorkel Flow from the Azure UI (select the DNS Zone, click Properties) and save it under the name ZONEID

    • az role assignment create --role "DNS Zone Contributor" --assignee $MANAGEDIDENTITY_OBJECTID --scope $ZONEID
    • az aks addon update -g snorkel-flow-rg -n snorkel-flow-cluster --addon web_application_routing --dns-zone-resource-id=$ZONEID
  4. If you created an Azure Key Vault for TLS, connect the ingress controller add-on to it (for automatic management of TLS certs)

    • KEYVAULTID=$(az keyvault show --name <KeyVaultName> --query "id" --output tsv)
    • az role assignment create --role "Key Vault Secrets User" --assignee $MANAGEDIDENTITY_OBJECTID --scope $KEYVAULTID
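
The two values looked up in the Azure UI above, MANAGEDIDENTITY_OBJECTID and ZONEID, can probably also be resolved from the command line. This is a sketch under assumptions rather than the documented method: the helper names are hypothetical, and it assumes the add-on's managed identity lives in the cluster's infrastructure resource group with a name starting with “webapprouting-”, as described above.

```shell
# Sketch: resolve MANAGEDIDENTITY_OBJECTID and ZONEID with the Azure CLI.
# Both function names are hypothetical helpers for this example.

# Prints the principal (object) ID of the first webapprouting-* managed
# identity found in the cluster's infrastructure resource group.
get_webapprouting_object_id() {
  local rg="$1" cluster="$2" node_rg
  node_rg="$(az aks show --resource-group "$rg" --name "$cluster" \
    --query nodeResourceGroup -o tsv)"
  az identity list --resource-group "$node_rg" \
    --query "[?starts_with(name, 'webapprouting-')].principalId | [0]" -o tsv
}

# Prints the resource ID of an Azure DNS zone.
get_dns_zone_id() {
  az network dns zone show --resource-group "$1" --name "$2" --query id -o tsv
}
```

For example, `MANAGEDIDENTITY_OBJECTID="$(get_webapprouting_object_id snorkel-flow-rg snorkel-flow-cluster)"` and `ZONEID="$(get_dns_zone_id <ZoneResourceGroup> <ZoneName>)"`, where both placeholders stand for your own DNS zone's values. Check that each variable is non-empty before running the role assignment commands.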