Version: 0.94

GCP Infrastructure Setup - Manual

Overview

This document will guide you through creating and deploying a new Kubernetes cluster to your existing Google Cloud account, including the creation of all required resources. This process will be completed through the gcloud CLI and the GCP web interface.

Pre-Requisites and Google Cloud Platform Features:

In this step, we will go over the pre-requisites for creating a new cluster in GCP, and enable any GCP features that Snorkel Flow requires to run.

To begin, you will need a few command line tools. Install the current versions of these tools if they are not already installed.

  1. Install Helm on your local machine.
  2. Install Terraform on your local machine.
  3. Create a new project on Google Cloud with an associated billing account.
  4. Ensure that the user account performing the following steps has the required permissions.
  5. The list of permissions is intended to cover everything needed, but it may also include some that are not strictly required.
  6. Create a DNS zone with an associated domain that you own. If a DNS zone already exists, you can skip this step.
  7. Install the gcloud CLI and configure access to your project with gcloud init.
  8. Enable the required Google Cloud APIs:
    • gcloud services enable dns.googleapis.com --project=snorkel-ai-gcp-standardization
    • gcloud services enable compute.googleapis.com --project=snorkel-ai-gcp-standardization
    • gcloud services enable container.googleapis.com --project=snorkel-ai-gcp-standardization
    • gcloud services enable file.googleapis.com --project=snorkel-ai-gcp-standardization
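
The API-enablement step can also be scripted. The following is a minimal sketch, not part of the official Snorkel Flow tooling: PROJECT_ID, DRY_RUN, REQUIRED_APIS, and the helper functions are hypothetical names introduced for illustration. It relies on gcloud services enable accepting several service names in one call, and it defaults to a dry run that only prints the commands so you can review them first.

```shell
#!/usr/bin/env bash
# Illustrative sketch: PROJECT_ID, DRY_RUN, and the function names are assumptions,
# not part of Snorkel Flow or gcloud.
PROJECT_ID="${PROJECT_ID:-snorkel-ai-gcp-standardization}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print commands only; 0 = execute them

# The four APIs listed in the step above.
REQUIRED_APIS="dns.googleapis.com compute.googleapis.com container.googleapis.com file.googleapis.com"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

enable_required_apis() {
  # REQUIRED_APIS is intentionally unquoted so that it splits into separate arguments.
  run gcloud services enable $REQUIRED_APIS --project="$PROJECT_ID"
}

list_enabled_apis() {
  # Lists the services currently enabled on the project, for a manual check.
  run gcloud services list --enabled --project="$PROJECT_ID"
}

enable_required_apis
list_enabled_apis
```

To apply the changes rather than print them, run the script with DRY_RUN=0 and PROJECT_ID set to your own project.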

Cluster Deployment

In this step, we will be creating the cluster that runs Snorkel Flow itself, as well as any required resources for the cluster.

  1. Create a service account for the cluster
    • gcloud iam service-accounts create gke-cluster-admin-account --display-name "GKE Cluster Admin Account" --project=snorkel-ai-gcp-standardization
  2. Get the service account's email address from the output of the following command:
    • gcloud iam service-accounts describe gke-cluster-admin-account@snorkel-ai-gcp-standardization.iam.gserviceaccount.com
  3. Grant the following roles to the service account email from the previous step. The commands below use gke-cluster-admin-account@snorkel-ai-gcp-standardization.iam.gserviceaccount.com; substitute the email from your own output. These roles allow the cluster service account to manage the resources required during the Snorkel Flow deployment.
    • gcloud projects add-iam-policy-binding snorkel-ai-gcp-standardization --member=serviceAccount:gke-cluster-admin-account@snorkel-ai-gcp-standardization.iam.gserviceaccount.com --role=roles/container.admin
    • gcloud projects add-iam-policy-binding snorkel-ai-gcp-standardization --member=serviceAccount:gke-cluster-admin-account@snorkel-ai-gcp-standardization.iam.gserviceaccount.com --role=roles/compute.instanceAdmin.v1
    • gcloud projects add-iam-policy-binding snorkel-ai-gcp-standardization --member=serviceAccount:gke-cluster-admin-account@snorkel-ai-gcp-standardization.iam.gserviceaccount.com --role=roles/dns.admin
    • gcloud projects add-iam-policy-binding snorkel-ai-gcp-standardization --member=serviceAccount:gke-cluster-admin-account@snorkel-ai-gcp-standardization.iam.gserviceaccount.com --role=roles/file.editor
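  4. Create the cluster using the gcloud CLI. Replace [NETWORK] and [SUBNETWORK] with your VPC and subnet IDs if needed, or omit those two flags. For --service-account, use the service account email from step 2. Because us-central1 is a region, the cluster is regional, and --num-nodes applies to each zone in the region.
    • gcloud container clusters create snorkel-flow-gcp \
      --region=us-central1 \
      --num-nodes=4 \
      --network=[NETWORK] \
      --subnetwork=[SUBNETWORK] \
      --addons=GcpFilestoreCsiDriver,HttpLoadBalancing \
      --enable-ip-alias \
      --service-account=gke-cluster-admin-account@snorkel-ai-gcp-standardization.iam.gserviceaccount.com \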
      --machine-type="e2-standard-32" \
      --scopes=https://www.googleapis.com/auth/cloud-platform \
      --enable-autoupgrade \
      --enable-autorepair \
      --project=snorkel-ai-gcp-standardization
  5. Once the cluster is up and running, install the Helm chart for ExternalDNS:
    • helm repo add external-dns https://kubernetes-sigs.github.io/external-dns/
    • helm repo update
    • helm install external-dns external-dns/external-dns \
      --namespace kube-system \
      --set provider=google \
      --set google.project=snorkel-ai-gcp-standardization \
      --set "domainFilters[0]"=DOMAIN_YOU_OWN \
      --set "sources[0]"=ingress \
      --set "sources[1]"=service \
      --version CHART_VERSION
  6. Set up kubectl access to the newly created cluster: (https://cloud.google.com/kubernetes-engine/docs/how-to/cluster-access-for-kubectl)
    • gcloud components install kubectl
    • gcloud container clusters get-credentials CLUSTER_NAME --region=COMPUTE_REGION
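
The role grants in step 3 and the access check in step 6 can be sketched as a small script. This is an illustration under stated assumptions rather than an official procedure: PROJECT_ID, SA_NAME, CLUSTER_NAME, REGION, DRY_RUN, and the function names are hypothetical, and the defaults simply mirror the example values used above. The script defaults to a dry run that prints each command instead of executing it.

```shell
#!/usr/bin/env bash
# Illustrative sketch: all variable and function names here are assumptions.
PROJECT_ID="${PROJECT_ID:-snorkel-ai-gcp-standardization}"
SA_NAME="${SA_NAME:-gke-cluster-admin-account}"
CLUSTER_NAME="${CLUSTER_NAME:-snorkel-flow-gcp}"
REGION="${REGION:-us-central1}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print commands only; 0 = execute them

# The four roles granted in step 3.
ROLES="roles/container.admin roles/compute.instanceAdmin.v1 roles/dns.admin roles/file.editor"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Builds the service account email from its name and the project ID.
sa_email() {
  echo "${SA_NAME}@${PROJECT_ID}.iam.gserviceaccount.com"
}

# Step 3: one add-iam-policy-binding call per role.
grant_roles() {
  for role in $ROLES; do
    run gcloud projects add-iam-policy-binding "$PROJECT_ID" \
      --member="serviceAccount:$(sa_email)" \
      --role="$role"
  done
}

# Step 6: fetch credentials, then confirm the nodes and the ExternalDNS release are visible.
verify_access() {
  run gcloud container clusters get-credentials "$CLUSTER_NAME" --region="$REGION" --project="$PROJECT_ID"
  run kubectl get nodes
  run helm status external-dns --namespace kube-system
}

# In a real run, cluster creation (step 4) and the ExternalDNS install (step 5)
# happen between these two calls.
grant_roles
verify_access
```

Keeping the roles in a single list makes it easier to audit what the cluster service account has been granted and to re-run the grants against another project.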