Version: 0.96

GCP Infrastructure Setup - Terraform (Recommended)

Overview

This document guides you through creating and deploying a new Kubernetes cluster in your existing Google Cloud account, including the creation of all required resources. You will use the gcloud CLI, the Google Cloud web interface, and Snorkel Flow's Terraform configuration.

Prerequisites and Google Cloud Platform Features

Complete these prerequisites before creating a new cluster in GCP, and enable the GCP features that Snorkel Flow requires to run.

To begin, you will need a few command line tools. Install the current versions of these tools if they are not already installed.

  1. Install Helm on your local machine.
  2. Install Terraform on your local machine.
  3. Create a new project on Google Cloud with an associated billing account.
  4. Make sure the user account that performs the following steps has the required user permissions.
  5. The listed permissions should cover everything that is needed, but may include some that Snorkel Flow does not require.
  6. Create a DNS zone with an associated domain that you own. If a DNS zone already exists, skip this step.
  7. Install the gcloud CLI and configure access to your project with gcloud init.
  8. Enable the required Google Cloud APIs.
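
The last two prerequisites can be sketched as shell commands. This is a minimal sketch under stated assumptions, not an official procedure: the project ID `my-snorkel-project` is hypothetical, and the API list is an assumption (the services a GKE cluster with Cloud DNS typically relies on), since the required APIs are not enumerated in this section. The `run` helper and the `DRY_RUN` variable are illustrative conveniences: by default the script only prints each command, and setting `DRY_RUN=0` executes them.

```shell
# Hypothetical project ID; replace with your own.
PROJECT_ID="my-snorkel-project"

# Assumed API list: this guide does not enumerate the required APIs here.
# These are the services a GKE cluster with Cloud DNS typically relies on.
REQUIRED_APIS="compute.googleapis.com container.googleapis.com dns.googleapis.com"

# Print each command by default; set DRY_RUN=0 to actually run them.
DRY_RUN="${DRY_RUN:-1}"
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Point gcloud at the project (run `gcloud init` once beforehand to authenticate).
run gcloud config set project "$PROJECT_ID"

# Enable each API on the project.
for api in $REQUIRED_APIS; do
  run gcloud services enable "$api" --project "$PROJECT_ID"
done
```

Reviewing the printed commands before rerunning with `DRY_RUN=0` is a cheap way to confirm that the project ID and API names are the ones you intend to use.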

Cluster Deployment (Terraform)

Next, execute a Terraform configuration to create the cluster that Snorkel Flow runs in, alongside any required resources.

  1. Download and extract the Terraform files, then change into the directory that contains the .tf files, including variables.tf.
  2. Configure provider.tf inside the services folder to point to a Cloud Storage (GCS) bucket of your choice. The bucket name must be globally unique. Once this bucket is set up, Terraform stores the state of the installation in it, so that the state persists.
  3. In variables.tf, fill in the required variables:
    • project: The name of the project on Google Cloud
    • region: The region where the cluster will be located, e.g. us-central1
    • zone: The specific zone where the cluster will be located, e.g. us-central1-c
    • cluster_name: Name of the created cluster
    • node_count: Number of compute nodes that will be created (default 4)
    • machine_type: Type of compute node that will be added (default e2-standard-32)
    • domain: The domain you own, used to set up DNS, e.g. snorkel.ai
    • subnet_id: ID of an existing subnet to deploy into; can be left empty (default null)
    • vpc_id: ID of an existing VPC to deploy into; can be left empty (default null)
  4. Note that when a project is created, GCP automatically provisions a default VPC network, with a default subnet in each region that GCP offers, unless the compute.skipDefaultNetworkCreation option is set. If no subnet_id or vpc_id is provided to the Terraform configuration, the default is used.
  5. Once you are satisfied with the variable values, initialize Terraform with
    • terraform init
  6. Review the resources that Terraform plans to create with
    • terraform plan
  7. If everything looks as expected, apply the configuration with
    • terraform apply
  8. Wait for the cluster to spin up, which can take around 10 minutes. You can monitor the status of the cluster in the Google Cloud web interface.
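
The deployment steps above can be sketched as one shell session. This is an illustrative sketch under several assumptions, not the official workflow: the project ID, bucket name, cluster name, and domain are hypothetical placeholders, and the variable values are written to a `terraform.tfvars` file (which Terraform loads automatically) as an optional alternative to editing variables.tf directly. It also saves the plan with `-out` and applies that saved plan, a variation on the plain `terraform apply` shown in the steps. The `run` helper prints commands by default; set `DRY_RUN=0` to execute them.

```shell
# Hypothetical values; replace each with your own.
PROJECT_ID="my-snorkel-project"
REGION="us-central1"
STATE_BUCKET="my-snorkel-tf-state"   # bucket names must be globally unique

# Print each command by default; set DRY_RUN=0 to actually run them.
DRY_RUN="${DRY_RUN:-1}"
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Create the state bucket that provider.tf points to (skip if it already exists).
run gcloud storage buckets create "gs://$STATE_BUCKET" \
  --project "$PROJECT_ID" --location "$REGION"

# Write the variable values. This overwrites any existing terraform.tfvars.
# subnet_id and vpc_id are omitted so that their null defaults apply.
cat > terraform.tfvars <<EOF
project      = "$PROJECT_ID"
region       = "$REGION"
zone         = "${REGION}-c"
cluster_name = "snorkel-flow"
node_count   = 4
machine_type = "e2-standard-32"
domain       = "example.com"
EOF

# Initialize, review the plan, then apply exactly the plan that was reviewed.
run terraform init
run terraform plan -out=tfplan
run terraform apply tfplan
```

Applying a saved plan file means Terraform creates exactly the resources shown during review, even if the configuration or the cloud state changes between the two commands.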