Accessing Object Storage in GCP

Introduction

This tutorial demonstrates how to access Google Cloud Storage from the Neuro Platform. You will create a new Neuro project, a new GCP project, a service account, and a bucket, and then make that bucket accessible from a Neuro Platform job.

Make sure you have the Neu.ro CLI installed.

Creating Neuro and GCP Projects

To create a new Neuro project, run:

neuro project init
cd <project-slug>
make setup

It's a good practice to limit the scope of access to a specific GCP project. To create a new GCP Project, run:

PROJECT_ID=${PWD##*/} # name of the current directory
gcloud projects create $PROJECT_ID
gcloud config set project $PROJECT_ID
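
The `${PWD##*/}` expansion above removes the longest prefix matching `*/`, leaving only the last path component, i.e. the name of the current directory. A quick illustration with a throw-away directory (`demo-project` is just an arbitrary example name):

```shell
# Create and enter an example directory.
cd /tmp
mkdir -p demo-project && cd demo-project

# ${PWD##*/} strips everything up to and including the last "/".
echo "${PWD##*/}"   # prints: demo-project
```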

Make sure to set a billing account for your GCP project. See Creating and Managing Projects for details.

Creating a Service Account and Uploading an Account Key

First, create a service account which your job will impersonate:

SA_NAME="neuro-job"
gcloud iam service-accounts create $SA_NAME \
--description "Neuro Platform Job Service Account" \
--display-name "Neuro Platform Job"

See Creating and managing service accounts for details.

Then, download the account key:

gcloud iam service-accounts keys create ~/$SA_NAME-key.json \
--iam-account $SA_NAME@$PROJECT_ID.iam.gserviceaccount.com

Check that the newly created key file is located in your home directory (~).

Create a new secret for the file:

neuro secret add gcp-key @~/$SA_NAME-key.json

Open the Makefile and find the following line in it:

SECRETS?=

Replace it with the following one:

SECRETS?="-v secret:gcp-key:/var/secrets/gcp.json -e GOOGLE_APPLICATION_CREDENTIALS=/var/secrets/gcp.json"

Creating a Bucket and Granting Access

Now, create a new bucket. Remember: bucket names are globally unique (see more information on bucket naming conventions).

BUCKET_NAME="my-neuro-bucket-42"
gsutil mb gs://$BUCKET_NAME/
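
Bucket names must consist of lowercase letters, digits, dashes, underscores, and dots, must start and end with a letter or digit, and must be 3 to 63 characters long (longer dotted names have extra rules not covered here). A quick local sanity check before calling gsutil mb might look like this (a rough check only, not the full naming spec):

```shell
BUCKET_NAME="my-neuro-bucket-42"

# Rough check: 3-63 chars, lowercase alphanumerics plus ._- in the
# middle, alphanumeric at both ends. Not a substitute for the full
# GCS naming rules.
if echo "$BUCKET_NAME" | grep -Eq '^[a-z0-9][a-z0-9._-]{1,61}[a-z0-9]$'; then
    echo "name looks valid"
else
    echo "name looks invalid"
fi
```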

Grant access to the bucket:

# Permissions for gsutil:
PERM="storage.objectAdmin"
gsutil iam ch serviceAccount:$SA_NAME@$PROJECT_ID.iam.gserviceaccount.com:roles/$PERM gs://$BUCKET_NAME
# Permissions for client APIs:
PERM="storage.legacyBucketOwner"
gsutil iam ch serviceAccount:$SA_NAME@$PROJECT_ID.iam.gserviceaccount.com:roles/$PERM gs://$BUCKET_NAME
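
The full service-account e-mail appears in several commands above; composing it once in a variable helps avoid typos. Service-account e-mails follow the pattern <name>@<project>.iam.gserviceaccount.com (the values below are illustrative; substitute your own):

```shell
SA_NAME="neuro-job"
PROJECT_ID="my-project"  # example value; use your real project ID

# Compose the service-account e-mail once and reuse it.
SA_EMAIL="$SA_NAME@$PROJECT_ID.iam.gserviceaccount.com"
echo "$SA_EMAIL"   # prints: neuro-job@my-project.iam.gserviceaccount.com
```

You can then write serviceAccount:$SA_EMAIL in the gsutil iam ch commands above instead of spelling the address out each time.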

Testing

Create a file and upload it to the Google Cloud Storage bucket:

echo "Hello World" | gsutil cp - gs://$BUCKET_NAME/hello.txt

Run a development job and connect to the job's shell:

export PRESET=cpu-small # to avoid consuming GPU for this test
make develop
make connect-develop

In your job's shell, try to use gsutil to access your bucket:

gsutil cat gs://my-neuro-bucket-42/hello.txt

Please note that in develop, train, and jupyter jobs, the environment variable GOOGLE_APPLICATION_CREDENTIALS points to your key file, so you can use it to authenticate other client libraries.

For instance, you can access your bucket via the Python API provided by the google-cloud-storage package:

>>> from google.cloud import storage
>>> bucket = storage.Client().get_bucket("my-neuro-bucket-42")
>>> text = bucket.get_blob("hello.txt").download_as_string()
>>> print(text)
b'Hello World\n'
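
Note that download_as_string() returns bytes rather than str (hence the b'...' prefix in the output above); decode the result if you need text:

```python
# The value returned by download_as_string() is raw bytes; the literal
# below reproduces the bytes from the session above for illustration.
data = b"Hello World\n"

# Decode to a str and strip the trailing newline.
text = data.decode("utf-8").strip()
print(text)  # prints: Hello World
```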

To close the remote terminal session, press ^D or type exit.

Please don't forget to terminate your job when you don't need it anymore:

make kill-develop