What is Google Cloud ML Engine?
Narin Luangrath
The cloud and machine learning: two phrases with a lot of hype that few people understand. We're intimately familiar with both here at Leverege, so hopefully this article will shed some light on the two topics.
Before we share what we've learned using Google Cloud ML Engine, we need to do a quick refresher on how machine learning is done in production. There are roughly 4 steps:
[Figure: Google Cloud ML]

With Cloud ML Engine, you can train your ML model in the cloud on Google's distributed network of computers. Instead of using just your laptop to train your model, Google runs your training algorithm on multiple machines to speed up the process. Furthermore, you can configure the types of CPUs/GPUs those machines run on; some algorithms run much faster on GPUs than on CPUs.
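If the predefined scale tiers don't fit your job, Cloud ML Engine also lets you describe your own cluster in a config file passed with `--config`. Below is a minimal sketch of such a file; the machine types and worker counts are illustrative, not recommendations:

```yaml
trainingInput:
  scaleTier: CUSTOM
  masterType: standard_gpu        # GPU-equipped master
  workerType: standard_gpu        # GPU-equipped workers
  parameterServerType: standard   # CPU-only parameter servers
  workerCount: 4
  parameterServerCount: 2
```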
[Figure: A Google data center.]

Another benefit we've found of training with Cloud ML Engine is that you don't have to worry about storing the training data locally. If you have a million emails to train your spam filter, how are you going to get them all onto your laptop? When you train your model using Cloud ML, you can easily store your training data online in a Google Cloud Storage "bucket".
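As a quick sketch, getting training data into a bucket takes a couple of gsutil commands. The bucket and file names below are hypothetical; the actual uploads require an authenticated Cloud SDK, so they are shown as comments:

```shell
# Hypothetical bucket name; bucket names are globally unique, so pick your own.
BUCKET="gs://my-spam-training-data"
TRAIN_DATA="$BUCKET/data/train.csv"
EVAL_DATA="$BUCKET/data/eval.csv"

# With the Cloud SDK installed and authenticated, the uploads would be:
#   gsutil mb -l us-central1 "$BUCKET"
#   gsutil cp data/train.csv data/eval.csv "$BUCKET/data/"

# These are the paths you would then pass to the training job below.
echo "$TRAIN_DATA"
```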
[Figure: TensorFlow, Google's open-source ML framework]

The steps involved in building a machine learning model in TensorFlow and packaging your code so that Cloud ML Engine can process it are a bit complicated and beyond the scope of this high-level overview. You can start learning about the details here.
However, when you've built the model and are ready to train, submitting the training job to Google Cloud ML Engine is just a quick shell command:
gcloud ml-engine jobs submit training $JOB_NAME \
    --job-dir $OUTPUT_PATH \
    --runtime-version 1.2 \
    --module-name trainer.task \
    --package-path trainer/ \
    --region $REGION \
    --scale-tier STANDARD_1 \
    -- \
    --train-files $TRAIN_DATA \
    --eval-files $EVAL_DATA \
    --train-steps 1000 \
    --verbosity DEBUG \
    --eval-steps 100
The important part to understand is that trainer.task is the Python module containing your TensorFlow application, and STANDARD_1 specifies that you want to train your model on multiple machines (distributed training).
Once your model is trained, you need a way to serve predictions from it. One thing you could do is build a backend web server, like www.my-spam-filtering-api.com (not a real website), that takes in emails as POST requests and responds with the ML algorithm's guess about whether or not each email is spam. But what if your backend web server gets tens of thousands of requests? Will it be able to handle the load, or will it break?
Google Cloud ML Engine lets you avoid the complexity of building a scalable machine learning web server by doing it for you. If you deploy your machine learning model on Cloud ML Engine, it handles predictions for you. Right now, it supports "online predictions" and "batch predictions".
Use online predictions if you have a handful of data points (say, ten) that you want to run through your algorithm. For larger datasets (thousands of points), use batch predictions.
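To make this concrete, here is a sketch of both prediction modes. The model name `spam_filter`, the bucket paths, and the instance fields are hypothetical, and the gcloud invocations require an authenticated Cloud SDK, so they are shown as comments:

```shell
# instances.json holds one JSON object per line, matching the input
# signature your TensorFlow model exports (fields here are made up).
printf '%s\n' \
  '{"subject": "You won a prize!", "body": "Click to claim"}' \
  '{"subject": "Team standup", "body": "Moved to 10am"}' > instances.json

# Online prediction -- small batches, low latency:
#   gcloud ml-engine predict --model spam_filter --json-instances instances.json
#
# Batch prediction -- large datasets, reads from and writes to Cloud Storage:
#   gcloud ml-engine jobs submit prediction spam_batch_1 \
#       --model spam_filter \
#       --data-format TEXT \
#       --input-paths gs://my-spam-training-data/batch/inputs.json \
#       --output-path gs://my-spam-training-data/batch/output \
#       --region us-central1

wc -l < instances.json
```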
[Figure: Run your ML applications on Google's infrastructure.]