Machine learning used to feel like an exclusive club for those with advanced coding skills or access to massive computing power. Not anymore. BigQuery ML allows even non-programmers to create predictive models directly in SQL, the language of data analysts. That’s right: if you can write a SELECT statement, you’re ready to dive into machine learning.
In this post, I’ll show you how I used BigQuery ML to build a model that predicts bike trip durations from a public dataset. Whether you’re curious about machine learning or looking for a hands-on starting point, this case study will walk you through the first steps.
What Is BigQuery ML?
BigQuery ML (BQML) lets you build, train, and evaluate machine learning models using SQL within BigQuery. Every step is expressed as a SQL query that runs where the data already lives, so you don’t need to export data to a separate environment or write code in another language.
For this case study, we’ll predict the duration of bike trips in San Francisco using the public table bigquery-public-data.san_francisco_bikeshare.bikeshare_trips.
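Before modeling, it helps to look at a few rows of the table. The query below is a minimal sketch of that first look. The column names (start_station_name, end_station_name, subscriber_type, duration_sec) are my assumption about the table’s schema, so check them against the schema shown in the BigQuery console before running it.

```sql
-- Preview a handful of trips from the public bikeshare table.
-- Column names are assumed from the public schema; verify them in the console.
SELECT
  start_station_name,
  end_station_name,
  subscriber_type,
  duration_sec          -- trip length in seconds (assumed label column)
FROM `bigquery-public-data.san_francisco_bikeshare.bikeshare_trips`
WHERE duration_sec IS NOT NULL
LIMIT 10;
```

The LIMIT only caps the rows returned, not the data scanned, so selecting just the columns you need is what keeps the query cheap.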
Case Study: Predicting Bike Trip Durations
We’ll build a model to predict bike trip durations based on available features like the starting station, ending station, and the type of subscriber.
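As a rough idea of what such a model looks like in BigQuery ML, here is a minimal sketch using a linear regression. The model type, the destination name my_dataset.bike_trip_duration_model, and the column names are illustrative assumptions, not necessarily the exact setup used in this case study.

```sql
-- Sketch: train a linear regression that predicts trip duration.
-- `my_dataset.bike_trip_duration_model` is a hypothetical name; replace
-- my_dataset with a dataset you own in your project.
CREATE OR REPLACE MODEL `my_dataset.bike_trip_duration_model`
OPTIONS (
  model_type = 'linear_reg',            -- assumed choice for a numeric target
  input_label_cols = ['duration_sec']   -- the column the model learns to predict
) AS
SELECT
  start_station_name,   -- starting station
  end_station_name,     -- ending station
  subscriber_type,      -- type of subscriber
  duration_sec          -- label: trip duration in seconds
FROM `bigquery-public-data.san_francisco_bikeshare.bikeshare_trips`
WHERE duration_sec IS NOT NULL;
```

Everything in the SELECT other than the label column is treated as an input feature, which is why the query lists only the three features named above plus duration_sec.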