Case Study

Three Recommendations for Cyclistic to Grow Its Membership
September 2021

Skills/Tools: R (tidyverse, lubridate, hms, geosphere, janitor, ggplot2, ggpubr, hexbin, viridisLite)

This case study is part of the Google Data Analytics Capstone. It gives Cyclistic, a fictional bike-sharing company, recommendations for encouraging its casual riders to buy an annual membership.

The Problem

Cyclistic has two groups of customers: casual riders (single-ride and full-day pass customers) and members (customers who pay annually). Because annual members are more profitable, the company wants to convert casual riders into annual members. First, though, the marketing analyst team needs to understand how casual riders and annual members differ in the way they ride Cyclistic bikes.

The Process

I analyzed 12 months of customer behavior data, from September 2020 to August 2021. Each record contains the rideable type, the trip start and end times, the start and end locations (station information, latitude, and longitude), and the customer group: casual rider or member.

Before doing any analysis, I cleaned and transformed the data. The data cleaning steps were:

  • Remove duplicate rows
  • Confirm the years and months match the filename
  • Confirm all station id columns are of type character
  • Convert all date-time columns from character to a date-time class
  • Filter out rows whose starting time is later than the ending time
  • Confirm rideable type has a reasonable number of unique values, and that those values are valid rideable types
  • Confirm that there are only two customer groups: member and casual
  • Filter out missing values, both NA and empty strings
  • Fill in missing station ids or station names
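
A few of the cleaning steps above can be sketched as follows. This is a minimal illustration in base R, chosen so that it runs without extra packages; the case study itself lists tidyverse and janitor as its tools. The column names (ride_id, started_at, ended_at, start_station_name, member_casual) and the toy rows are assumptions made for the example, not the actual data.

```r
# Toy data with made-up values; column names are assumed for illustration.
trips <- data.frame(
  ride_id            = c("a1", "a1", "b2", "c3", "d4"),
  rideable_type      = c("classic_bike", "classic_bike", "electric_bike",
                         "docked_bike", "classic_bike"),
  started_at         = c("2021-08-01 08:00:00", "2021-08-01 08:00:00",
                         "2021-08-01 09:30:00", "2021-08-02 17:45:00",
                         "2021-08-03 12:00:00"),
  ended_at           = c("2021-08-01 08:12:00", "2021-08-01 08:12:00",
                         "2021-08-01 09:10:00", "2021-08-02 18:20:00",
                         "2021-08-03 12:25:00"),
  start_station_name = c("Station A", "Station A", "Station B", "", "Station C"),
  member_casual      = c("member", "member", "casual", "casual", "member"),
  stringsAsFactors   = FALSE
)

# Remove exact duplicate rows.
trips <- trips[!duplicated(trips), ]

# Convert date-time columns from character to POSIXct.
fmt <- "%Y-%m-%d %H:%M:%S"
trips$started_at <- as.POSIXct(trips$started_at, tz = "UTC", format = fmt)
trips$ended_at   <- as.POSIXct(trips$ended_at,   tz = "UTC", format = fmt)

# Drop rows whose starting time is later than the ending time.
trips <- trips[trips$started_at <= trips$ended_at, ]

# Confirm that only the two expected customer groups are present.
stopifnot(all(trips$member_casual %in% c("member", "casual")))

# Treat empty strings in character columns as missing, then drop incomplete rows.
char_cols <- vapply(trips, is.character, logical(1))
trips[char_cols] <- lapply(trips[char_cols], function(x) replace(x, which(x == ""), NA))
trips <- trips[complete.cases(trips), ]
```

In this toy example the duplicated row, the row that ends before it starts, and the row with an empty station name are dropped, leaving two trips.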

Data transformation:

  • Create new columns for the bike trip duration, the day the trip started, the day the trip ended, the hour the trip started, and the hour the trip ended
  • Create a new column that bins the bike trip duration into 10-minute intervals
  • Create a new column to flag whether the trip starts and ends at the same station
  • Create a new column for the Haversine distance between the starting point (start_lat, start_lng) and the ending point (end_lat, end_lng)
  • Create a new column for the ratio of the Haversine distance to the ride length (an approximate speed)
  • Split the dataset into two copies, one for casual riders and one for members
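
The transformations above can be sketched in the same way. The block below is an illustration under stated assumptions rather than the code used in the analysis: it uses base R, a hand-written Haversine helper (haversine_m, a hypothetical name) with an assumed mean Earth radius of 6,371 km, and made-up rows. The case study lists the geosphere package, which offers a comparable distHaversine function. "Day" is interpreted here as day of the week, which is also an assumption.

```r
# Toy data with made-up values; only start_lat/start_lng/end_lat/end_lng
# are named in the case study, the other column names are assumed.
trips <- data.frame(
  started_at = as.POSIXct(c("2021-08-02 08:00:00", "2021-08-07 14:00:00"), tz = "UTC"),
  ended_at   = as.POSIXct(c("2021-08-02 08:12:00", "2021-08-07 14:45:00"), tz = "UTC"),
  start_station_name = c("Station A", "Station B"),
  end_station_name   = c("Station C", "Station B"),
  start_lat = c(41.88, 41.90), start_lng = c(-87.63, -87.65),
  end_lat   = c(41.89, 41.90), end_lng   = c(-87.62, -87.65),
  member_casual = c("member", "casual"),
  stringsAsFactors = FALSE
)

# Great-circle (Haversine) distance in meters; r is an assumed mean Earth radius.
haversine_m <- function(lat1, lng1, lat2, lng2, r = 6371000) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlng <- (lng2 - lng1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlng / 2)^2
  2 * r * asin(pmin(1, sqrt(a)))
}

# Trip duration in minutes.
trips$ride_length_min <- as.numeric(difftime(trips$ended_at, trips$started_at, units = "mins"))

# Day of week (0 = Sunday ... 6 = Saturday) and hour of day for start and end.
trips$start_wday <- as.POSIXlt(trips$started_at)$wday
trips$end_wday   <- as.POSIXlt(trips$ended_at)$wday
trips$start_hour <- as.POSIXlt(trips$started_at)$hour
trips$end_hour   <- as.POSIXlt(trips$ended_at)$hour

# 10-minute duration bins, labeled by the lower edge of each bin.
trips$duration_bin <- floor(trips$ride_length_min / 10) * 10

# Flag trips that start and end at the same station.
trips$same_station <- trips$start_station_name == trips$end_station_name

# Haversine distance and its ratio to ride length (meters per minute).
trips$distance_m <- haversine_m(trips$start_lat, trips$start_lng,
                                trips$end_lat, trips$end_lng)
trips$speed_m_per_min <- trips$distance_m / trips$ride_length_min

# One copy of the data per customer group.
members <- trips[trips$member_casual == "member", ]
casuals <- trips[trips$member_casual == "casual", ]
```

Because the Haversine distance is a straight-line distance between the start and end points, the ratio underestimates the real riding speed, and it is zero for any trip that returns to its starting station.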

Next, I compared the behavior of the two customer groups: which bikes they used, when they rode, how long they rode, and how fast they rode.
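
A comparison of this kind could be sketched as a grouped summary. The example below is only an illustration in base R with made-up numbers; the column names (member_casual, rideable_type, ride_length_min) are assumed, and the values say nothing about the real data.

```r
# Toy data with made-up values; column names are assumed for illustration.
trips <- data.frame(
  member_casual   = c("member", "member", "casual", "casual"),
  rideable_type   = c("classic_bike", "classic_bike", "electric_bike", "classic_bike"),
  ride_length_min = c(10, 14, 30, 50),
  stringsAsFactors = FALSE
)

# Mean ride length per customer group.
mean_length <- aggregate(ride_length_min ~ member_casual, data = trips, FUN = mean)

# Number of trips per customer group and rideable type.
type_counts <- table(trips$member_casual, trips$rideable_type)

print(mean_length)
print(type_counts)
```

The same pattern extends to the other comparisons, for example by grouping on the start hour or on the distance-to-duration ratio.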


Based on the differences between casual riders’ and members’ bike trip details, I recommend the following to encourage more casual riders to purchase an annual membership:

1. Provide more Cyclistic bike stands close to office districts.

Members regularly rode Cyclistic bikes just before and just after working hours. We assume they were riding to and from the office.

The images below show how members and casual riders differ in the times their trips started and ended.

2. Reduce the distance between Cyclistic bike stands so that riders can easily return a bike after a short ride.

Based on the customers’ ride length, members usually take short rides.

3. Increase the availability of fast-riding bikes, particularly fast-riding classic bikes.

Members rode faster than casual riders and favored classic bikes over the other rideable types.