Context
Danny seriously loves Japanese food so in the beginning of 2021, he decides to embark upon a risky venture and opens up a cute little restaurant that sells his 3 favourite foods: sushi, curry and ramen.
Danny’s Diner is in need of your assistance to help the restaurant stay afloat – the restaurant has captured some very basic data from their few months of operation but have no idea how to use their data to help them run the business.
Problem Statement
Danny wants to use the data to answer a few simple questions about his customers, especially about their visiting patterns, how much money they’ve spent and also which menu items are their favourite. Having this deeper connection with his customers will help him deliver a better and more personalised experience for his loyal customers.
He plans on using these insights to help him decide whether he should expand the existing customer loyalty program – additionally he needs help to generate some basic datasets so his team can easily inspect the data without needing to use SQL.
Questions
- What is the total amount each customer spent at the restaurant?
- How many days has each customer visited the restaurant?
- What was the first item from the menu purchased by each customer?
- What is the most purchased item on the menu and how many times was it purchased by all customers?
- Which item was the most popular for each customer?
- Which item was purchased first by the customer after they became a member?
- Which item was purchased just before the customer became a member?
- What is the total items and amount spent for each member before they became a member?
- If each $1 spent equates to 10 points and sushi has a 2x points multiplier – how many points would each customer have?
- In the first week after a customer joins the program (including their join date) they earn 2x points on all items, not just sushi – how many points do customer A and B have at the end of January
Answers & Solutions
Answer:
Customer_id | total_spent_per_customer |
---|---|
A | 76 |
B | 74 |
C | 36 |
Solution: since we are trying to determine the total amount each customer spent, we should use the SUM aggregate function on the price column in the menu schema to add up the values representing how much each customer (customer_id) spent on purchased menu items. We use AS to assign the result of the clause to the alias total_spent_per_customer. The ORDER BY clause organizes the rows by total_spent_per customer values from greatest to least.
Answer:
customer_id | days_visited_restaurant |
B | 6 |
A | 4 |
C | 2 |
Solution: We select the customer_id column from the sales table and perform the aggregate COUNT function to add up the distinct occurrences of order_date values in the sales table. The DISTINCT clause will prevent us from counting duplicate values in the order_date column and is important because customers could potentially make multiple purchases within a given day. The result of this clause will be categorized under days_visited_restaurant using the AS function. We group by sales.customer_id and order by days_visited_restaurant from greatest to least.
Answer:
customer_id | product_name | product_id | order_date | purchase_order |
A | curry | 2 | 21-01-01 | 1 |
A | sushi | 1 | 21-01-01 | 1 |
B | curry | 2 | 21-01-01 | 1 |
C | ramen | 3 | 21-01-01 | 1 |
C | ramen | 3 | 21-01-01 | 1 |
Solution: We need to SELECT the columns containing customer id’s, product names, product_id, and order dates. The RANK clause allows us to create a new column and assign numbers to outputs based on order_date in ascending order (this is especially important since we want to return the row(s) with the earliest order_date). This ranking-output is given under the alias purchase_order by using the AS function. Since RANK is a window function, we need to designate our code as a Common Table Expression (CTE) so we can conduct further conduct additional functions. We wish to select all the rows from the sales_CTE common table expression where the value in the purchase_order column is equal to 1.
Answer:
product_name | times_purchased |
ramen | 8 |
Solution: We SELECT column product_name from the menu table and aggregate COUNT of the order_date column under the alias times_purchased using the AS clause. This works because we are counting the frequency in which the dates are assigned per product name. We ORDER BY the times_purchased alias in descending order, from greatest value to least. We use the limit clause to return only 1 row, in this case the highest value in the times_purchased column since we ordered by DESC.
customer_id | product_name | times_purchased |
C | ramen | 3 |
A | ramen | 3 |
B | ramen | 2 |
A | curry | 2 |
B | sushi | 2 |
B | curry | 2 |
A | sushi | 1 |
Answer:
Solution: Operating in the dannys_diner schema, we SELECT columns sales.customer_id, menu.product_name, and COUNT the frequency of values in the order_date column. We want to count the frequency of sales order date values per given customer because this will reveal how many times an item was purchased when grouped by product names. Thus, we can assign the alias times_purchased using AS. The sales table contains the customer_id and order_dates for customer orders, but we need to use an INNER JOIN to join the menu table and retrieve product names from the product_name column. We group by customer id’s and product name and then sort the results by the alias times_purchased in descending order
Solution: Operating in the dannys_diner schema, we need to SELECT columns customer_id, join_date, order_date, product_name, and product_id. These columns contain the customer id (which serves as a unique identifier), the join date for each customer, the date each product was purchased, name of each product purchased, and the product’s identifier, respectively. We use RANK to assign values based on the order_date column date in descending order (greatest to least). We assign the alias “rnk” to simplify. We join other tables sales and menu from the dannys_diner schema. The WHERE clause will allow us to direct our query to pull results where the date in which a product was purchased is less than (before) the date in which a customer became a member. Since we’ve used RANK in our statement earlier, which is a window function, we need to assign the query to a common table expression using WITH customer_cte AS (customer_cte is the name of the common table expression). This allows us to perforsm a subquery with customer_cte to pull rows where the rank (or “rnk”) is equal to 1. This is the first item purchased by customers before becoming a member. Notice how sushi and curry share a rank of 1 for customer A, this is because both products were purchased on the same day.
Answer:
customer_id | product_name |
A | sushi |
A | curry |
B | sushi |
C | N/A (never became a member) |