This is our use case: a simple webshop for which we want to provide the management a sales report per region.
Loading example source data into a relational database (in the cloud) and aggregating these visually in a dashboard.
Navigate to Compute Engine (GCP's product for managing virtual machines). You should be able to see the running "jump host" that allows us to connect to a MySQL database running on another virtual machine (in the bwcloud).
Please click on SSH to connect to a shell of this machine.
First, we need to connect to the database. A port forwarding tunnl is already set up to the remote machine. Please enter the following command and the password which is provided in Moodle afterwards:
mysql -h localhost -P 3306 -u hdm -p
You should see this prompt:
The following commands require some basic SQL understanding which is required for this course.
Switch to the webshop database by entering the following command into the shell:
use webshop;
Select data from the sales table as follows:
select * from sales;
Navigate to "Vertex AI" - "Colab Enterprise":
Create a notebook (call it your last name):
Connect your notebook to a runtime environment:
You should be able to select an existing runtime.
Install pymysql using this command in the first cell:
!pip install pymysql
This should be the output after running the cell:
You should be able to load the sales table's content into a DataFrame:
import pandas as pd
from sqlalchemy import create_engine
DB_USER = 'siehe Moodle'
DB_PASSWORD = 'siehe Moodle'
DB_HOST = 'siehe Moodle' # interne IP der MySQL-VM
DB_PORT = 3306
DB_NAME = 'siehe Moodle'
# SQLAlchemy Connection-URL
connection_url = f"mysql+pymysql://{DB_USER}:{DB_PASSWORD}@{DB_HOST}:{DB_PORT}/{DB_NAME}"
engine = create_engine(connection_url)
# Select statement
df = pd.read_sql("SELECT * FROM sales", con=engine)
df
You should see this result (first 10 rows only shown):
Your task is to generate a simple overview showing the total sales value per Region and Product.
Please install plotly express by executing
!pip install plotly-express
We can create a very simple visualization by using a treemap:
import plotly.express as px
px.treemap(df, path=['region', 'product_name'], values='sales_value')
Output:
Congratulations, you succeeded in the first lab of our big data journey. This lab was not really touching big data systems but was intended to deepen your cloud knowledge, take your fears of opening and using a command shell (which is very simple in GCP), and connecting to sample source data.