In this use case, the contents of our webshop table are already available in the data lake (nightly full extract). We now want to write data from the data lake back into an operational system, e.g. an RDBMS.
Please set up the Data Fusion instance analogously to the lab "Data Fusion" (https://pkuep.github.io/pk-bigdata/batch_ingestion_datafusion).
Please name the pipeline "export_sales_statistics".
Please navigate to the folder "webshop_datalake" and select the CSV file "webshop_history":
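(Optional) If you want to double-check outside of the Data Fusion UI that the extract is present in the data lake, a small Python sketch like the following lists the folder contents; the bucket name is a placeholder for your own data lake bucket:

```python
# Optional check: list the nightly extract in the data lake bucket.
# Requires: pip install google-cloud-storage (and authenticated credentials).
from google.cloud import storage

BUCKET = "your-datalake-bucket"   # placeholder: your data lake bucket name
PREFIX = "webshop_datalake/"      # folder used in this lab

client = storage.Client()
for blob in client.list_blobs(BUCKET, prefix=PREFIX):
    print(blob.name, blob.size)   # should show the webshop_history CSV file
```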
We now want to prepare this CSV file in the Wrangler. First, parse the raw file into columns:
Second, we need to make the sales_value column numeric. Please open the properties of the "Wrangler" stage and set the type of the column "sales_value" to "double":
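Conceptually, these two Wrangler steps (parsing the CSV and casting sales_value to a double) correspond to the following pandas sketch; the local file path is an assumption, the column name is the one used in this lab:

```python
import pandas as pd

# Parse the raw CSV file into columns (Wrangler: parse as CSV)
df = pd.read_csv("webshop_history.csv")   # assumes a local copy of the extract

# Cast the sales_value column to a double/float type (Wrangler: set type to double)
df["sales_value"] = df["sales_value"].astype("float64")

print(df.dtypes)
```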
Please create a pipeline afterwards:
Hit "Get Schema" once so that your screen looks like this:
Add the grouping logic as shown in the screenshot.
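To illustrate what the grouping stage computes, here is a pandas sketch of the aggregation. The grouping field "product_name" and the aggregate (sum of sales_value) are assumptions for illustration; the actual fields are the ones configured in the screenshot:

```python
import pandas as pd

# Sketch of the "Group By" stage: aggregate sales_value per group.
df = pd.read_csv("webshop_history.csv")
df["sales_value"] = df["sales_value"].astype("float64")

sales_statistics = (
    df.groupby("product_name", as_index=False)   # "product_name" is a placeholder grouping field
      .agg(sales_sum=("sales_value", "sum"))     # e.g. sum of sales_value per group
)
print(sales_statistics.head())
```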
Next, we want to provide the connection to our Cloud SQL server, which serves as our exemplary external RDBMS. Add a corresponding "Database" sink:
The properties should be set as follows:
This is how the screen should look:
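To make the sink configuration more tangible, the following sketch shows the equivalent write operation in Python. All connection details (user, password, host, database) as well as the target table name are placeholders; in Data Fusion you provide them via the JDBC connection string and the sink properties instead:

```python
import pandas as pd
from sqlalchemy import create_engine

# Tiny example result, standing in for the aggregated output of the grouping stage.
sales_statistics = pd.DataFrame(
    {"product_name": ["A", "B"], "sales_sum": [10.5, 20.0]}
)

# Placeholder connection to the Cloud SQL (MySQL) instance used in the earlier labs.
engine = create_engine("mysql+pymysql://USER:PASSWORD@CLOUD_SQL_IP/webshop")

# Write the results into the operational database (what the "Database" sink does).
sales_statistics.to_sql("sales_statistics", engine, if_exists="replace", index=False)
```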
Please save and then export your pipeline for later use. The pipeline should look similar to this:
Please deploy your pipeline:
Afterwards, you should hit "Run" to execute it. This will take ~5 min.
When the pipeline run is finished, you should see your successful run:
When switching back to the SQL connection, we should see that our batch processing results have been correctly calculated and written to the RDBMS:
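If you prefer to verify the export programmatically instead of via the SQL console, a query along these lines would do; the connection details and the table name are the same placeholders as above:

```python
import pandas as pd
from sqlalchemy import create_engine

# Placeholder connection to the Cloud SQL instance.
engine = create_engine("mysql+pymysql://USER:PASSWORD@CLOUD_SQL_IP/webshop")

# Read back the exported statistics ("sales_statistics" is the placeholder table name).
result = pd.read_sql("SELECT * FROM sales_statistics", engine)
print(result)
```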
Please delete your Data Fusion instance!
You now know how to extract data from the data lake and how to integrate a "traditional" RDBMS into our big data architecture. Furthermore, you are able to perform batch processing in another tool (Data Fusion).