In this use case we want to extract data from our data lake:
Writing data from the data lake back into an operational system, e.g. an RDBMS.
First, we need to create the table in DBeaver for your user (hint: the hdm user already has this table set up; you can insert into and delete from it as the hdm user). Please create it as follows so that sqoop can push data into it:
CREATE TABLE sales_statistics_region (
region VARCHAR(20),
average_sales FLOAT
);
Important: If you haven't executed the lab that writes the table sales_statistics_region to the Hive data warehouse, you can run the same CREATE TABLE statement within beeline and insert sample data manually, as sketched below.
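For example, inside a beeline session the following statements would create the Hive table and fill it with a couple of sample rows (the region names and values are made up purely for illustration):

CREATE TABLE sales_statistics_region (
region VARCHAR(20),
average_sales FLOAT
);
INSERT INTO sales_statistics_region VALUES ('north', 10500.0), ('south', 9800.0);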
Please navigate to the cluster's master node and open an SSH connection:
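If you prefer the command line over the Cloud Console's SSH button, the connection can also be opened with gcloud (cluster name and zone are placeholders here; on Dataproc the master node carries the suffix -m):

gcloud compute ssh <your-cluster-name>-m --zone=<your-zone>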
This will again be our "entry point" to the sqoop tool (remember: we added it via the initialization action). With the following command, we can write the results back to the SQL server:
sqoop export --connect jdbc:mysql://<db-IP see Moodle>/<dbname, see Moodle> --username <see Moodle> --password <see Moodle> --table sales_statistics_region --hcatalog-table sales_statistics_region
The parameter "hcatalog-table" determines which table from the Hive data lake should be read. The parameter "table" directs towards the RDBMS's target table.
Please delete your cluster.
You now know how to extract data from the data lake and how to integrate a "traditional" RDBMS into our big data architecture.