Sprngy - Creating New Applications (Continued...)
This document details the steps to set up new applications using the Sprngy Admin UI, define models, and run workloads.
Use Case Category: Finance
Demo Application 1: Analyzing Inflation Dataset
Objective
Determine how one can live an inflation-proof lifestyle by analyzing the percentage change in the consumer price index.
Read the data file from Google Drive and load the data directly to BDL fact using an analytical model.
Overview
The Inflation dataset has one entity, inflation. It records the yearly percentage change in the consumer price index from 1947 to 2022.
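For readers who want to see what a yearly percentage change means, here is a minimal Python sketch. It assumes the value for a year is the change relative to the previous year's index. The function name and the sample index values are hypothetical and made up for illustration; they are not taken from the dataset.

```python
def yearly_percent_change(cpi_by_year):
    """Return {year: percent change in the index versus the previous year}."""
    years = sorted(cpi_by_year)
    changes = {}
    for prev, curr in zip(years, years[1:]):
        # (current - previous) / previous, expressed as a percentage
        changes[curr] = (cpi_by_year[curr] - cpi_by_year[prev]) / cpi_by_year[prev] * 100.0
    return changes


# Made-up index values for illustration only (not real CPI data).
sample = {2000: 100.0, 2001: 104.0, 2002: 106.08}
print(yearly_percent_change(sample))
```

The first year has no previous value, so it does not appear in the result.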
This dataset can be hosted in Google Drive. To set up the Google Drive connector, please contact Sprngy Support (support@sprngy.com).
Classifying the Application:
Based on the data technology used and the business use, the application is classified as below:
Quality of Data | Data Lake (a centralized repository for storing large amounts of raw data) | Data Lakehouse / Data Warehouse |
---|---|---|
Curated | ✔ | |
Correlated | ✔ | |
Normalized | ✔ | |
Analyze | ✔ | |
Modelling | | |
Step 1: Setting up the application
Now that the business use and classification of the application are established, the application can be created using the UI. In AdminUI, set up the application by going to the Application Configuration screen, selecting Create New, and filling out the file structure. Since we have just one layer of data, we will have just one entity in it.
Step 2: Creating Meta Model
We can now set up the Meta Model in AdminUI:
Add the column names and data types from the dataset into the Create Meta Model page and then click submit. Note that you do not add the as_of_date column, as that will be added automatically.
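Before filling in the Create Meta Model page, it can help to list the columns of the CSV file together with a guessed type for each. The sketch below is only a helper for that preparation step and does not call any Sprngy API. The type names ("Integer", "Double", "String"), the function names, and the "pct_change" column are assumptions for illustration; use the type names that AdminUI actually offers.

```python
import csv
import io


def infer_type(values):
    """Guess a simple type name for a list of string values (hypothetical names)."""
    def all_parse(cast):
        try:
            for v in values:
                cast(v)
            return True
        except ValueError:
            return False

    if all_parse(int):
        return "Integer"
    if all_parse(float):
        return "Double"
    return "String"


def list_columns(csv_text):
    """Return [(column name, guessed type)] for CSV text with a header row."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    names = list(rows[0].keys()) if rows else []
    return [(name, infer_type([r[name] for r in rows])) for name in names]


# Made-up sample rows for illustration only.
sample = "year1,pct_change\n2000,1.5\n2001,2.25\n"
for name, type_name in list_columns(sample):
    print(name, type_name)
```

The as_of_date column is deliberately not part of this listing, since it is added automatically.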
Step 3: Creating Analytical Model
The data flow from the source (Google Drive) directly to BDL is achieved using the analytical model. Pre-processing of the data is also done by the analytical model.
Step 4: Running the Analytical Workloads
Running the analytical workload for the INFLATION application will load the data directly from your local machine to the BDL fact data lake.
Step 5: Importing Database and Dataset into SprngyBI and creating a dashboard for the charts
Once you are in SprngyBI, select the Datasets option from the Data dropdown in the top menu. From there, select the add Dataset option. Set the Database to Apache Hive, select your database from Schema, and select which table you would like to add. SprngyBI will only allow you to add one table at a time, but you can add as many tables as you want one by one.
(see this page for further reference)
A datetime column must be added to the CSV dataset file to get time series visualizations in SprngyBI. For the Inflation application, we created a "year_new" column, which is a copy of the "year1" column with "-01-01" appended so that each value has the form "yyyy-01-01". After adding the column, run the analytical workloads again.
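One possible way to add this column to the CSV file is sketched below in Python. It assumes the file has a header row and that "year1" holds a four-digit year. The function name add_year_new is hypothetical, and the "value" column in the sample is made up for illustration.

```python
import csv
import io


def add_year_new(csv_text, year_col="year1", new_col="year_new"):
    """Return CSV text with an extra column holding the year as yyyy-01-01."""
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = list(reader.fieldnames) + [new_col]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        # Copy the year and pin it to January 1st of that year.
        row[new_col] = f"{int(row[year_col]):04d}-01-01"
        writer.writerow(row)
    return out.getvalue()


# Made-up sample row for illustration only.
print(add_year_new("year1,value\n1947,1.5\n"))
```

To process a file on disk, read its contents into a string first and write the returned text back out.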
When first adding the column, its data type can be String. Later, in SprngyBI, click the edit symbol beside your dataset name and, under CALCULATED COLUMNS, enter the SQL expression "from_unixtime(unix_timestamp(year_new, 'yyyy-MM-dd'))", select DATETIME as the data type, and click Save. A datetime column is required to plot the time series graph.
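As a rough illustration of what the calculated column does, the Python sketch below parses a "yyyy-MM-dd" string and formats it as a full timestamp string, which is the default output shape of Hive's from_unixtime. It is an analogue for understanding only, not the code SprngyBI runs, and it ignores time zone handling. The function name is hypothetical.

```python
from datetime import datetime


def to_datetime_string(year_new):
    """Parse a yyyy-MM-dd string and return it as yyyy-MM-dd HH:mm:ss."""
    parsed = datetime.strptime(year_new, "%Y-%m-%d")
    return parsed.strftime("%Y-%m-%d %H:%M:%S")


print(to_datetime_string("1947-01-01"))
```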
Demo Application 2: Predicting Wealth Inequalities between Black, White and Hispanic Groups
Objective
Predict wealth inequalities between Black, White and Hispanic groups for the upcoming years. Using an analytical model, we train a machine learning model on the given dataset, whose target column is mean_net_worth.
Overview
The Wealth Inequalities dataset contains data for every third year from 1989 to 2019, covering the year, race, and the mean and median of income, savings, debts, investments and net worth.
Classifying the Application:
Based on the data technology used and the business use, the application is classified as below:
Quality of Data | Data Lake | Data Lakehouse / Data Warehouse |
---|---|---|
Curated | | ✔ |
Correlated | | ✔ |
Normalized | | ✔ |
Analyze | | ✔ |
Modelling | | ✔ |
Step 1: Setting up the application
Now that the business use and classification of the application are established, the application can be created using the UI. In AdminUI, set up the application by going to the Application Configuration tab, selecting Create New, and filling out the file structure. Since we have just one layer of data, we will have just one entity in it.
Step 2: Creating Meta Model
We can now set up the Meta Model in AdminUI:
Add the column names and data types from the dataset into the Create Meta Model page and then click submit. Note that you do not add the as_of_date column, as that will be added automatically.
Step 3: Creating Ingest Model
Next, create the ingest model in AdminUI.
Step 4: Running Workloads
Once we submit the Ingest Model, we can run the workloads under the Batch Management/Run Workloads page.
First, run the SDL-FDL workload. This will apply the processors we selected in the Ingest Model for the SDL to FDL layer. You can see if the workload ran correctly by going to the ‘Workload Management’ page.
Once you confirm that the SDL-FDL workload ran correctly, run the FDL-BDL workload next. This will apply the transformations selected in the Ingest Model for the FDL to BDL layer. You can see if the workload ran correctly by going to the ‘Workload Management’ page.
Step 5: Creating Analytical Model
The data is taken from the BDL fact data lake and used by the analytical model to train the ML model and make predictions on the target column.
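This page does not state which machine learning algorithm the analytical model uses. Purely as an illustration of training on past years and predicting a target value for a future year, the sketch below fits a straight line by ordinary least squares. The function names and the target values are hypothetical and made up; only the survey years (every third year from 1989 to 2019) follow the dataset description.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Evaluate the fitted line at x."""
    slope, intercept = model
    return slope * x + intercept


# Survey years: every third year from 1989 to 2019.
years = list(range(1989, 2020, 3))
# Made-up target values for illustration only (not real mean_net_worth data).
mean_net_worth = [100.0 + 10.0 * i for i in range(len(years))]

model = fit_line(years, mean_net_worth)
print(predict(model, 2022))
```

In practice one such model would be fitted per group, and a real analytical model may use a different algorithm and more input columns.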
Step 6: Running the Analytical Workloads
Running the analytical workload for the Wealth Inequalities application executes the analytical model created in the previous step.
Step 7: Importing Database and Dataset into SprngyBI and creating a dashboard for the charts
Once you are in SprngyBI, select the Datasets option from the Data dropdown in the top menu. From there, select the add Dataset option. Set the Database to Apache Hive, select your database from Schema, and select which table you would like to add. SprngyBI will only allow you to add one table at a time, but you can add as many tables as you want one by one.
(see this page for further reference)
A datetime column must be added to the CSV dataset file to get time series visualizations in SprngyBI. For the Wealth Inequalities application, we created a "year_new" column, which is a copy of the "year1" column with "-01-01" appended so that each value has the form "yyyy-01-01". After adding the column, run the analytical workloads again.
When first adding the column, its data type can be String. Later, in SprngyBI, click the edit symbol beside your dataset name and, under CALCULATED COLUMNS, enter the SQL expression "from_unixtime(unix_timestamp(year_new, 'yyyy-MM-dd'))", select DATETIME as the data type, and click Save. A datetime column is required to plot the time series graph.
Copyright © Springy Corporation. All rights reserved. Not to be reproduced or distributed without express written consent.