This document details the steps to set up a new application using the Sprngy Admin UI, define models, and run workloads.
Objective - Build a chart that describes Twitter user traffic in the USA
We access Twitter data through a Twitter developer account to gather information on tweets by geo-location. The search can be filtered by hashtag, date and time, or geographical area. Here, we will be analyzing the number of tweets sent across the USA, using the "rtweet" library to access the data and SprngyBI to build analytics on top of it.
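As an illustration of those filters, here is a minimal rtweet sketch. It assumes a `token` object was already created with `rtweet::create_token()` from your developer-account credentials; the query strings, dates, and coordinates are examples only:

```r
library(rtweet)

# Assumes `token` was created beforehand with rtweet::create_token()
# using your own developer-account credentials.

# Filter by hashtag
tags  <- search_tweets("#rstats", n = 100, token = token)

# Filter by date and time using Twitter search operators
dated <- search_tweets("lang:en since:2022-07-01 until:2022-07-08",
                       n = 100, token = token)

# Filter by geographical area, given as "lat,lng,radius"
geo   <- search_tweets("lang:en", geocode = "39.8,-98.6,1500mi",
                       n = 100, token = token)
```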
Based on the data technology used and the business use, the application is classified as follows:
| Quality of Data | Data Lake | Data Lakehouse / Data Warehouse |
|---|---|---|
| Curated | | |
| Correlated | ✔ | |
| Normalized | | |
| Analyze | ✔ | |
| Modelling | | |
Now that the business use and classification of the application are established, the application can be created using the UI. In AdminUI, set up the application by going to the Set-up Application tab, selecting Create New, and filling out the file structure. Since we have a single-layer file system, it will contain just one entity.
Since we are accessing data using the Twitter API, we can use an Analytic Model directly to load, correlate, and analyze the data.
A Sample Analytic Model is as follows:
| Module Name | Processor Name | Pipeline Sequence | Variable Name | Assign Type | Parameter |
|---|---|---|---|---|---|
| Tweeter | tweeter_data | 1 | token | CREATE_VAR | rtweet::create_token(app = "BA_Demo123", consumer_key = '<consumer_key>', consumer_secret = '<consumer_secret>', access_token = '<access_token>', access_secret = '<access_secret>') |
| Tweeter | tweeter_data | 2 | data | CREATE_VAR | rtweet::search_tweets("lang:en", geocode = lookup_coords("usa"), n = 10000, token = token) |
| Tweeter | tweeter_data | 3 | data_lat_lng | CREATE_VAR | rtweet::lat_lng(data) |
| Tweeter | tweeter_data | 4 | res | CREATE_VAR | data_lat_lng %>% dataops.overwritedata('/BigAnalytixsPlatform/Tweeter/tweeter_data/BAL','data') |
This fetches the Twitter data, adds latitude and longitude columns to it, and saves the result to the given location.
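Outside the pipeline, the same four steps can be sketched as a standalone R script. The credentials are placeholders, and `dataops.overwritedata` is a Sprngy platform helper assumed to be available in the pipeline environment:

```r
library(rtweet)
library(magrittr)

# 1. Authenticate against the Twitter API (placeholder credentials)
token <- create_token(
  app             = "BA_Demo123",
  consumer_key    = "<consumer_key>",
  consumer_secret = "<consumer_secret>",
  access_token    = "<access_token>",
  access_secret   = "<access_secret>"
)

# 2. Search recent English-language tweets sent from the USA
data <- search_tweets("lang:en", geocode = lookup_coords("usa"),
                      n = 10000, token = token)

# 3. Derive latitude/longitude columns from the tweet geo metadata
data_lat_lng <- lat_lng(data)

# 4. Persist to the application's BAL layer (Sprngy-specific helper)
res <- data_lat_lng %>%
  dataops.overwritedata("/BigAnalytixsPlatform/Tweeter/tweeter_data/BAL", "data")
```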
In the utilityscripts folder, use the create_hive_ddl_using_spark_df.R script to generate the Hive SQL statement needed to create a data table in Hive. Once you run the script, a .hql file will be created; open this file, copy the generated statement, and run it in the Hive terminal.
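The generated statement will look roughly like the following. This is an illustrative sketch only: the real .hql lists every column of the Spark data frame, and the table name, storage format, and location come from your application's file structure (lat and lng are the columns added by rtweet::lat_lng()):

```sql
CREATE EXTERNAL TABLE IF NOT EXISTS tweeter_data (
  status_id  STRING,
  created_at TIMESTAMP,
  text       STRING,
  lat        DOUBLE,
  lng        DOUBLE
)
STORED AS PARQUET
LOCATION '/BigAnalytixsPlatform/Tweeter/tweeter_data/BAL/data';
```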
After adding the database and dataset (as shown in the screenshots above), we can create charts for that dataset. To create a chart, go to the Chart page from the top menu, pick the dataset we just added, and select the chart type. Here, we select "deck.gl Scatterplot", which requires latitude and longitude columns.