Import Model FAQs
What is an Import Model?
Import routines (including connection details) for importing data from other sources are defined using the 'Import Model'. Data import routines allow you to mirror data from source systems into the Sprngy data lake to offload operational reporting workloads, and to ingest curated and correlated data into the business data lake for advanced analysis.
What is an Import Processor?
An import processor is a rule that can be applied to data. The rule governs extracting/importing data from any type of relational database into the Hadoop Distributed File System (HDFS). The Sprngy Platform comes with predefined import processors for importing data from any type of relational database into HDFS.
How do I build an import Model?
The Import Model can be built using the Import Model UI (AdminUI), or by authoring it in CSV format and uploading the CSV through the AdminUI. Please consult the product documentation.
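As a purely illustrative sketch of the CSV route, the snippet below parses a small import-model-style CSV in memory. The column names (rule_name, processor, step_number, step_sequence) are hypothetical, not the platform's actual schema; consult the product documentation for the real column layout.

```python
import csv
import io

# Hypothetical import-model CSV. Column names are illustrative only;
# the actual schema is defined in the Sprngy product documentation.
csv_text = """rule_name,processor,step_number,step_sequence
import_orders,QUERY_IMPORT_PROCESSOR,0,1
hash_orders,HASH_RECORDS_PROCESSOR,5,1
"""

# DictReader maps each data row to a dict keyed by the header row,
# which is a convenient way to validate a model before uploading it.
rows = list(csv.DictReader(io.StringIO(csv_text)))

print(len(rows), rows[0]["processor"])
```

In practice the CSV would be saved to a file and uploaded through the AdminUI rather than parsed locally; the sketch only shows that the model is plain tabular data.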
How many import processors are available as of now?
Three import processors are available in the Sprngy Platform: QUERY_IMPORT_PROCESSOR, IN_MEMORY_IMPORT_PROCESSOR, and RDL_IMPORT_PROCESSOR. In addition, five other processors support basic cleaning and moving data between data lakes: EDA_COMPLETENESS_PROCESSOR, HASH_RECORDS_PROCESSOR, DROP_DELTA_RECORDS_PROCESSOR, BATCH_INFO_APPEND_PROCESSOR, and FACT_RECORDS_PROCESSOR.
How do I specify the sequence of Import Processors?
The Import Processor sequence can be specified using the Import Model UI or the CSV format. Please consult the product documentation for sequencing.
What is the significance of step numbers in the Import model?
The step number is an essential part of the import model.
For QUERY_IMPORT_PROCESSOR, step numbers 0, 1, and 4 are currently used.
For RDL_IMPORT_PROCESSOR, step numbers 0 and 1 are currently used.
For IN_MEMORY_IMPORT_PROCESSOR, step numbers 2 and 3 are currently used.
For EDA_COMPLETENESS_PROCESSOR, HASH_RECORDS_PROCESSOR, DROP_DELTA_RECORDS_PROCESSOR, BATCH_INFO_APPEND_PROCESSOR, and FACT_RECORDS_PROCESSOR, step number 5 is currently used.
What is step sequence and why is it important?
The step sequence orders the processors within a step number. For example, if there are 5 entries for step number 1 and they must run in a pre-defined order, the step sequence (1, 2, 3, 4, 5) guarantees that each processor for step 1 runs in the order it is supposed to.
Are there interdependencies between Import Processors?
Yes, there are interdependencies between Import Processors. Please consult the product documentation for sequencing.
Is there a default Import Model that I can start with?
Yes, there is a default import model that you can start with.
Where are the Import Models stored?
Import models are stored within the Import folder of the core data lake (SprngyPlatform/Import/BDL/Fact).
How are Import Models stored and retrieved?
Import models are stored within the Import folder of the core data lake. The AdminUI provides an interface for users to save the import model in the data lake, as well as retrieve and edit it.
How can you analyze the Import Model Information?
Use the sprngyBI dashboard for the import model to retrieve and analyze the Import Model information.
What is fetch size in import model?
Fetch size is an important parameter in the import model: it specifies the number of rows fetched at a time from the relational database while reading data into Spark.
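The batching idea behind fetch size can be demonstrated with Python's built-in sqlite3 module. This is not the Sprngy import model or Spark itself, just the same principle: the database returns rows in chunks of at most fetch-size rows per round trip.

```python
import sqlite3

# Build a small in-memory table with 10 rows to read back in batches.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")
conn.executemany(
    "INSERT INTO orders (amount) VALUES (?)",
    [(i * 1.5,) for i in range(10)],
)

FETCH_SIZE = 4  # rows pulled from the database per round trip

cur = conn.execute("SELECT id, amount FROM orders")
batches = []
while True:
    rows = cur.fetchmany(FETCH_SIZE)  # at most FETCH_SIZE rows per call
    if not rows:
        break
    batches.append(rows)

print([len(b) for b in batches])  # 10 rows in chunks of 4 -> [4, 4, 2]
```

A larger fetch size means fewer round trips (faster reads, more memory per batch); a smaller one keeps memory bounded at the cost of more round trips. The same trade-off applies when Spark reads from an RDBMS over JDBC.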
Why do we use the Import Model from RDBMS to RDL?
RDBMS, which stands for relational database management system, refers to databases that store data in a row-based table structure that connects related data elements together. Examples of RDBMS include MySQL and Oracle. Many companies store their data in these databases, but to take advantage of the fast processing speeds of HDFS, that data must be brought into the raw data lake (RDL) through the import model.
What's the purpose of rule_from_layer and rule_to_layer in the import model?
The rule_from_layer defines the location the data is coming from, and the rule_to_layer defines the layer the data is being sent to. Together, the from-layer and to-layer let a single import model serve both types of import workloads, so you do not need to keep two separate copies of the same entity's import model.
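The from/to routing can be sketched as a simple lookup. The layer names follow the layers mentioned in this FAQ (RDBMS, RDL, BDL), but the mapping structure and the RDL path shown are hypothetical, not the platform's actual schema.

```python
# Hypothetical layer-to-path mapping. The BDL path matches the storage
# location mentioned elsewhere in this FAQ; the RDL path is illustrative.
LAYER_PATHS = {
    "RDL": "SprngyPlatform/Import/RDL",
    "BDL": "SprngyPlatform/Import/BDL/Fact",
}

def resolve_route(rule_from_layer, rule_to_layer):
    """Return (source, destination) for an import rule.

    A from-layer not in the map (e.g. an RDBMS) is passed through as-is,
    standing in for an external connection rather than a lake path.
    """
    src = LAYER_PATHS.get(rule_from_layer, rule_from_layer)
    dst = LAYER_PATHS[rule_to_layer]
    return src, dst

print(resolve_route("RDBMS", "RDL"))  # RDBMS-to-RDL import workload
print(resolve_route("RDL", "BDL"))    # RDL-to-BDL import workload
```

Because the same rule definition is parameterized by its from/to layers, one import model can drive both the RDBMS-to-RDL and RDL-to-BDL workloads.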