...

How do I build the Meta Model?

The Meta Model can be built using the Ingest Meta Model UI (AdminUI), or by building it in CSV format and uploading the CSV through the AdminUI. Please consult the product documentation for sequencing.

What types of information can be captured in the Meta Model?

The following types of information can be captured in the meta model on a per-attribute basis for a given entity.

  1. Data set Description - The module, entity, and description of the data set.

  2. Attribute Description - Name, Description, Format, and Length. This can be captured at both the data model and BI model levels.

...

  2. Attribute Transforms - Transform rule that needs to be applied at the inbound and outbound. By default, trim is applied to all categorical attributes.

  3. Attribute Filtering - Whether the attribute should be part of the data filtering process and if so what rule.

  4. Attribute IQM - Whether the attribute should be utilized in IQM match. For more information on IQM, please refer to the IQM FAQs and Sprngy documentation.

  5. Attribute EDA - Whether the attribute should be utilized in EDA analysis.

  6. Attribute Calendar - Whether the attribute should be utilized in calendar join operations.

  7. Attribute Validation - Whether the attribute should be part of the data validation process and if so what rule.

  8. Attribute Quality - Whether the attribute should be part of the data validation process and if so what rule.

What data types are supported in the meta model?

Currently, character, integer, and numeric are supported.

Should I specify date columns as character data types as well?

Yes.

How do I validate the meta model against the data set?

Developers can use dataops.applymodeldatacompatibility to check the compatibility of the meta model with the data.

Note that when an attribute is created in the meta model using the AdminUI and the data type is selected as character, a default transformation of Trim is applied. This default transformation isn’t applied for integer or numeric data types.

When editing a meta model, if the data type is changed, the transform rule must be updated accordingly for the workload to run successfully.

Where are the Meta Models stored?

Meta models are stored within the Meta folder of the core data lake (SPRNGYPlatform/Meta/BDL/Fact).

How are Meta Models stored and retrieved?

Meta models are stored within the Meta folder of the core data lake using the BAPCore API. They can be retrieved using the BAPCore API as well.

How is the life cycle of the Meta Model managed?

The most current version of the Meta Model is stored in the Current folder (BAPCORE/Meta/BDL/Fact/Current). The archived version is stored in the Archive folder (BAPCORE/Meta/BDL/Fact/Archive). When a new version is pushed, the current version is moved to the Archive folder. The AdminUI provides an interface for users to save the meta model in the data lake as well as retrieve and edit it.
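The Current/Archive behavior described above can be pictured with a minimal sketch on a local filesystem. This is not the BAPCore API or the AdminUI implementation: the function name `push_meta_model`, the file name, and the timestamp prefix given to archived copies are all assumptions made for illustration.

```python
import shutil
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def push_meta_model(root: Path, file_name: str, content: str) -> Path:
    """Write a new current version, archiving the previous one if it exists.

    Hypothetical helper: it only mirrors the Current/Archive folder behavior.
    """
    current_dir = root / "Current"
    archive_dir = root / "Archive"
    current_dir.mkdir(parents=True, exist_ok=True)
    archive_dir.mkdir(parents=True, exist_ok=True)

    current_file = current_dir / file_name
    if current_file.exists():
        # Assumption: archived copies get a UTC timestamp prefix to avoid collisions.
        stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S%f")
        shutil.move(str(current_file), str(archive_dir / f"{stamp}_{file_name}"))

    current_file.write_text(content, encoding="utf-8")
    return current_file


def demo() -> tuple[str, list[str]]:
    """Push two versions into a temporary folder and report what ended up where."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp) / "Meta" / "BDL" / "Fact"
        push_meta_model(root, "customer.csv", "version 1")
        current = push_meta_model(root, "customer.csv", "version 2")
        archived = [p.read_text(encoding="utf-8") for p in (root / "Archive").iterdir()]
        return current.read_text(encoding="utf-8"), archived
```

Calling `demo()` leaves the second version in Current and the first in Archive, which is the life cycle the answer describes.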

Can I know what Meta Model was used for a particular batch process?

Every time a meta model is used for any data movement, the definition of the meta model is stored in the track folder (SPRNGYPlatform/Meta/BDL/Fact/Track) along with the batch_id of the data movement process.

...

Currently, that is not supported. A child entity can belong to only one parent entity. If you specify more than one parent, workloads will fail on the orphan and nested records processors.

...

The parent meta model coalesces with the entity meta model. The coalesced meta model (entity meta model and parent entity meta model) is stored in the track folder (SPRNGYPlatform/Meta/BDL/Fact/Track) along with the batch_id of the data movement process. None of the attributes of the parent entity model are changed when coalesced with the child entity meta model. This helps in troubleshooting whether all parent entity models are appropriately applied to the child entity model for nesting processes.

...

  • Special character issue in filter and validation rules - The filter records process will fail, and the error log will indicate a ‘special character error’. Avoid special characters when defining filter and validation rules. If you are defining the meta model in Excel or CSV, export it to an R file using the dput function, search for special characters, and remove any you find before saving the meta model to the core data lake.

  • No parent is defined in meta model - If no parent information (i.e., parent location and parent attributes) is defined in the meta model, no nested data will be created.

  • For integer attributes, the default value in the meta model should be defined as ‘0’, and for numeric attributes it should be ‘0.01’. For attributes with character data type, the default value should be ‘NOT AVAILABLE’.

  • The impute method should be mean for numeric and integer attributes and DEFAULT for character attributes.

  • Meta model column names should be all lower case.

  • IQM metric match should be ‘YES’ only for integer and numeric columns that are meaningful to data analysis. For example, you would set customer_credit_score and customer_age to ‘YES’ but not postal_code (if it’s defined as an integer).

  • IQM codes should be ‘YES’ only for character (categorical) columns that are meaningful to data analysis. For example, you would set customer_type and customer_group to ‘YES’ but not customer_name or customer_id.

  • Entity Name should be exactly as defined in the HDFS folder structure.

  • When defining the meta model in CSV format, ensure that Entity Name is exactly as defined during ‘Application Configuration’ in AdminUI. The entity name is used to dynamically look up data lake paths.

  • EDA Dimensions should be ‘YES’ only for character (categorical) columns that are meaningful to data analysis. For example, you would set customer_type and customer_group to ‘YES’ but not customer_name or customer_id. as_of_date should be set to NO for eda_dimension.

  • EDA Metrics should be ‘YES’ only for integer and numeric columns that are meaningful to data analysis. For example, you would set customer_credit_score and customer_age to ‘YES’ but not postal_code (if it’s defined as an integer). as_of_date should be set to NO for eda_metric.

  • EDA iterate by should be as_of_date and only as_of_date. This allows EDA analytics to be created on a daily basis.

  • entity_attribute_nested_key_role should always be set to YES when the attribute is a parent lookup key and its in-use indicator is also set to YES.

  • Entity Attribute Calendar Join Key should be specified only for as_of_date and no other key.

  • Each record in the meta model needs to be unique (entity names, object/BI names all need to be unique).

  • Parent Lookup location should have a relative path instead of an absolute path (i.e. /BigAnalytixsPlatform/BAPRAM/Customer/FDL/Stage).

  • The IQM Processor should be run with the nesting records processor ON. If nesting is not used, turn IQM off.

  • When defining the meta model in CSV format, an attribute for ‘as_of_date’ should be defined with data type as ‘Character’. While creating the meta model using the steps defined in AdminUI, this column is automatically added in the background.

  • Regarding nested attributes

    • We recommend keeping entity_attribute_compare_key_role and entity_attribute_parent_lookup_key_role as NO for as_of_date if the dates do not match between the two entities. If you set them to YES, nesting will introduce NULLs for the rows where as_of_date does not match between the two entity data sets, and those NULLs will be replaced using the impute method defined in the meta model for the given attribute.

    • Moreover, entity_attribute_nested_key_role should be NO for the parent entity and YES for the child entity. If the child entity has another child entity, it should be NO for that as well. The reason is that if the as_of_date values differ, nesting would introduce as_of_date_x and as_of_date_y in the data.

    • For other matching ID columns between two entities, entity_attribute_compare_key_role and entity_attribute_parent_lookup_key_role should be YES, with entity_attribute_parent_lookup_location set to the FDL/Stage of the parent entity. This will nest the column between the two entities based on the defined common attribute.

  • Regarding entity_attibute_calendar_join_key_role

    • If the data size is small, don’t set entity_attibute_calendar_join_key_role to YES for any of the attributes. (NOTE: In Spark, partitioning is an expensive operation when the dataset being partitioned is too small or too big.)

    • If the data size is very big (roughly greater than 100 GB), the best practice is not to partition the dataset, i.e., don’t set entity_attibute_calendar_join_key_role to YES for any of the attributes.

    • If the data size is big and you are setting entity_attibute_calendar_join_key_role to YES for an attribute, make sure that the column is a date column and that its actual data is in the format 'YYYY-MM-DD'. (NOTE: The general rule of thumb is that there should be one column called as_of_date which serves all these purposes.) The value of as_of_date should not be ‘NOT AVAILABLE’ when calendar_join_key_role is YES for it.
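Several items in the checklist above are mechanical enough to verify before uploading a CSV meta model. The following is a minimal sketch, not part of the platform: the field names (`attribute_name`, `data_type`, `default_value`, `impute_method`, `iqm_metric_match`, `filter_rule`) are hypothetical stand-ins for the real CSV columns, and "special characters" is assumed here to mean non-ASCII characters.

```python
EXPECTED = {
    # data type -> (default value, impute method), as listed in the checklist above
    "integer": ("0", "mean"),
    "numeric": ("0.01", "mean"),
    "character": ("NOT AVAILABLE", "DEFAULT"),
}


def check_row(row: dict[str, str]) -> list[str]:
    """Return checklist violations for one meta model row (hypothetical field names)."""
    problems = []
    name = row["attribute_name"]
    data_type = row["data_type"].lower()

    if name != name.lower():
        problems.append(f"{name}: column name should be all lower case")

    if data_type not in EXPECTED:
        problems.append(f"{name}: unsupported data type {row['data_type']!r}")
        return problems

    default, impute = EXPECTED[data_type]
    if row["default_value"] != default:
        problems.append(f"{name}: default value should be {default!r}")
    if row["impute_method"].lower() != impute.lower():
        problems.append(f"{name}: impute method should be {impute}")

    if row["iqm_metric_match"] == "YES" and data_type == "character":
        problems.append(f"{name}: IQM metric match applies to integer and numeric only")

    # Assumption: 'special characters' is taken to mean non-ASCII characters.
    if not row.get("filter_rule", "").isascii():
        problems.append(f"{name}: filter rule contains non-ASCII characters")
    return problems


def check_model(rows: list[dict[str, str]]) -> list[str]:
    """Collect the violations for every row of a meta model."""
    problems = []
    for row in rows:
        problems.extend(check_row(row))
    return problems
```

A row list could come from `csv.DictReader` over the meta model CSV. An empty result only means these particular checks passed; it does not replace the platform's own validation.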

What if some module’s entities have a dependency on another module’s entities (e.g., BAPOIM (FieldActivity) depends on BAPRAM (ServicePoint))?

In this case, when uploading the meta model for the parent entity (ServicePoint in this example), change the parent module name to the current module name. In the example, the module name for ServicePoint needs to be changed from ‘BAPRAM’ to ‘BAPOIM’.

I have a large number of small files in the Meta folder that may impede read performance. What can I do?

You can consolidate multiple Meta files into one using the consolidate functionality. This assumes that the schema of all files in the meta model folder is the same.
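To illustrate why the same-schema assumption matters, here is a minimal sketch of consolidating CSV content in plain Python. It is not the platform's consolidate functionality; the function name `consolidate` and the choice to compare header rows are assumptions made for illustration.

```python
import csv
import io


def consolidate(csv_texts: list[str]) -> str:
    """Merge CSV documents that share one header into a single CSV document."""
    header = None
    merged_rows = []
    for text in csv_texts:
        rows = list(csv.reader(io.StringIO(text)))
        if not rows:
            continue  # skip empty files
        if header is None:
            header = rows[0]
        elif rows[0] != header:
            # Files with a different schema cannot be merged safely.
            raise ValueError(f"schema mismatch: {rows[0]} != {header}")
        merged_rows.extend(rows[1:])

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    if header is not None:
        writer.writerow(header)
        writer.writerows(merged_rows)
    return out.getvalue()
```

The sketch refuses to merge as soon as one header differs, which is the failure mode the same-schema assumption guards against.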

Can I consolidate Meta files?

...

What if there is Parent-child relationship in your data module?

...

Transform rules are SQL queries that you can write to filter data by a variable, change certain values in a column, or apply other functions from the Spark SQL library. To write a transform rule in the meta model, click the edit icon next to the column name and go to the Rules page.

...

In the Entity Attribute Transform Rule Inbound line, write the SQL query that defines the transformation you are trying to make. In the example above, we are in the NBA Statistics application, and want to rename all occurrences of ‘Philadelphia Sixers’ in the team column to ‘Philadelphia 76ers’.

...

We can write the query “case when team in ('Philadelphia Sixers', 'Philadelphia 76ers') then 'Philadelphia 76ers' else team end as team” as shown above for this transform rule.

...

In the same NBA Statistics application, another change we have to make to the dataset is removing all periods and commas from players' names. To do this, we click on the edit icon next to the player_name column, and under the Transform Rule Inbound, define the query “replace(replace(player_name, '.', ''), ',', '') as player_name”.

Here are a couple of things to keep in mind when writing your transform rules:

  1. Do not put ‘Select’ at the beginning of the query, as it is prepended automatically when the backend processor runs.

  2. The SQL query should end in “as <column_name>”.

The transform rules that you define will be executed in the SDL-FDL workload when you define the TRANSFORM_RECORDS_PROCESSOR based on the transformation preferences defined in the Ingest Model.
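The two guidelines and the two example rules above can be tried out locally with a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for Spark SQL (both support `case when` and `replace` as used here), and the helper `build_select`, the table name `stats`, and the sample rows are hypothetical. How the backend processor actually assembles the final query is not documented here, so joining the rules with commas is an assumption.

```python
import re
import sqlite3


def build_select(rules: dict[str, str], table: str) -> str:
    """Combine per-column transform rules into one SELECT statement.

    Enforces the two guidelines above: no leading 'select', and each rule
    ends in 'as <column_name>'.
    """
    for column, rule in rules.items():
        if re.match(r"\s*select\b", rule, flags=re.IGNORECASE):
            raise ValueError(f"{column}: rule must not start with SELECT")
        if not re.search(rf"\bas\s+{re.escape(column)}\s*$", rule, flags=re.IGNORECASE):
            raise ValueError(f"{column}: rule must end in 'as {column}'")
    return f"select {', '.join(rules.values())} from {table}"


RULES = {
    "team": "case when team in ('Philadelphia Sixers', 'Philadelphia 76ers') "
            "then 'Philadelphia 76ers' else team end as team",
    "player_name": "replace(replace(player_name, '.', ''), ',', '') as player_name",
}


def demo() -> list[tuple[str, str]]:
    """Run the two example rules against a tiny in-memory table (made-up rows)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table stats (team text, player_name text)")
    conn.executemany(
        "insert into stats values (?, ?)",
        [("Philadelphia Sixers", "A.J. Example, Jr."), ("Other Team", "Pat Sample")],
    )
    rows = conn.execute(build_select(RULES, "stats")).fetchall()
    conn.close()
    return rows
```

In this sketch the first sample row comes back as `('Philadelphia 76ers', 'AJ Example Jr')`, while the second row is returned unchanged because neither rule applies to it.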