What is a meta model?
A meta model is a format that captures detailed information about the attributes of the data sets to be processed.
What types of information can be captured in a meta model?
The following types of information can be captured in a meta model, on a per-attribute basis, for a given entity:
Data Set Description - The module, entity, and description of the data set.
Attribute Description - Name, description, format, and length. These can be captured at both the data model and BI model levels.
Attribute Security - Masking requirements and masking information.
Attribute Transforms - Transform rules to be applied at inbound and outbound. By default, trim is applied to all categorical attributes.
Attribute Filtering - Whether the attribute should be part of the data filtering process and, if so, which rule applies.
Attribute IQM - Whether the attribute should be used in IQM matching. For more information on IQM, refer to the IQM FAQs and the BAPCore documentation.
Attribute EDA - Whether the attribute should be used in EDA analysis.
Attribute Calendar - Whether the attribute should be used in calendar join operations.
Attribute Validation - Whether the attribute should be part of the data validation process and, if so, which rule applies.
Attribute Quality - Whether the attribute should be part of the data quality process and, if so, which rule applies.
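To make the per-attribute categories concrete, here is a minimal sketch (in Python, for illustration) of one meta model row as a data class. The field names (module, entity, attribute_name, inbound_transform, and so on) are hypothetical and are not the actual meta model column names. Applying the default trim only to character attributes, and only on the inbound side, is an assumption about how "categorical" and "inbound/outbound" map onto the real model.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical field names; the real meta model columns may differ.
@dataclass
class AttributeRecord:
    # Data set description
    module: str
    entity: str
    # Attribute description
    attribute_name: str
    data_type: str                      # "character", "integer" or "numeric"
    description: str = ""
    length: Optional[int] = None
    # Attribute security
    mask: bool = False
    # Attribute transforms (rule names)
    inbound_transform: Optional[str] = None
    outbound_transform: Optional[str] = None
    # Rules: None means the attribute is not part of that process
    filter_rule: Optional[str] = None
    validation_rule: Optional[str] = None
    quality_rule: Optional[str] = None
    # Analytics flags
    iqm: bool = False
    eda: bool = False
    calendar_join: bool = False

    def __post_init__(self) -> None:
        # Default trim for categorical attributes; "categorical" is assumed
        # to mean the character data type, and trim is assumed to be inbound.
        if self.data_type == "character" and self.inbound_transform is None:
            self.inbound_transform = "trim"
```

For example, `AttributeRecord("sales", "customer", "customer_type", "character")` gets `inbound_transform == "trim"`, while an integer attribute such as customer_age keeps no transform unless one is given explicitly.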
What data types are supported in the meta model?
Currently, character, integer, and numeric are supported.
Should I specify date columns as character data type as well?
Yes. Because there is no date data type, date columns should be specified as character.
How do I validate a meta model against the data set?
Developers can use dataops.applymodeldatacompatibility to check the compatibility of the meta model with the data.
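The internals of dataops.applymodeldatacompatibility are not described here. The sketch below (Python, for illustration) shows only the general kind of check a compatibility step might perform: every attribute uses a supported type, is present in the data, and holds values readable as the declared type. The function check_compatibility and the dict-based model are hypothetical, and date values such as as_of_date are declared as character because no date type is supported.

```python
from typing import Dict, List

SUPPORTED_TYPES = {"character", "integer", "numeric"}


def _conforms(value: str, data_type: str) -> bool:
    """Return True if a raw string value can be read as the declared type."""
    if value == "" or data_type == "character":
        return True  # empty values are treated as missing (assumption)
    try:
        if data_type == "integer":
            int(value)
        else:  # numeric
            float(value)
        return True
    except ValueError:
        return False


def check_compatibility(model: Dict[str, str], rows: List[Dict[str, str]]) -> List[str]:
    """Hypothetical check; model maps attribute name -> declared data type."""
    problems: List[str] = []
    for name, data_type in model.items():
        if data_type not in SUPPORTED_TYPES:
            problems.append(f"{name}: unsupported type '{data_type}'")
            continue
        for i, row in enumerate(rows):
            if name not in row:
                problems.append(f"{name}: missing from row {i}")
            elif not _conforms(row[name], data_type):
                problems.append(f"{name}: row {i} value {row[name]!r} is not {data_type}")
    return problems
```

Usage: with `model = {"customer_age": "integer", "as_of_date": "character"}` and a row whose customer_age is "forty", the function reports `"customer_age: row 0 value 'forty' is not integer"`; declaring as_of_date as "date" would instead be reported as an unsupported type.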
Where are meta models stored?
Meta models are stored in the Meta folder of the core data lake (BAPCORE/Meta/BDL/Fact).
How are meta models stored and retrieved?
Meta models are written to the Meta folder of the core data lake using the BAPCore API, and they are retrieved using the BAPCore API as well.
How is the life cycle of a meta model managed?
The most current version of a meta model is stored in the Current folder (BAPCORE/Meta/BDL/Fact/Current). Archived versions are stored in the Archive folder (BAPCORE/Meta/BDL/Fact/Archive). When a new version is pushed, the current version is moved to the Archive folder.
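The Current/Archive rotation can be sketched against a local directory standing in for BAPCORE/Meta/BDL/Fact. This is a minimal Python illustration, not the BAPCore API: push_meta_model is hypothetical, and naming archived copies with a UTC timestamp suffix is an assumption made so that successive pushes do not overwrite each other.

```python
import shutil
from datetime import datetime, timezone
from pathlib import Path


def push_meta_model(fact_dir: Path, file_name: str, content: str) -> Path:
    """Hypothetical sketch: archive the current version, then write the new one.

    fact_dir stands in for BAPCORE/Meta/BDL/Fact.
    """
    current_dir = fact_dir / "Current"
    archive_dir = fact_dir / "Archive"
    current_dir.mkdir(parents=True, exist_ok=True)
    archive_dir.mkdir(parents=True, exist_ok=True)

    current_file = current_dir / file_name
    if current_file.exists():
        # Stamp the archived copy (assumed naming scheme) so pushes do not collide.
        stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S%f")
        archived = archive_dir / f"{current_file.stem}_{stamp}{current_file.suffix}"
        shutil.move(str(current_file), str(archived))

    current_file.write_text(content)
    return current_file
```

After two pushes, Current holds only the newest definition and Archive holds the previous one, which mirrors the move-on-push behavior described above.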
Can I find out which meta model was used for a particular batch process?
Yes. Every time a meta model is used for any data movement, the definition of the meta model is stored in the Track folder (BAPCORE/Meta/BDL/Fact/Track) along with the batch_id of the data movement process.
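Recording which definition a batch used amounts to writing the definition under a name keyed by batch_id. The sketch below (Python, for illustration) uses a local directory in place of BAPCORE/Meta/BDL/Fact/Track; record_track, lookup_track, the JSON format, and the `<entity>_<batch_id>.json` naming are all hypothetical.

```python
import json
from pathlib import Path
from typing import Any


def record_track(track_dir: Path, batch_id: str, entity: str, definition: Any) -> Path:
    """Hypothetical: store the meta model definition used by one batch."""
    track_dir.mkdir(parents=True, exist_ok=True)
    path = track_dir / f"{entity}_{batch_id}.json"
    payload = {"batch_id": batch_id, "entity": entity, "meta_model": definition}
    path.write_text(json.dumps(payload, indent=2))
    return path


def lookup_track(track_dir: Path, batch_id: str, entity: str) -> Any:
    """Return the meta model definition recorded for a given batch."""
    path = track_dir / f"{entity}_{batch_id}.json"
    return json.loads(path.read_text())["meta_model"]
```

Because each file is keyed by batch_id, answering "which model did batch 20240131-01 use?" is a single lookup, even after the Current version has since been replaced.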
Can I define more than one parent entity?
Currently, that is not supported. A child entity can belong to only one parent entity. If you specify more than one parent, workloads will fail in the orphan records and nested records processors.
How does nesting work if I can define only one parent entity?
When processing a child entity, you can pass a coalesced meta model that contains all ancestors of that particular child, following strict parent-child relationships. For example, if X is the child of Y and Z is the child of X, you can pass the coalesced meta model (both X and Y) along with the entity meta model for Z while processing the data set for Z.
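Because each entity has at most one parent, the parent models to coalesce for a child are simply the chain obtained by following parent links up to the root. A minimal Python sketch, assuming a hypothetical `parents` mapping from each entity to the parents declared in its meta model; ancestor_chain is an illustrative name, and it rejects multiple parents and cycles.

```python
from typing import Dict, List


def ancestor_chain(entity: str, parents: Dict[str, List[str]]) -> List[str]:
    """Walk strict single-parent links from entity up to the root.

    Returns ancestors nearest-first; raises if any entity declares
    more than one parent or if the links form a cycle.
    """
    chain: List[str] = []
    seen = {entity}
    current = entity
    while True:
        declared = parents.get(current, [])
        if len(declared) > 1:
            raise ValueError(f"{current} declares {len(declared)} parents; only one is supported")
        if not declared:
            return chain
        parent = declared[0]
        if parent in seen:
            raise ValueError(f"cycle detected at {parent}")
        seen.add(parent)
        chain.append(parent)
        current = parent
```

For the example above, `ancestor_chain("Z", {"Z": ["X"], "X": ["Y"], "Y": []})` returns `["X", "Y"]`, i.e. the two models to coalesce with Z's own model.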
How are references to the parent meta model stored in the entity meta model for a particular batch process?
The parent meta model is coalesced with the entity meta model. The coalesced model (entity meta model plus parent entity meta model) is stored in the Track folder (BAPCORE/Meta/BDL/Fact/Track) along with the batch_id of the data movement process. None of the attributes of the parent entity meta model are changed when it is coalesced with the child entity meta model. This helps when troubleshooting whether all parent entity meta models were applied correctly to the child entity meta model in nesting processes.
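The guarantee that parent attributes are unchanged by coalescing can be illustrated with a sketch that deep-copies each parent model into the result without editing it. Models are represented as hypothetical lists of attribute dicts; the ordering (child rows first, then ancestors nearest-first) is an assumption, and coalesce and parents_unchanged are illustrative names, not BAPCore functions.

```python
import copy
from typing import Dict, List

Row = Dict[str, str]


def coalesce(child: List[Row], ancestors: List[List[Row]]) -> List[Row]:
    """Stack the child model with its ancestors' models.

    Parent rows are deep-copied and appended unmodified, so later edits
    to the coalesced model cannot leak back into the parent models.
    """
    coalesced = copy.deepcopy(child)
    for parent_model in ancestors:
        coalesced.extend(copy.deepcopy(parent_model))
    return coalesced


def parents_unchanged(coalesced: List[Row], ancestors: List[List[Row]]) -> bool:
    """Troubleshooting check: every parent row appears exactly as defined."""
    return all(row in coalesced for model in ancestors for row in model)
```

A check like parents_unchanged is the kind of comparison the stored coalesced model makes possible: if it fails for a tracked batch, a parent model was not applied as defined.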
What should the format of entity_effective_date_time be?
The entity effective date time should be a string; otherwise metamodel.read will fail.
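If the meta model is prepared with a tool that parses date-like text into date objects (spreadsheet and CSV readers commonly do this), the value can be converted back to a string before the model is saved. A minimal Python sketch; the helper stringify_effective_date and the `%Y-%m-%d %H:%M:%S` format are assumptions, since the expected string format is not specified here.

```python
from datetime import date, datetime
from typing import Any, Dict

DATETIME_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed format, not confirmed by this FAQ


def stringify_effective_date(record: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of record with entity_effective_date_time stored as a string."""
    out = dict(record)
    value = out.get("entity_effective_date_time")
    # datetime is a subclass of date, so it must be checked first.
    if isinstance(value, datetime):
        out["entity_effective_date_time"] = value.strftime(DATETIME_FORMAT)
    elif isinstance(value, date):
        midnight = datetime(value.year, value.month, value.day)
        out["entity_effective_date_time"] = midnight.strftime(DATETIME_FORMAT)
    elif value is not None and not isinstance(value, str):
        raise TypeError(f"unsupported type for entity_effective_date_time: {type(value).__name__}")
    return out
```

Values that are already strings pass through untouched, so the helper is safe to run on every record before saving.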
What are the best practices for meta models?
Special characters in filter and validation rules - The filter records process will fail, and the error log will indicate 'special character error'. Avoid special characters when defining filter and validation rules. If you define the meta model in Excel or CSV, export it to an R file using the dput function, search for any special characters, and remove them before saving the meta model to the core data lake.
No parent defined in the meta model - If no parent information (i.e., parent location and parent attributes) is defined in the meta model, no nested data will be created.
For integers, the default value should be '0'; for numerics, it should be '0.01'; and for characters, it should be 'NOT AVAILABLE'.
The impute method should be mean for numeric and integer columns and DEFAULT for character columns.
Meta model column names should be all lowercase.
IQM metric match should be 'YES' only for integer and numeric columns, and only for those that are meaningful to data analysis. For example, you would set customer_credit_score and customer_age to 'YES' but not postal_code (if it is defined as an integer).
IQM codes should be 'YES' only for character columns, and only for categorical columns that are meaningful to data analysis. For example, you would set customer_type and customer_group to 'YES' but not customer_name or customer_id.
Entity Name should exactly match the name used in the HDFS folder structure, because the entity name is used to dynamically look up data lake paths.
EDA Dimensions should be 'YES' only for character columns, and only for categorical columns that are meaningful to data analysis. For example, you would set customer_type and customer_group to 'YES' but not customer_name or customer_id.
EDA Metrics should be 'YES' only for integer and numeric columns, and only for those that are meaningful to data analysis. For example, you would set customer_credit_score and customer_age to 'YES' but not postal_code (if it is defined as an integer).
EDA iterate by should be the as-of date, and only the as-of date. This allows EDA analytics to be created on a daily basis.
Entity Attribute Calendar Join Key should be the as-of date only, with no other key.
Each record in the meta model must be unique (entity names and object/BI names must all be unique).
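Several of these best practices are mechanical and can be checked before a meta model is saved. The sketch below (Python, for illustration) lints a list of meta model rows. The column names (data_type, default_value, impute_method, iqm_metric_match, iqm_codes, eda_dimensions, eda_metrics, filter_rule, validation_rule, entity_name, attribute_name) are hypothetical; "special character" is assumed to mean any non-ASCII or non-printable character, such as curly quotes pasted from Excel; and uniqueness is approximated as unique (entity_name, attribute_name) pairs.

```python
from collections import Counter
from typing import Dict, List

Row = Dict[str, str]

# Expected values taken from the best practices; column names are hypothetical.
DEFAULTS = {"integer": "0", "numeric": "0.01", "character": "NOT AVAILABLE"}
IMPUTE = {"integer": "mean", "numeric": "mean", "character": "DEFAULT"}
NUMERIC_ONLY_FLAGS = ("iqm_metric_match", "eda_metrics")
CHARACTER_ONLY_FLAGS = ("iqm_codes", "eda_dimensions")
RULE_COLUMNS = ("filter_rule", "validation_rule")


def has_special_characters(text: str) -> bool:
    # Assumption: any non-ASCII or non-printable character counts as "special".
    return any(not (ch.isascii() and ch.isprintable()) for ch in text)


def lint_meta_model(rows: List[Row]) -> List[str]:
    issues: List[str] = []
    for i, row in enumerate(rows):
        name = row.get("attribute_name", f"row {i}")
        for column in row:
            if column != column.lower():
                issues.append(f"{name}: column name '{column}' is not lowercase")
        data_type = row.get("data_type", "")
        if data_type in DEFAULTS and row.get("default_value") != DEFAULTS[data_type]:
            issues.append(f"{name}: default should be '{DEFAULTS[data_type]}'")
        if data_type in IMPUTE and row.get("impute_method") != IMPUTE[data_type]:
            issues.append(f"{name}: impute method should be {IMPUTE[data_type]}")
        for flag in NUMERIC_ONLY_FLAGS:
            if row.get(flag) == "YES" and data_type not in ("integer", "numeric"):
                issues.append(f"{name}: {flag} is YES but type is {data_type}")
        for flag in CHARACTER_ONLY_FLAGS:
            if row.get(flag) == "YES" and data_type != "character":
                issues.append(f"{name}: {flag} is YES but type is {data_type}")
        for column in RULE_COLUMNS:
            if has_special_characters(row.get(column, "")):
                issues.append(f"{name}: {column} contains special characters")
    # Each (entity_name, attribute_name) pair should appear only once.
    counts = Counter((row.get("entity_name"), row.get("attribute_name")) for row in rows)
    for (entity, attribute), n in counts.items():
        if n > 1:
            issues.append(f"{entity}.{attribute}: defined {n} times")
    return issues
```

An empty result means none of these particular rules were violated; it does not cover the checks that need judgment, such as whether a column is "meaningful to data analysis" or whether the entity name matches HDFS.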