
  • Special characters in filter and validation rules - The filter records process will fail, and the error log will report a ‘special character error’. Avoid special characters when defining filter and validation rules. If you define the meta model in Excel or CSV, export it to an R file using the dput function, search for special characters, and remove any you find before saving the meta model to the core data lake.

  • No parent defined in the meta model - If no parent information (i.e., parent location and parent attributes) is defined in the meta model, no nested data is created.

  • For integer attributes, the default value in the meta model should be ‘0’; for numeric attributes, it should be ‘0.01’; for attributes with the character data type, it should be ‘NOT AVAILABLE’.

  • The impute method should be mean for numeric and integer attributes and DEFAULT for character attributes.

  • Meta model column names should be all lower case.

  • IQM metric match should be ‘YES’ only for integer or numeric columns that are meaningful to data analysis.

  • IQM codes should be ‘YES’ only for character (categorical) columns that are meaningful to data analysis.

  • When defining the meta model in CSV format, ensure that the Entity Name exactly matches the name defined during ‘Application Configuration’ in AdminUI. The entity name is used to dynamically look up data lake paths.

  • EDA Dimensions should be ‘YES’ only for character (categorical) columns that are meaningful to data analysis.

  • EDA Metrics should be ‘YES’ only for integer or numeric columns that are meaningful to data analysis.

  • EDA iterate by should be set for the as-of date and only the as-of date. This allows EDA analytics to be created on a daily basis.

  • entity_attribute_nested_key_role should always be set to YES when the attribute is a parent lookup key and its in-use indicator is also set to YES.

  • Entity Attribute Calendar Join Key should be specified only for the as-of date and for no other key.

  • Each record in the meta model needs to be unique (entity names and object/BI names all need to be unique).

  • Parent Lookup location should be a relative path, not an absolute path.

  • When defining the meta model in CSV format, an attribute for ‘as_of_date’ should be defined with the data type ‘Character’. When the meta model is created using the steps defined in AdminUI, this column is added automatically in the background.

  • Regarding nested attributes

    • We recommend keeping entity_attribute_compare_key_role and entity_attribute_parent_lookup_key_role as NO for as_of_date if the dates do not match between the two entities. If you set them to YES, nesting introduces NULLs for the rows where as_of_date does not match between the two entity data sets, and those NULLs are then replaced using the impute method defined in the meta model for the given attribute.

    • Moreover, entity_attribute_nested_key_role should be NO for the parent entity and YES for the child entity. If the child entity has a child entity of its own, it should be NO for that as well. The reason is that if the as_of_date values differ, nesting introduces as_of_date_x and as_of_date_y columns into the data.

    • For other matching ID columns between two entities, entity_attribute_compare_key_role and entity_attribute_parent_lookup_key_role should be YES, with entity_attribute_parent_lookup_location set to the FDL/Stage of the parent entity. This nests the two entities on the defined common attribute.

  • Regarding entity_attibute_calendar_join_key_role

    • If the data size is small, do not set entity_attibute_calendar_join_key_role to YES for any attribute. (NOTE: In Spark, partitioning is an expensive operation when the dataset is too small or too large.)

    • If the data size is very large (roughly greater than 100 GB), the best practice is not to partition the dataset, i.e., do not set entity_attibute_calendar_join_key_role to YES for any attribute.

    • If the data size is large and you set entity_attibute_calendar_join_key_role to YES for an attribute, make sure that the column is a date column and that its values are in the 'YYYY-MM-DD' format. (NOTE: As a general rule of thumb, there should be one column called as_of_date that serves all of these purposes.) The value of as_of_date should not be ‘NOT AVAILABLE’ when calendar_join_key_role is YES for it.
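
The date-format rule for a calendar join key can be checked before the data is loaded. The following is a minimal sketch in Python, assuming the as_of_date values are available as plain strings; the function name `invalid_as_of_dates` is hypothetical and is not part of the platform.

```python
from datetime import datetime


def invalid_as_of_dates(values):
    """Return the values that are not valid dates in 'YYYY-MM-DD' form."""
    bad = []
    for value in values:
        try:
            # strptime alone accepts '2024-1-5', so the exact
            # 10-character zero-padded form is required as well.
            ok = len(value) == 10 and bool(datetime.strptime(value, "%Y-%m-%d"))
        except (ValueError, TypeError):
            # ValueError: not a real date; TypeError: not a string (e.g. None).
            ok = False
        if not ok:
            bad.append(value)
    return bad
```

A placeholder such as ‘NOT AVAILABLE’ is reported by this check along with any malformed or impossible date, so an empty result suggests the column is safe to use as the calendar join key.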

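The default-value, impute-method, and lower-case column name rules listed above lend themselves to an automated check when the meta model is maintained as CSV. The sketch below is illustrative only: the column names `data_type`, `default_value`, and `impute_method` are assumptions, not the platform's actual meta model column names, and the impute method is compared case-insensitively as a further assumption.

```python
# Expected default value and impute method per data type, as listed above.
EXPECTED = {
    "integer": ("0", "mean"),
    "numeric": ("0.01", "mean"),
    "character": ("NOT AVAILABLE", "DEFAULT"),
}


def check_row(row):
    """Return a list of rule violations for one meta model row (a dict)."""
    problems = []
    # Meta model column names should be all lower case.
    for name in row:
        if name != name.lower():
            problems.append(f"column name not lower case: {name}")
    # Hypothetical column names; adjust to the real meta model layout.
    data_type = row.get("data_type", "").lower()
    if data_type in EXPECTED:
        default, impute = EXPECTED[data_type]
        if row.get("default_value") != default:
            problems.append(f"default for {data_type} should be {default}")
        if row.get("impute_method", "").lower() != impute.lower():
            problems.append(f"impute method for {data_type} should be {impute}")
    return problems
```

Rows read with `csv.DictReader` can be passed to `check_row` one at a time; an empty list means that none of the three rules was violated for that row.
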