...

Should I specify data columns as character data types as well?

Yes.

How do I validate the meta model against the data set?

Developers can use dataops.applymodeldatacompatibility to check the compatibility of the meta model with the data.

Where are the Meta Models stored?

...

Meta models are stored within the Meta folder of the core data lake. AdminUI provides an interface for users to save the meta model in the data lake as well as retrieve and edit it. Meta models can also be stored and retrieved using the sprngy API.

How is the life cycle of the Meta Model managed?

The most current version of the Meta Model is stored in the Current folder (SPRNGYPlatform/Meta/BDL/Fact/Current). Archived versions are stored in the Archive folder (SPRNGYPlatform/Meta/BDL/Fact/Archive). When a new version is pushed, the current version is moved to the Archive folder.
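The Current/Archive rotation described above can be illustrated with a minimal sketch. It assumes a plain local file system standing in for the data lake; the function name `push_meta_model` and the timestamp suffix on archived files are hypothetical and are not part of the sprngy API.

```python
import shutil
from datetime import datetime, timezone
from pathlib import Path


def push_meta_model(new_file: Path, fact_dir: Path) -> Path:
    """Move whatever is in Current to Archive, then place new_file in Current.

    fact_dir stands for a folder such as SPRNGYPlatform/Meta/BDL/Fact.
    This is an illustration only, not the platform's implementation.
    """
    current = fact_dir / "Current"
    archive = fact_dir / "Archive"
    current.mkdir(parents=True, exist_ok=True)
    archive.mkdir(parents=True, exist_ok=True)

    # Assumed naming scheme: a UTC timestamp suffix keeps archived
    # versions from overwriting each other.
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S%fZ")
    for old in sorted(current.iterdir()):
        if old.is_file():
            archived_name = f"{old.stem}_{stamp}{old.suffix}"
            shutil.move(str(old), str(archive / archived_name))

    # The new version becomes the only file in Current.
    target = current / new_file.name
    shutil.copy2(new_file, target)
    return target
```

After two pushes of a file with the same name, Current holds only the second version and Archive holds the first.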

Can I know what Meta Model was used for a particular batch process?

...

  • Regarding nested attributes

    • We recommend keeping entity_attribute_compare_key_role and entity_attribute_parent_lookup_key_role as NO for as_of_date if the dates do not match between the two entities. If you set them to YES, nesting introduces NULLs for the rows where as_of_date does not match between the two entity data sets, and those NULLs are then replaced using the impute method defined in the meta model for the given attribute.

    • Moreover, entity_attribute_nested_key_role should be NO for the parent entity and YES for the child entity. If the child entity has another child entity, then it should be NO for that as well. The reason is that if the as_of_date values differ, nesting introduces as_of_date_x and as_of_date_y columns in the data.

    • For other matching ID columns between the two entities, entity_attribute_compare_key_role and entity_attribute_parent_lookup_key_role should be YES, with entity_attribute_parent_lookup_location set to the FDL/Stage of the parent entity. This nests the column between the two entities based on the defined common attribute.

  • Regarding entity_attibute_calendar_join_key_role

    • If the data size is small, do not set entity_attibute_calendar_join_key_role to YES for any attribute. (NOTE: In Spark, partitioning is an expensive operation when the dataset is too small or too big.)

    • If the data size is very large (greater than roughly 100 GB), the best practice is not to partition the dataset, i.e. do not set entity_attibute_calendar_join_key_role to YES for any attribute.

    • If the data size is large and you set entity_attibute_calendar_join_key_role to YES for an attribute, make sure that the column is a date column and that its data is in the format 'YYYY-MM-DD'. (NOTE: As a general rule of thumb, there should be one column called as_of_date that serves all of these purposes.) The value of as_of_date should not be 'NOT AVAILABLE' when calendar_join_key_role is YES for it.
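Before setting the calendar join key role to YES for a column, it can help to check its values against the 'YYYY-MM-DD' requirement above. The following is a minimal sketch in plain Python; the helper name `invalid_calendar_keys` is hypothetical and is not part of the platform, and the check runs on an in-memory list rather than a Spark data set.

```python
from datetime import datetime


def invalid_calendar_keys(values):
    """Return the values that are not strict 'YYYY-MM-DD' dates."""
    bad = []
    for value in values:
        try:
            # Rejects placeholders such as 'NOT AVAILABLE', other layouts
            # such as '05/01/2023', impossible dates, and non-strings.
            parsed = datetime.strptime(value, "%Y-%m-%d")
        except (TypeError, ValueError):
            bad.append(value)
            continue
        # strptime tolerates unpadded input such as '2023-1-5', so also
        # require that the value round-trips unchanged.
        if parsed.strftime("%Y-%m-%d") != value:
            bad.append(value)
    return bad
```

An empty result means every value in the sample is usable as a calendar join key under the rule stated above.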

I have a large number of small files in the Meta folder that may impede read performance. What can I do?

You can consolidate multiple Meta files into one using the consolidate functionality. This assumes that all files in the meta-model folder share the same schema.
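The same-schema precondition can be made concrete with a short sketch. It assumes, purely for illustration, that the Meta files are CSV files with a header row on a local file system; the function `consolidate_csv` is hypothetical and is not the platform's consolidate functionality.

```python
import csv
from pathlib import Path


def consolidate_csv(folder: Path, output: Path) -> int:
    """Merge every *.csv in folder into output; return the data row count.

    Raises ValueError if any file's header differs from the first one.
    Write output outside folder so a re-run does not pick it up as input.
    """
    header = None
    rows = []
    for path in sorted(folder.glob("*.csv")):
        with path.open(newline="") as handle:
            reader = csv.reader(handle)
            file_header = next(reader, None)
            if file_header is None:
                continue  # skip completely empty files
            if header is None:
                header = file_header
            elif file_header != header:
                raise ValueError(f"{path.name} has a different schema: {file_header}")
            rows.extend(reader)

    if header is None:
        raise ValueError(f"no non-empty CSV files found in {folder}")

    # Write the shared header once, followed by all collected rows.
    with output.open("w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        writer.writerows(rows)
    return len(rows)
```

Failing fast on a header mismatch reflects the assumption stated above: consolidation is only meaningful when every file has the same schema.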

Can I consolidate Meta files?

...


What if there is a parent-child relationship in your data module?

...