14 October 2010

Mixing inheritance strategies in Entity Framework models

The Microsoft Entity Framework v4 runtime supports three entity inheritance models: TPT, TPH, and TPC. When using the built-in EF designer and tools in Microsoft Visual Studio 2010, the default inheritance strategy in model-first scenarios is TPT. TPH inheritance can be manually mapped in the designer, and TPC requires editing the mapping (MSL) in the EDMX file manually or using third party tools.

TPT, TPH, and TPC explained

TPT is short for Table Per Type. In TPT inheritance, there is one central table containing all the base class members and then one table per subclass containing the key and the members that are immediate (non-inherited) members of the subclass. In database terms this can be described as a normalized model, with additional tables for members that are not always present.

tptIllustration

TPH is short for Table Per Hierarchy. In TPH inheritance there is only one table representing the base class and all subclasses. Members must be nullable to avoid errors when a subclass don't contain members of another subclass. In database terms TPH can be described as a denormalized model, since one wide table is used to represent all of the types in the hierarchy with columns that are left with a null value if they are not members of the entity type.

tphIllustration

TPC is short for Table Per Concrete type. In TPC inheritance, every class in a inheritance hierarchy will have their own table. The inheritance hierarchy masks the fact that there are several independent underlying tables, representing each subtype.

tpcIllustration

Choosing inheritance strategy in Entity Framework

Choosing an inheritance strategy when using the VS2010 built-in tools often means making a model-wide decision on what type of entity inheritance to use. If you work in a 'Model First' pattern, where you create your entity classes in the model designer and then generate SSDL and the database from that, the designer will by default use TPT inheritance for all hierarchies. Microsoft have some plugins that can add model-first TPH support to the designer, and if you want to use TPC inheritance you need to do your mappings by hand using an XML editor or third-party tools.

Microsoft's Alex James has written a nice blog post on how to choose inheritance strategy, that describes the differences and merits to each of the inheritance strategies supported by Entity Framework:
http://blogs.msdn.com/b/alexj/archive/2009/04/15/tip-12-choosing-an-inheritance-strategy.aspx

However, I don't think the decision of what inheritance strategy to use should be a model-level decision. Each of the inheritance strategies have their merits and drawbacks:

  • TPT inheritance is sleek storage wise, the tables involved contain only the valid members. However, the SQL queries generated for complex TPT hierarchies can easily become very large and complex even for simple hierarchies. For complex hierarchies there can be a noticeable overhead in generating the queries, when SQL Server generates execution plans, and in cost of executing the queries.

    Even with a handful of subtypes, TPT hierarchies require EF to create all possible joins and then create a big UNIONed query covering all options, unless the query specifically asks for a single type within the hierarchy.

    Some of the performance considerations related to TPT are outlined in a blog post in the EF team’s blog:
    http://blogs.msdn.com/b/adonet/archive/2010/08/17/performance-considerations-when-using-tpt-table-per-type-inheritance-in-the-entity-framework.aspx
  • TPH inheritance results in wider tables with many nullable columns, since all types within a inheritance hierarchy maps to the same table. Nullable columns opens up the possibility for missing values going undetected (bad data), and wide tables will use more storage space than narrower tables.

    However, the SQL queries generated for TPH hierarchies are more simple and TPH inheritance can have a performance advantage over TPT both when generating queries and when running the queries. (Although it is worth noting that it doesn’t always have a performance advantage; pulling a few fields from a very wide table with many records can result in a lot of unnecessary I/O).
  • TPC inheritance is also good storage-wise since each type has its own table with no redundant columns. However, due to having multiple tables, TPC adds some constraints on the choice of keys.

    Using SQL Server’s int identity columns don't work very well together with TPC since there will be duplicate entity keys when inserting in different tables all having their own identity seed, so either a spread seed (where each table has its’ own initial seed value) will be needed, or a mechanism other than SQL Server’s identity keyword is needed. Some other RDBMSes have other mechanisms allowing a sequence to be shared by multiple tables, and something similar can be achieved with customized default constraints in SQL Server. Using GUID keys, or int identity keys with different starting ranges solves that problem.

    TPC is not supported by the VS2010 EFv4 designer’s mapping editor, so mapping TPC requires using an XML editor or third-party tools.

    Queries against TPC hierarchies, like TPT hiearchies, makes EF generate UNIONed queries, although the queries for TPC hierarchies are typically less complex than the queries generated for TPT inheritance hierarchies since all tables are self-contained and no joins are needed between base- and subtype tables, and no additional unions when the inheritance hierarchy is more than one level deep.

Depending on complexity of the inheritance hierarchy, the shape and volume of the data, the type of queries, etc., there are often good reasons to use different inheritance strategies within one model. Instead of a model-level decision, I think it is necessary to allow inheritance strategy to be decided on a case-by-case basis for each inheritance hierarchy within a model.

New inheritance strategy settings in the Model Comparer

Enter the third party tools. The latest version of the Model Comparer for EFv4 in Huagati DBML/EDMX Tools has added support for all three inheritance strategies supported by Entity Framework: TPT, TPH, and TPC. The settings dialog for the model comparer has a new setting allowing users to pick the default inheritance strategy used in model first scenarios; when creating SSDL/MSL/SQL-DDL from CSDL.

modelComparerOptions197

In addition to allowing a model-level default inheritance strategy for model-first scenarios, there is a fourth inheritance strategy setting called "Mixed".

inheritanceStrategyDropdown

When the inheritance strategy setting is set to Mixed, a dropdown for choosing inheritance strategy is displayed in the conceptual model tools area in the model comparer. This enable developers to pick inheritance strategy on-the-fly when adding new inheritance hierarchies to the model. The SSDL and MSL generated by the model comparer's "Update SSDL" feature will then be structured based on what is selected in the dropdown.

mixedModeDropdown

Caveats

  • Does this mean inheritance strategies can be mixed within one inheritance hierarchy? No, it doesn't. Each inherited tree of entity types must stick to one inheritance strategy, but independent inherited hierarchy of entities within the model can use different inheritance strategies. To enforce this, the model comparer stores the selected inheritance strategy for each CSDL entity set when generating SSDL, and will reuse the same inheritance strategy if e.g. a new subtype is added within a hierarchy.
  • Does this make it easier to choose what inheritance strategy to use? No, it doesn't. Choosing inheritance strategy should be done on a case-by-case basis, and based on facts. Will you have a lot of data or just a handful of records? How complex is your inheritance strategy? Look at the queries generated by EF for your particular model, profile the queries, generate sample data corresponding or exceeding the amount of data you expect your application will generate and profile again.
  • Does this make it a good idea to use entity inheritance in every corner of the model? No, it doesn't. Entity inheritance is a nice feature from a code and object-orientation perspective. However, in many scenarios it is not really needed. Don't make your model more complex than necessary. There are always tradeoffs on one side or the other: going berzerk with object-orientation can add database-side complexities such as performance issues, so try to find the right balance between object orientation and a clean database that keep both developers, DBAs, and end-users happy.


The Model Comparer for Entity Framework v4

The Model Comparer is part of the Huagati DBML/EDMX Tools add-in for VS2010. If you want to try it out, you can download it from http://huagati.com/dbmltools/ and get a trial license from the same site.

If you want to learn more about the basic features in the Model Comparer, and how it improves the EFv4 designer experience in Visual Studio 2010 with new options and features, please watch the screencasts in the following blog posts:
http://huagati.blogspot.com/2010/07/introducing-model-comparer-for-entity.html
http://huagati.blogspot.com/2010/08/using-model-comparer-to-generate.html

The Model Comparer and other features in the Huagati DBML/EDMX Tools add-in keep evolving, so keep an eye out for the new features that we keep adding to it. Release cycles are generally very short and we try to get new features, improvements, and of course also bugfixes out to users quickly.

2 comments:

  1. The ADO.net team posted a good intro/overview to TPT/TPH/TPC on their blog over at http://blogs.msdn.com/b/adonet/archive/2010/10/25/inheritance-mapping-a-walkthrough-guide-for-beginners.aspx

    ReplyDelete
  2. great post!, It solves me a lot of doubts

    ReplyDelete