The usage of ORMs (data mappers) comes with a great power but also with a great responsibility. ORMs give us a convenient manner to handle databases avoiding some boilerplate; they also can be used incorrectly, this way being a source of problems in performance or scalability. 

Many to many relationships mapping is one of those options commonly used and it may give rise to more harm than good.

 

Let’s start with a simple example where there are 2 entities: Products and Tags. Products can have multiple tags and tags can have multiple products. Let’s try using many to many relationships mapping in both entities. Just by invoking product.tags it’s simple to retrieve a list of the tags for the given product; with tag.products we can easily get all the products of a tag. Quite direct and straightforward to use.

But any time we request a product, several extra queries may be executed under the hood, which can lead to bad performance.
Also, do we need all the tags information every time we ask for the product information? 

Sure, we can enable lazy loading (adding extra configuration to our entities) to avoid asking the unneeded data but when the data is needed, ORMs usually query the database individually for each child (the issue known as N+1 query problem). In the end, much simpler for the coder, but worse for CPU and IOPS.

 

Besides, many to many relationships mapping is quite restrictive in terms of extra information. What happens if we need to know when a tag was assigned to a product, or who did the assignation? In this case it’s quite clear we need an intermediate relationship where this data is kept.

Extracted from the doctrine documentation (known ORM for PHP):

Why are many-to-many associations less common? Because frequently you want to associate additional attributes with an association, in which case you introduce an association class. Consequently, the direct many-to-many association disappears and is replaced by one-to-many/many-to-one associations between the 3 participating classes.

 

Anyway, let’s assume that extra data is not needed, just the relationship. Both entities should exist independently from each other, meaning that each entity is part of a different aggregate root. And, from DDD perspective, aggregate roots should not reference another aggregate, just their ID.

Also, this model may have problems in a concurrent environment. In a real scenario, we could  meet the following use cases:

  1. The description of product 1 is updated
  2. A tag name is updated
  3. The tag is assigned / removed from product 1

These 3 operations should be done at the same time without any issue. But using many to many relationship mapping, the third use case may need to be locked until the rest are completed or some inconsistencies may occur.

Again, to avoid this conflict we should partition our model in 3 participants.Either way, a new important concept in our domain has arised: Tagging is a new aggregate root with the responsibility of assigning tags to products. With this solution the issue of writing in the 3 tables concurrently has been fixed.

But I can’t do product.tags anymore…

That’s true, but the real question is: why do we need to navigate between two aggregate roots at all?

Thinking in CQRS terms, on the query side, when we need to retrieve information usually it can be done with a specific query joining all the needed tables. This way avoiding hydrating entities and saving memory and CPU time a simple DTO with the data requested can be returned.

Example: get the list of tags of a given product id

Val product = productRepository.findById(productId)
Val tags = product.tags()
Foreach (Tags as tag) {
	tagsDTO [] = TagDTO.from(tag)
}
Return tagsDTO

Vs:

Select * from Product p
INNER JOIN tagging g ON g.product_id = p.id
INNER JOIN tags t ON g.tag_id = t.id		

On the command side, we should modify just one aggregate at the same time so there is no need to navigate from one aggregate to another.

Using this budget version of CQRS we are also breaking responsibilities and having much simpler code.

 

More benefits

  • Avoid all the lazy loading configuration and pre-cached queries which can bring some scalability issues (N+1 query problems)
  • The domain logic should be managed by us in our domain, not by the ORM which is part of the infrastructure
  • Our model could be mapped into three different databases without changing any line of code. This is an important point for scalability
  • The 3 models could belong to different bounded contexts because they are decoupled giving us better performance
  • Hibernate doesn’t recommend their usage either  

 

Do not use exotic association mappings:

Practical test cases for real many-to-many associations are rare. Most of the time you need additional information stored in the “link table”. In this case, it is much better to use two one-to-many associations to an intermediate link class. In fact, most associations are one-to-many and many-to-one. For this reason, you should proceed cautiously when using any other association style.

 

Conclusion

If we are able to NOT map relationships between aggregate roots, we can create many independent “islets” (the domain models) that do not depend on each other, and so being able to be independently distributed to achieve the maximum scalability, without relying on any underlying mapping technology between them.

Think in another solution when many to many is a possible option for your problem.

 

This article was written by Oriol Saludes (FullStack Developer at Apiumhub) & Christian Ciceri (Software Architect at Apiumhub).