Question

Handling very large collections in DDD

I've seen this kind of question asked before, but the business requirements were slightly different, so the suggested solutions don't work well for me. I've been learning Domain Driven Design (DDD) recently, and I figured the best way to do it is to rewrite one of my past real-world projects the DDD way. Here is one requirement I'm struggling with, simplified a bit to omit some unnecessary details:

  1. The user can create an "Incident" record in the application.
  2. The incident is initially created as a draft, and while it is a draft the user can then add affected servers to the incident.
  3. There can be up to 30 000 servers added, but no more than that.
  4. The drafts can exist for a long time (days, even weeks), and new servers can be added at any point.

There are two ways to implement this logic that I came up with; however, they both have some downsides and I'm not sure if they follow the DDD principles correctly:

1.

The initially obvious way to model it is to create an Aggregate Root called Incident, which will contain a collection of AffectedServer entities/aggregates. Then, when a user adds a new server, we retrieve the Incident from a repository and attempt to add the server. An example in C#:

public class Incident 
{
   //Omitted the constructor and other fields/properties/methods

   private List<AffectedServer> _servers;
   public IReadOnlyList<AffectedServer> Servers => _servers;

   public void AddServer(AffectedServer server) 
   {
      if (_servers.Count >= 30_000)
      {
         throw new InvalidOperationException("The number of servers cannot exceed 30 000");
      }
      _servers.Add(server);
      // publish the domain events here
   }
}

However, to add a single server we have to load the entire existing server collection (potentially up to 30 000 rows) from the database, which causes a noticeable performance hit.

2.

The other solution is to instead extract AffectedServer into a separate aggregate root and handle the domain logic (ensuring that there are no more than 30 000 servers) within a Domain Service. The new aggregate root would still reside in the same bounded context as the Incident aggregate root, and would refer to the incident only by an identifier. An example in C#:

public class AffectedServerDomainService
{
   //Omitted the constructor and other fields/properties/methods

   public async Task AddServerToIncident(AffectedServer affectedServer) 
   {
      int serverCount = await _serverRepository.GetServerCountForIncident(affectedServer.IncidentId);
      if (serverCount >= 30_000) 
      {
         throw new InvalidOperationException("The number of servers cannot exceed 30 000");
      }
      await _serverRepository.AddServer(affectedServer);
      // publish the domain events here
   }
}

This saves us from having to retrieve a potentially very large collection from the repository, but since the server collection is no longer part of the Incident aggregate root (where it feels like it naturally belongs), the domain logic becomes a bit fractured.

I feel like the #2 solution is the better one, but am I correct? Does it violate DDD principles, or is it still a valid approach?

Solution

This question seems opinion-based, but I think there's only one answer. It doesn't make sense to build 30,000 objects in memory if you only need their count. So alternative 2 is the way to go (assuming that GetServerCountForIncident boils down to a simple count query in SQL).
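For illustration, here is a minimal sketch of what such a repository method could look like with EF Core - the IncidentDbContext, the AffectedServerRow class and the repository shape are assumptions made for this sketch, not something given in the question:

using System;
using System.Threading.Tasks;
using Microsoft.EntityFrameworkCore;

// Assumed persistence types - the names are illustrative only.
public class AffectedServerRow
{
   public Guid Id { get; set; }
   public Guid IncidentId { get; set; }
   public string HostName { get; set; } = "";
}

public class IncidentDbContext : DbContext
{
   public DbSet<AffectedServerRow> AffectedServers => Set<AffectedServerRow>();
}

public class AffectedServerRepository
{
   private readonly IncidentDbContext _db;

   public AffectedServerRepository(IncidentDbContext db) => _db = db;

   // Translates to SELECT COUNT(*) ... WHERE IncidentId = @incidentId;
   // no AffectedServer rows are materialized in memory.
   public Task<int> GetServerCountForIncident(Guid incidentId) =>
      _db.AffectedServers.CountAsync(s => s.IncidentId == incidentId);
}

The only thing that comes back from the database is a single integer, regardless of how many servers the incident has.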

To broaden the perspective a bit, this is one example of how DDD principles don't belong in a data access layer. Seemingly, in this DDD mindset, an entity has to be an "aggregate root" for you to be "allowed" to query it. So the reasoning is: I need data from entity A, so A has to be an aggregate root and have a dedicated domain service. That's a lot of overhead just to get some data.

It may seem like a well-defined, unambiguous architecture, but in reality it leads to arbitrary decisions: do we query B through A.Bees (and ADomainService), or do we create a BDomainService? (And why not a CDomainService for A.C?)

The EF class model is part of a data access layer. It's not a domain model. Mixing data access and DDD concerns is conceptually impossible, because the two concerns have opposing demands. To mention just a few: bidirectional vs. unidirectional relationships, surrogate keys vs. natural keys, anemic classes vs. encapsulation.
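As a rough illustration of that last contrast (all class names here are made up for the sketch; AffectedServerRow is the row class from the sketch above):

using System;
using System.Collections.Generic;

// Data access layer: anemic by design - public setters, a surrogate key,
// shaped after the tables rather than after the business rules.
public class IncidentRow
{
   public int Id { get; set; }
   public string Title { get; set; } = "";
   public List<AffectedServerRow> Servers { get; set; } = new();
}

// Domain layer: encapsulated - no public setters, behaviour instead of bare data,
// and the invariant enforced where the state lives.
public class IncidentDraft
{
   private readonly List<Guid> _serverIds = new();

   public Guid Id { get; }

   public IncidentDraft(Guid id) => Id = id;

   public void AddServer(Guid serverId)
   {
      if (_serverIds.Count >= 30_000)
      {
         throw new InvalidOperationException("The number of servers cannot exceed 30 000");
      }
      _serverIds.Add(serverId);
   }
}

Trying to make one class satisfy both sets of demands is where the friction in the question comes from.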

2024-07-15
Gert Arnold

Solution

The general term for what you are trying to achieve here is set validation: given some collection of things, ensure that some invariant is maintained for the entire collection.

And in the general case, that means you need to lock the entire set to prevent it from being changed while you are working on it.

For example:

   public async Task AddServerToIncident(AffectedServer affectedServer) 
   {
      int serverCount = await _serverRepository.GetServerCountForIncident(affectedServer.IncidentId);
      if (serverCount >= 30_000) 
      {
         throw new InvalidOperationException("The number of servers cannot exceed 30 000");
      }
      await _serverRepository.AddServer(affectedServer);
      // publish the domain events here
   }

Here, you have a race condition if two processes (which in the general case might be isolated from each other - running on different machines) each check the server count before the other has committed the AddServer changes.
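One way (of several) to close that race is to make the check and the insert atomic at the database level, for example by running both inside a serializable transaction. A sketch with EF Core follows; the IncidentDbContext and AffectedServerRow types are assumed here, they are not part of the original code:

using System;
using System.Data;
using System.Threading.Tasks;
using Microsoft.EntityFrameworkCore;

public class AffectedServerWriter
{
   // Assumed DbContext exposing DbSet<AffectedServerRow> AffectedServers.
   private readonly IncidentDbContext _db;

   public AffectedServerWriter(IncidentDbContext db) => _db = db;

   public async Task AddServerToIncident(AffectedServerRow server)
   {
      // Serializable isolation makes the count and the insert atomic with respect
      // to other writers: two concurrent callers cannot both observe 29 999.
      await using var tx = await _db.Database.BeginTransactionAsync(IsolationLevel.Serializable);

      int serverCount = await _db.AffectedServers
         .CountAsync(s => s.IncidentId == server.IncidentId);

      if (serverCount >= 30_000)
      {
         throw new InvalidOperationException("The number of servers cannot exceed 30 000");
      }

      _db.AffectedServers.Add(server);
      await _db.SaveChangesAsync();
      await tx.CommitAsync();
   }
}

The cost is exactly the "lock the entire set" described above: under contention these transactions block or deadlock, and callers need a retry strategy.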

So if you really do need to ensure that some invariant holds over a set, it might be simpler to use an "aggregate", because you've already got a locking strategy in place, and thus you avoid creating a new pattern for the "weird cases".

(Note: that's not necessarily "the" aggregate - if modifying the servers in the collection doesn't depend on other incident information, and if modifying the other incident information doesn't depend on the data in the server collection, then you might have more than one aggregate which includes some correlation information so you can Frankenstein the right reports together later.)
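Here is a sketch of what such a smaller aggregate could look like. It carries only the state the invariant needs (a counter) plus an optimistic concurrency token, so adding a server never loads the 30 000 rows; all type names and the EF Core mapping are assumptions for the sketch:

using System;
using System.Threading.Tasks;
using Microsoft.EntityFrameworkCore;

// Hypothetical aggregate holding only the invariant-relevant state.
public class IncidentServerSet
{
   public Guid IncidentId { get; private set; }
   public int ServerCount { get; private set; }
   public byte[] RowVersion { get; private set; } = Array.Empty<byte>(); // mapped as a concurrency token

   public void RegisterServer()
   {
      if (ServerCount >= 30_000)
      {
         throw new InvalidOperationException("The number of servers cannot exceed 30 000");
      }
      ServerCount++;
   }
}

public class AddServerHandler
{
   // Assumed DbContext exposing IncidentServerSets and AffectedServers.
   private readonly IncidentDbContext _db;

   public AddServerHandler(IncidentDbContext db) => _db = db;

   public async Task AddServerAsync(Guid incidentId, AffectedServerRow server)
   {
      var set = await _db.IncidentServerSets.SingleAsync(s => s.IncidentId == incidentId);
      set.RegisterServer();            // enforces the cap without touching the server collection
      _db.AffectedServers.Add(server);
      await _db.SaveChangesAsync();    // throws DbUpdateConcurrencyException if a concurrent writer won the race
   }
}

Whether that version check lives on a dedicated aggregate like this or on the Incident itself is the modelling decision the note above is about.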

That said:

What is the business impact of having a failure?

Here, you are describing the activity as editing a draft; are you sure that arbitrary invariant should hold during this draft phase? Really? Because from out here where we aren't experts in your specific domain, that seems really suspicious.

(Ex: even if the 30K limit is real, is it so important that we need to remove a server before inserting the 30,001st, versus adding the overdraft first and then relieving the tension by removing a server to make quota?)

Based on your description, this system is not "really" the book of record; your operators are capturing details about an incident, and the incident occurred outside of your domain model. In other words, you are capturing details about the external world, and then doing some processing on those details. Having your incident database refuse reports because some internal invariant would not be satisfied seems pretty backwards.

(It's not directly related, but it may be useful to review some of the DDD talks online that discuss "warehouse systems", where you have to account for the fact that there may be boxes in the warehouse that the domain model doesn't know about, or boxes absent from the warehouse that the domain model thinks are there.)

2024-07-15
VoiceOfUnreason