Question

How to implement a Repository in DDD correctly

I have a question regarding DDD and the repository pattern.

I have an aggregate root, Report, that holds Transactions (an array of entities).

I used to have a CRUD repository: to modify a transaction, I would fetch the report, apply my domain method, and save the report. That worked while I was using a NoSQL database, because the whole aggregate was a single document.

Problem: I switched to a relational database, and now if I want to, e.g., add a transaction, I can't simply save the whole aggregate, because Transactions live in a different table and I would only need to add a row to that table. If I have the whole aggregate, I have to compute a delta to work out what needs to be updated.

Solution 1: The first thing that comes to mind is to use repository methods like AddTransaction(reportId, transaction) instead of update. But I worry that information leaks into the repository, and it also troubles me that a change in the DB made me change my application layer and possibly the domain layer, as the domain would now have to return the created transaction in order to save it.

Solution 2: Another interesting approach is to create a domain event with all the relevant information, e.g. TransactionCreated(Transaction). The repository could then read this event and make the relevant changes, as in solution 1.

Both approaches require changes to the application and domain layers just because I changed the database, and avoiding that is the whole point of DDD. Is the CRUD repository only for small aggregates? I would highly appreciate your feedback and ideas.


1 Jan 1970

Solution

As you suggest, a repository should be accessed only at the aggregate level. This means that writing your aggregate to the repository (whatever the repository is) needs to happen inside a transaction.

A couple of suggestions here.

Do not access your aggregate from its bowels.

Your first solution exposes a part of the aggregate at the interface level. You should refrain from solutions like this if you intend to use aggregates. An aggregate is, by definition, a transactional boundary. If you perform operations inside the boundary, even just to access data, you are moving the transactional boundary, and therefore cutting your solution into more than one aggregate.

Your second solution has the same problem as well: you may call the repository directly or fire an event and respond asynchronously, but that does not change the fact that you are accessing the aggregate from the internals.

Use hexagonal architecture.

Your repository can be a port in your domain model, whose interface deals only with the aggregate, e.g., reports.save(report: Report). The port is an interface and has no implementation of its own: this means that your domain model is free from implementation details about the technology you use to store data. The application layer also uses the ports to implement its use cases, so details of how the repository is actually implemented do not leak in there either.

You can then have one adapter per storage technology, e.g., JdbcReportRepository or MongoRepository could both implement the Reports interface I mentioned before.
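As a rough illustration of the port and adapter split described above, here is a minimal Python sketch. The field names (`amount`), the `get` method, the `InMemoryReports` adapter and the `add_transaction_use_case` function are assumptions made for the example; they are not part of the original question. The in-memory adapter only stands in for a JDBC- or Mongo-backed one.

```python
from dataclasses import dataclass, field
from typing import Optional, Protocol


@dataclass
class Transaction:
    id: str
    amount: int  # hypothetical field, e.g. an amount in cents


@dataclass
class Report:
    id: str
    transactions: list[Transaction] = field(default_factory=list)

    def add_transaction(self, transaction: Transaction) -> None:
        # Domain method: the only way callers change the aggregate.
        self.transactions.append(transaction)


class Reports(Protocol):
    """Port: it only ever speaks in terms of the whole aggregate."""

    def get(self, report_id: str) -> Optional[Report]: ...

    def save(self, report: Report) -> None: ...


class InMemoryReports:
    """Adapter (assumed for illustration); a SQL or Mongo adapter would
    expose exactly the same two methods."""

    def __init__(self) -> None:
        self._store: dict[str, Report] = {}

    def get(self, report_id: str) -> Optional[Report]:
        return self._store.get(report_id)

    def save(self, report: Report) -> None:
        self._store[report.id] = report


def add_transaction_use_case(
    reports: Reports, report_id: str, transaction: Transaction
) -> None:
    # Application layer: load, call the domain method, save the aggregate.
    # Nothing here depends on how the adapter stores the data.
    report = reports.get(report_id)
    if report is None:
        raise KeyError(report_id)
    report.add_transaction(transaction)
    reports.save(report)
```

With this shape, swapping the database means writing a new adapter; the use case and the `Report` class stay as they are.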

Be transactional.

In your adapter, find a way of storing your aggregate so that the operation is transactional and you can roll it back in case of errors.

That means you can decide to store your aggregate in two separate tables, but every write operation (updates as well as creations) has to happen inside a transaction.
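A minimal sketch of such an adapter, using Python's built-in `sqlite3` module as a stand-in for whatever relational database is in use. The table and column names, the `title` and `amount` fields, and the delete-and-reinsert strategy are assumptions chosen to keep the example short; they are one possible approach, not the only one.

```python
import sqlite3
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Transaction:
    id: str
    amount: int  # hypothetical field


@dataclass
class Report:
    id: str
    title: str  # hypothetical field
    transactions: list[Transaction] = field(default_factory=list)


class SqliteReports:
    """Adapter that stores one aggregate across two tables."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn
        with self._conn:
            self._conn.execute(
                "CREATE TABLE IF NOT EXISTS report "
                "(id TEXT PRIMARY KEY, title TEXT NOT NULL)"
            )
            self._conn.execute(
                "CREATE TABLE IF NOT EXISTS report_transaction "
                "(id TEXT PRIMARY KEY, report_id TEXT NOT NULL, "
                "amount INTEGER NOT NULL)"
            )

    def save(self, report: Report) -> None:
        # `with conn` commits if the block succeeds and rolls back if an
        # exception escapes, so both tables change together or not at all.
        with self._conn:
            self._conn.execute(
                "INSERT OR REPLACE INTO report (id, title) VALUES (?, ?)",
                (report.id, report.title),
            )
            # Rewrite the child rows instead of computing a delta.
            self._conn.execute(
                "DELETE FROM report_transaction WHERE report_id = ?",
                (report.id,),
            )
            self._conn.executemany(
                "INSERT INTO report_transaction (id, report_id, amount) "
                "VALUES (?, ?, ?)",
                [(t.id, report.id, t.amount) for t in report.transactions],
            )

    def get(self, report_id: str) -> Optional[Report]:
        row = self._conn.execute(
            "SELECT id, title FROM report WHERE id = ?", (report_id,)
        ).fetchone()
        if row is None:
            return None
        rows = self._conn.execute(
            "SELECT id, amount FROM report_transaction "
            "WHERE report_id = ? ORDER BY id",
            (report_id,),
        ).fetchall()
        return Report(
            id=row[0],
            title=row[1],
            transactions=[Transaction(id=r[0], amount=r[1]) for r in rows],
        )
```

Rewriting all child rows on every save is simple but can be wasteful for large aggregates. If that becomes a problem, the delta can be computed inside the adapter, which still keeps the port and the domain model unchanged.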

Another option, which is the one I personally prefer, is to use the memento pattern, where:

An aggregate is able to generate a snapshot of itself, that is, its serializable version
An aggregate is able to rehydrate itself from a previously generated snapshot

You can think of a snapshot as a picture of your aggregate taken at a certain moment: it contains the full state of the aggregate, that is, all the information needed to rehydrate it when it is queried.
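The two capabilities listed above could look roughly like this in Python. The method names `snapshot` and `from_snapshot`, and the `amount` field, are hypothetical choices made for this sketch.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Transaction:
    id: str
    amount: int  # hypothetical field


@dataclass
class Report:
    id: str
    transactions: list[Transaction] = field(default_factory=list)

    def snapshot(self) -> str:
        # Serializable picture of the whole aggregate, children included.
        return json.dumps(
            {
                "id": self.id,
                "transactions": [
                    {"id": t.id, "amount": t.amount} for t in self.transactions
                ],
            }
        )

    @classmethod
    def from_snapshot(cls, snapshot: str) -> "Report":
        # Rehydrate the aggregate from a previously generated snapshot.
        data = json.loads(snapshot)
        return cls(
            id=data["id"],
            transactions=[Transaction(**t) for t in data["transactions"]],
        )
```

Because the aggregate produces and consumes its own snapshot, the repository adapter never needs to know about the internal structure of `Report`.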

A snapshot may be in JSON form, for instance, so that it fits in a single column of a SQL table. Your tables would then be reduced to just one, for instance in the form Report(id, snapshot, version), where the snapshot is a JSON object containing the report AND its transactions. Note that I added a version column to my suggestion: it is needed to detect and reject concurrent, conflicting writes on the same aggregate in case of updates.
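One way the version column can be used is optimistic locking: an update only succeeds if the stored version still matches the one the caller loaded. The sketch below uses Python's `sqlite3` module; the class name `SnapshotStore`, the `ConcurrentUpdateError` exception, and the convention that `expected_version=0` means "not stored yet" are assumptions made for the example.

```python
import sqlite3
from typing import Optional


class ConcurrentUpdateError(Exception):
    """Raised when someone else saved the aggregate in the meantime."""


class SnapshotStore:
    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn
        with self._conn:
            self._conn.execute(
                "CREATE TABLE IF NOT EXISTS report "
                "(id TEXT PRIMARY KEY, snapshot TEXT NOT NULL, "
                "version INTEGER NOT NULL)"
            )

    def load(self, report_id: str) -> Optional[tuple[str, int]]:
        # Returns (snapshot, version), or None if the report is unknown.
        row = self._conn.execute(
            "SELECT snapshot, version FROM report WHERE id = ?", (report_id,)
        ).fetchone()
        return None if row is None else (row[0], row[1])

    def save(self, report_id: str, snapshot: str, expected_version: int) -> int:
        # Returns the new version; the whole write is a single transaction.
        with self._conn:
            if expected_version == 0:
                try:
                    self._conn.execute(
                        "INSERT INTO report (id, snapshot, version) "
                        "VALUES (?, ?, 1)",
                        (report_id, snapshot),
                    )
                except sqlite3.IntegrityError as exc:
                    raise ConcurrentUpdateError(report_id) from exc
                return 1
            cursor = self._conn.execute(
                "UPDATE report SET snapshot = ?, version = version + 1 "
                "WHERE id = ? AND version = ?",
                (snapshot, report_id, expected_version),
            )
            if cursor.rowcount != 1:
                # The stored version moved on: a conflicting write happened.
                raise ConcurrentUpdateError(report_id)
            return expected_version + 1
```

A caller would load the snapshot together with its version, rehydrate the aggregate, apply the domain method, and pass the same version back to `save`. If another writer got there first, the save fails and the caller can reload and retry.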

2024-07-19

Eleonora Ciceri