The problems with and solutions to Repositories

Repositories are a design pattern which I have never been a huge fan of. I can see the use of them as a good layer boundary, but too often I see them being used all over the place instead of at an infrastructure level in a code base.

A particularly prevalent version of this misuse is the self-populating collection. These generally inherit from List<TEntity> or Dictionary<TID, TEntity> and provide a set of methods such as .LoadByParentID(TID id). The problem is that the collection still exposes methods such as .Add() and .Remove(), but these operations only run on the in-memory entities and don’t affect the data source itself.
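To make the problem concrete, here is a minimal sketch of such a collection. The names DocumentCollection and FakeDatabase are hypothetical, and the static list only stands in for a real data source:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Document
{
	public int ID { get; set; }
	public int ParentID { get; set; }
	public string Name { get; set; } = "";
}

// Hypothetical stand-in for a database table.
public static class FakeDatabase
{
	public static readonly List<Document> Rows = new List<Document>
	{
		new Document { ID = 1, ParentID = 10, Name = "Document One" },
		new Document { ID = 2, ParentID = 10, Name = "Document Two" }
	};
}

// The self-populating collection: it inherits every public List<T> method.
public class DocumentCollection : List<Document>
{
	public void LoadByParentID(int parentID)
	{
		Clear();
		AddRange(FakeDatabase.Rows.Where(row => row.ParentID == parentID));
	}
}

public static class Demo
{
	public static void Main()
	{
		var documents = new DocumentCollection();
		documents.LoadByParentID(10);

		// Looks like a save, but only the in-memory list changes.
		documents.Add(new Document { ID = 3, ParentID = 10, Name = "Document Three" });

		Console.WriteLine(documents.Count);         // 3
		Console.WriteLine(FakeDatabase.Rows.Count); // 2
	}
}
```

Nothing in the type tells the caller that .Add() stops at the in-memory list, which is what makes this pattern easy to misuse.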

The Alternative

The technique I prefer for reads is Query objects. These are simple classes which expose a single public method to return some data. For example:

public class GetDocumentsWaitingQuery : IDocumentsQuery
{
	private readonly IDataStore _dataStore;

	public GetDocumentsWaitingQuery(IDataStore datastore)
	{
		_dataStore = datastore;
	}

	public IEnumerable<Document> Execute()
	{
		using (var connection = _dataStore.Open())
		{
			return connection
				.Query<Document>(
					"select * from documents where status = @status",
					new { status = DocumentStatuses.Waiting })
				.ToList();
		}
	}
}

The code using this class might look something like this:

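public class DocumentProcessor
{
	private readonly IDocumentsQuery _query;
	private readonly ISaveDocumentCommand _saveCommand;

	public DocumentProcessor(IDocumentsQuery waitingDocumentsQuery, ISaveDocumentCommand saveCommand)
	{
		_query = waitingDocumentsQuery;
		_saveCommand = saveCommand;
	}

	public void Run()
	{
		foreach (var document in _query.Execute())
		{
			//some operation on document...
			_saveCommand.Execute(document);
		}
	}
}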

This class is almost too simple, but it resembles a processor I wrote for a real system. The key here is that the DocumentProcessor only relies on an IDocumentsQuery, not a specific query.
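The IDocumentsQuery interface itself can be very small. The following is a sketch of what it might look like, inferred from how Execute() is called above. InMemoryDocumentsQuery and CountDocuments are hypothetical names added purely for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Document
{
	public int ID { get; set; }
	public string Name { get; set; } = "";
}

// Assumed shape of the interface: one method, no parameters.
public interface IDocumentsQuery
{
	IEnumerable<Document> Execute();
}

// Hypothetical in-memory implementation, handy for demos and tests.
public class InMemoryDocumentsQuery : IDocumentsQuery
{
	private readonly IReadOnlyList<Document> _documents;

	public InMemoryDocumentsQuery(params Document[] documents)
	{
		_documents = documents;
	}

	public IEnumerable<Document> Execute()
	{
		return _documents;
	}
}

public static class Demo
{
	// Depends only on the interface, in the same way DocumentProcessor does.
	public static int CountDocuments(IDocumentsQuery query)
	{
		return query.Execute().Count();
	}

	public static void Main()
	{
		var query = new InMemoryDocumentsQuery(
			new Document { ID = 1, Name = "Document One" },
			new Document { ID = 2, Name = "Document Two" });

		Console.WriteLine(CountDocuments(query)); // 2
	}
}
```

Because the consumer only sees Execute(), any class implementing the interface can be handed to it without the consumer changing.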

Normal usage of the system looks like this:

public void ProcessAll()
{
	var query = new GetDocumentsWaitingQuery(_dataStore);
	var saveCommand = new SaveDocumentCommand(_dataStore);

	var processor = new DocumentProcessor(query, saveCommand);

	processor.Run();
}

When the user requests a single document get reprocessed, we just substitute in a different Query:

var query = new GetDocumentByIDQuery(_dataStore, id: 123123);
var saveCommand = new SaveDocumentCommand(_dataStore);

var processor = new DocumentProcessor(query, saveCommand);

processor.Run();

And finally, when the system is under test, we can pass in completely fake queries and commands:

[Fact]
public void When_multiple_documents_for_the_same_user()
{
	var first = new Document { UserID = 1234, Name = "Document One" };
	var second = new Document { UserID = 1234, Name = "Document Two" };

	var query = Substitute.For<IDocumentsQuery>();
	query.Execute().Returns(new[] {first, second});

	var processor = new DocumentProcessor(query, Substitute.For<ISaveDocumentCommand>());
	processor.Run();

	first.Primary.ShouldBe(true);
	second.Primary.ShouldBe(false);
}

This means that in standard usage the processor is passed an instance of GetDocumentsWaitingQuery, under test it gets a Substitute.For<IDocumentsQuery>(), and when debugging a problem with a specific document it is given new GetDocumentByIDQuery(_dataStore, id: 234234), for example.

Commands

What about saving? Well, it’s pretty much the same story:

public class SaveDocumentCommand : ISaveDocumentCommand
{
	private readonly IDataStore _dataStore;

	public SaveDocumentCommand(IDataStore datastore)
	{
		_dataStore = datastore;
	}

	public void Execute(Document document)
	{
		using (var connection = _dataStore.Open())
		{
			connection.Execute("update documents set status = @status where id = @id", document);
		}
	}
}

Obviously the SQL in the save command would be a bit more complete…
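The test shown earlier substitutes an ISaveDocumentCommand, so the command sits behind an interface just like the query does. Here is a minimal sketch of that interface, assuming it mirrors Execute(Document), together with a hand-rolled test double. RecordingSaveDocumentCommand is a hypothetical name, not something from the real system:

```csharp
using System;
using System.Collections.Generic;

public class Document
{
	public int ID { get; set; }
	public string Status { get; set; } = "";
}

// Assumed shape, inferred from SaveDocumentCommand.Execute(Document).
public interface ISaveDocumentCommand
{
	void Execute(Document document);
}

// Hypothetical test double: records documents instead of writing to a data store.
public class RecordingSaveDocumentCommand : ISaveDocumentCommand
{
	private readonly List<Document> _saved = new List<Document>();

	public IReadOnlyList<Document> Saved
	{
		get { return _saved; }
	}

	public void Execute(Document document)
	{
		_saved.Add(document);
	}
}

public static class Demo
{
	public static void Main()
	{
		var command = new RecordingSaveDocumentCommand();
		command.Execute(new Document { ID = 1, Status = "Processed" });

		Console.WriteLine(command.Saved.Count); // 1
	}
}
```

A mocking library does the same job with less code; the recording version is just a way to see what the processor tried to save without touching a database.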

But Repositories…

Well yes, you can create methods on your repositories to do all of this, like so:

public interface IDocumentRepository
{
	void SaveDocument(Document document);
	IEnumerable<Document> GetDocumentsWaiting();
}

But now your classes utilising this repository are tied to the methods it implements - you cannot just swap out the workings of .GetDocumentsWaiting for a single document query any more.
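As a rough sketch of that coupling, consider a processor written against the repository. GetDocumentByID, RunSingle, RepositoryDocumentProcessor and InMemoryDocumentRepository are all hypothetical names, and the string statuses are a simplification for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Document
{
	public int ID { get; set; }
	public string Status { get; set; } = "";
}

public interface IDocumentRepository
{
	void SaveDocument(Document document);
	IEnumerable<Document> GetDocumentsWaiting();
	Document GetDocumentByID(int id); // hypothetical extra method
}

public class RepositoryDocumentProcessor
{
	private readonly IDocumentRepository _repository;

	public RepositoryDocumentProcessor(IDocumentRepository repository)
	{
		_repository = repository;
	}

	// Hard-wired to GetDocumentsWaiting().
	public void Run()
	{
		foreach (var document in _repository.GetDocumentsWaiting())
			Process(document);
	}

	// Reprocessing one document needs a second entry point on the processor.
	public void RunSingle(int id)
	{
		Process(_repository.GetDocumentByID(id));
	}

	private void Process(Document document)
	{
		document.Status = "Processed";
		_repository.SaveDocument(document);
	}
}

// Hypothetical in-memory repository so the sketch runs without a database.
public class InMemoryDocumentRepository : IDocumentRepository
{
	private readonly List<Document> _documents;
	public int SaveCount { get; private set; }

	public InMemoryDocumentRepository(params Document[] documents)
	{
		_documents = documents.ToList();
	}

	public void SaveDocument(Document document) { SaveCount++; }

	public IEnumerable<Document> GetDocumentsWaiting()
	{
		return _documents.Where(d => d.Status == "Waiting").ToList();
	}

	public Document GetDocumentByID(int id)
	{
		return _documents.Single(d => d.ID == id);
	}
}

public static class Demo
{
	public static void Main()
	{
		var repository = new InMemoryDocumentRepository(
			new Document { ID = 1, Status = "Waiting" },
			new Document { ID = 2, Status = "Waiting" },
			new Document { ID = 3, Status = "Processed" });

		var processor = new RepositoryDocumentProcessor(repository);
		processor.Run();        // saves documents 1 and 2
		processor.RunSingle(3); // needs its own method

		Console.WriteLine(repository.SaveCount); // 3
	}
}
```

Each new way of selecting documents means another method on both the repository and the processor, whereas with query objects the processor stays as it is.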

This is why I like to use Command and Query objects - they not only provide good encapsulation (all your SQL is contained within), but they also provide a large amount of flexibility in your system, and make it very easy to test to boot!
