LINQ Distinct Enhanced


Hello fellow programmers. Today we are going to discuss an issue I have run into on a few projects now. I thought it would be great to find a generic way to handle it so that I wasn’t repeating myself (the good old DRY principle at work). The scenario is this: you are writing LINQ queries against a database, you need to return information based on certain criteria, and you want to send back ONLY one copy of each item that shares a value in a particular column. Often this comes from something a DBA placed in a table that probably could have been normalized.

Take for instance a table called Person. This Person table has a column called Title. Now, you want to provide a search form with a dropdown that contains the various titles available for searching. There’s only one problem: more than one person has the title “Software Engineer”. Take the following query for example:


query = _database.Person.Where(x => x.Terminated != true).ToList();


Ok, now this query will search the Person table for anyone who is not currently terminated. Simple enough, right? Now let’s change this a little bit to be the following:


query = _database.Person.Where(x => x.Terminated != true)
                        .Select(y => y.Title).ToList();


Now this is instead going to return a list of strings containing the Title value of every Person that is not terminated. So what if employee ‘Bob John’ is a Software Engineer and so is ‘Joe Bob’? You are going to have multiples of the same Title in your list. This will really confuse your users, huh? What to do about it? The way I handled this scenario is modeled on the Distinct method in LINQ, but I made a generic version that takes the property to compare on. This allowed me to write code that no longer requires me to implement IEqualityComparer<T> and write its Equals and GetHashCode methods. If I did it that way, I would have to do it for every object (and every property) I wanted to apply a Distinct query on. This is because even though you could make the IEqualityComparer<T> implementation generic, you would not know, when you go to write the Equals method, which property you want to check. (Distinct does have a default comparer, but for a class it falls back to reference equality unless the type overrides Equals and GetHashCode, and that is rarely what you need if you are already looking this far into the code.)
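
For comparison, here is a rough sketch of the comparer-based approach described above. The Person class, the PersonTitleComparer name, the ComparerDemo wrapper and the sample employee ‘Ann Lee’ are all made up for illustration; only the Title property and the two other names come from the example in this post.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the Person entity (illustration only).
public class Person
{
    public string Name { get; set; }
    public string Title { get; set; }
}

// One comparer like this is needed for every type AND every property
// you want Distinct to compare on.
public class PersonTitleComparer : IEqualityComparer<Person>
{
    public bool Equals(Person x, Person y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;
        return x.Title == y.Title;
    }

    public int GetHashCode(Person obj)
    {
        // Must agree with Equals: equal titles give equal hash codes.
        return obj.Title == null ? 0 : obj.Title.GetHashCode();
    }
}

public static class ComparerDemo
{
    public static List<Person> DistinctByTitle(IEnumerable<Person> people)
    {
        return people.Distinct(new PersonTitleComparer()).ToList();
    }

    public static void Main()
    {
        var people = new List<Person>
        {
            new Person { Name = "Bob John", Title = "Software Engineer" },
            new Person { Name = "Joe Bob",  Title = "Software Engineer" },
            new Person { Name = "Ann Lee",  Title = "Manager" }
        };

        foreach (var person in DistinctByTitle(people))
            Console.WriteLine(person.Title);
    }
}
```

It works, but the comparer is tied to Person and to Title. Wanting distinct values on a different column means writing another class just like it.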

Here is what I came up with:

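        // Extension methods must be declared inside a non-generic static class.
        /// <summary>
        /// Returns the elements of a sequence that are distinct by the selected key,
        /// keeping the first element seen for each key.
        /// </summary>
        /// <typeparam name="TEntity">The type of the entity.</typeparam>
        /// <typeparam name="TKey">The type of the key.</typeparam>
        /// <param name="source">The source sequence.</param>
        /// <param name="keySelector">Selects the value each element is compared on.</param>
        /// <returns>A lazily evaluated sequence with one element per distinct key.</returns>
        public static IEnumerable<TEntity> DistinctBy<TEntity, TKey>(this IEnumerable<TEntity> source,
                                                                     Func<TEntity, TKey> keySelector) where TEntity : class
        {
            // Keys already seen; Add returns false when the key is already present.
            var keys = new HashSet<TKey>();

            foreach (var element in source)
            {
                if (keys.Add(keySelector(element)))
                    yield return element;
            }
        }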

This method lets you pass in the selector (typically a property) you wish to check against, so that you know what it will be comparing each time. It then creates a HashSet to track the keys it has already seen. The reason I used a HashSet is that it is very quick at deciding whether a key already exists: Add returns true when the key is new and false when it is already present. The loop goes through each element in the source and tries to add its key to the HashSet. If that succeeds, yield return hands the element to the caller right away, and the loop picks up again when the caller asks for the next item. Nothing is buffered into a list, and nothing runs at all until the result is enumerated (for example by ToList). The first element seen for each key is the one that is kept.
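
To see the streaming behaviour in action, here is a small sketch. The LazyDistinctDemo class, the Titles source and the log list are hypothetical helpers added only for this illustration; the DistinctBy body is the same as the one above. The source records each item as it is read, and the consumer records each item as it is received.

```csharp
using System;
using System.Collections.Generic;

public static class LazyDistinctDemo
{
    // Same method as in the post, repeated so this sample compiles on its own.
    public static IEnumerable<TEntity> DistinctBy<TEntity, TKey>(
        this IEnumerable<TEntity> source, Func<TEntity, TKey> keySelector) where TEntity : class
    {
        var keys = new HashSet<TKey>();
        foreach (var element in source)
        {
            if (keys.Add(keySelector(element)))
                yield return element;
        }
    }

    // Hypothetical source that logs every item at the moment it is read.
    public static IEnumerable<string> Titles(List<string> log)
    {
        foreach (var title in new[] { "Software Engineer", "Manager", "Software Engineer", "DBA" })
        {
            log.Add("read " + title);
            yield return title;
        }
    }

    public static List<string> Run()
    {
        var log = new List<string>();
        foreach (var title in Titles(log).DistinctBy(t => t))
            log.Add("got " + title);
        return log;
    }

    public static void Main()
    {
        foreach (var line in Run())
            Console.WriteLine(line);
    }
}
```

The log interleaves "read" and "got" entries: each distinct title reaches the consumer before the next item is read from the source, and the second "Software Engineer" is read but never handed on.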

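The other nice thing you’ll notice is that the first parameter is declared as “this IEnumerable<TEntity>”. This makes it an extension method: it can be called as if it were an instance method on anything that implements IEnumerable<T>, provided the static class that contains it is in the same namespace as the calling code or that namespace is brought in with a using directive (Imports in VB). If you wish to familiarize yourself with extension methods, the C# documentation covers them in detail.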

To use the code above, here is an example:

_database.Person.Where(s => s.Terminated != true).DistinctBy(x => x.Title).ToList();
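
If you want to try this without a database, here is a minimal in-memory sketch of the dropdown scenario. The Person class (with Terminated assumed to be a nullable bool), the EnumerableExtensions and TitleDropdownDemo class names, and the sample employees other than ‘Bob John’ and ‘Joe Bob’ are assumptions made for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the Person entity; only the members used in this post.
public class Person
{
    public string Name { get; set; }
    public string Title { get; set; }
    public bool? Terminated { get; set; }
}

public static class EnumerableExtensions
{
    public static IEnumerable<TEntity> DistinctBy<TEntity, TKey>(
        this IEnumerable<TEntity> source, Func<TEntity, TKey> keySelector) where TEntity : class
    {
        var keys = new HashSet<TKey>();
        foreach (var element in source)
        {
            if (keys.Add(keySelector(element)))
                yield return element;
        }
    }
}

public static class TitleDropdownDemo
{
    // Titles of non-terminated people, one entry per title, in first-seen order.
    public static List<string> ActiveTitles(IEnumerable<Person> people)
    {
        return people.Where(p => p.Terminated != true)
                     .DistinctBy(p => p.Title)
                     .Select(p => p.Title)
                     .ToList();
    }

    public static void Main()
    {
        var people = new List<Person>
        {
            new Person { Name = "Bob John", Title = "Software Engineer" },
            new Person { Name = "Joe Bob",  Title = "Software Engineer", Terminated = false },
            new Person { Name = "Ann Lee",  Title = "Manager",           Terminated = false },
            new Person { Name = "Sam Ray",  Title = "DBA",               Terminated = true }
        };

        foreach (var title in ActiveTitles(people))
            Console.WriteLine(title);
    }
}
```

One thing to keep in mind: because the extension is written against IEnumerable<T> and takes a Func, the de-duplication runs in memory on the rows the query returns rather than inside the database. Also, on .NET 6 and later, System.Linq ships its own Enumerable.DistinctBy, so the hand-written version is mainly of interest on older frameworks.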

I hope this has been informative and helped a few people out of some nasty jams.


Happy Coding!


About Gregg Coleman

I am a senior-level Software Engineer working primarily these days with .NET. I have a good working knowledge of ASP.NET MVC, Web Forms, WCF web services and Windows Services. I spend much of my time in the Web Services (SOAP and REST) world in my current job, designing and implementing various SOA architectures. I have been in the software engineering industry for about 6 years now and will never consider myself an "expert" in programming, because there is always so much to learn. My favorite thing about designing software is that there are always new emerging technologies and something to learn every day! My current job has me spending much of my time on the bleeding edge of technologies and changing gears all the time, so I'm never bored and always challenged. In my spare time I enjoy weight training, reading and venturing to new places nearby. Of course, programming and learning new technologies are another hobby of mine.