
Fundamentals of Object Equality in Software


Introduction


When it comes to object equality in object-oriented programming, there are fundamentals that we need to pay attention to. Let's discuss some of these. I will show examples in C#, but the concepts should apply to any object-oriented programming language.


If we are dealing with built-in types like int, bool and string, we can easily check for equality as follows:


int x = 5;
int y = 7;

string s1 = "hello!";
string s2 = "hello!";

bool b1 = true;
bool b2 = false;

Console.WriteLine(x.Equals(y));
Console.WriteLine(s1.Equals(s2));
Console.WriteLine(b2.Equals(b1));

And as expected, the program will print the following result:


False
True
False

However, most of the time we are dealing with more complex objects in our systems. We might want to check if two different instances of an object logically represent the same value.


For instance, say we have to keep track of various buildings in the world in a distributed database, and when performing operations in-memory we represent the Building object as follows:



class Building
{
    public int YearBuilt { get; set; }

    public int MonthBuilt { get; set; }

    public string City { get; set; }

    public string Country { get; set; }

    public string BuildingName { get; set; }

    public bool IsGovernmentBuilding { get; set; }

    public Building(int yearBuilt,
                    int monthBuilt,
                    string city,
                    string country,
                    string buildingName,
                    bool isGovernmentBuilding)
    {
        YearBuilt = yearBuilt;
        MonthBuilt = monthBuilt;
        City = city;
        Country = country;
        BuildingName = buildingName;
        IsGovernmentBuilding = isGovernmentBuilding;
    }
}

As you can see, we represent the Building by its name, whether it is a government building or not, and the city and country it's built in. We also include the year and month the building was built.


For example, examine the following code, which creates two objects representing exactly the same building: "BuildingA", built in Santa Monica in October of 2009.

 Building building1 = new Building(2009,
                                   10,
                                   "Santa Monica",
                                   "USA",
                                   "BuildingA",
                                   false);
 Building building2 = new Building(2009,
                                   10,
                                   "Santa Monica",
                                   "USA",
                                   "BuildingA",
                                   false);

Console.WriteLine(building1.Equals(building2));

Even though you might think these two Building objects should be equal, the Equals method will return false.


This is because Building does not override Equals, so C# falls back to the default implementation, which only checks whether both references point to the same instance in memory. Here we have two separate instances, so they are not equal.


Now, say we do the following:


Building building3 = building2;

Console.WriteLine(building3.Equals(building2));

The result will be True. This is because building3 is not a new object: it is a second reference to the same in-memory instance that building2 refers to.


We can check that ourselves by changing one of the attributes, like below:


building3.Country = "Italy";
Console.WriteLine(building2.Country);

In this case building2 will also have Country = "Italy", and the program prints Italy, since both references manipulate the same object.

I wanted to mention this particular case since we will come back to it later when attempting to fix the Equality problem we have above.
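
To make the difference between reference equality and aliasing concrete, here is a minimal, self-contained sketch. The Marker class is a hypothetical stand-in that, like the Building class so far, does not override Equals.

```csharp
using System;

// Hypothetical minimal class that does NOT override Equals.
class Marker
{
    public string Label { get; set; }

    public Marker(string label) { Label = label; }
}

static class ReferenceEqualityDemo
{
    public static void Main()
    {
        var a = new Marker("BuildingA");
        var b = new Marker("BuildingA"); // same contents, separate instance
        var alias = b;                   // second reference to the same instance

        // The default Equals on a class compares references, not contents.
        Console.WriteLine(a.Equals(b));                      // False
        Console.WriteLine(object.ReferenceEquals(a, b));     // False
        Console.WriteLine(alias.Equals(b));                  // True
        Console.WriteLine(object.ReferenceEquals(alias, b)); // True

        // A change made through one reference is visible through the other.
        alias.Label = "Renamed";
        Console.WriteLine(b.Label);                          // Renamed
    }
}
```

object.ReferenceEquals is handy here because it always compares identity, no matter how Equals is later overridden.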


We want building1 and building2 to be equal. That's the logical equality I am referring to. Since all the attributes are the same, it is accurate to say, these two objects are representing the same exact building.


Fundamental Properties of Equality

  1. Symmetric. If A.Equals(B) is true, then B.Equals(A) should also be true.

  2. Reflexive. A.Equals(A) should always be true.

  3. Transitive. If A.Equals(B) is true and B.Equals(C) is true, then A.Equals(C) must also be true.
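
As a rough sketch, the three properties can be written as small checks that you could call from unit tests. The EqualityContract helper and its method names are made up for this illustration. Such checks only test the samples you pass in; they do not prove the properties hold in general.

```csharp
using System;

// Hypothetical helper: spot checks for the three equality properties.
static class EqualityContract
{
    // Reflexive: a value equals itself.
    public static bool IsReflexive(object a) => a.Equals(a);

    // Symmetric: a.Equals(b) and b.Equals(a) must agree.
    public static bool IsSymmetric(object a, object b) => a.Equals(b) == b.Equals(a);

    // Transitive: if a equals b and b equals c, then a must equal c.
    public static bool IsTransitive(object a, object b, object c) =>
        !(a.Equals(b) && b.Equals(c)) || a.Equals(c);
}

static class EqualityContractDemo
{
    public static void Main()
    {
        // Three distinct string instances with the same contents.
        string a = new string('x', 3);
        string b = new string('x', 3);
        string c = new string('x', 3);

        Console.WriteLine(EqualityContract.IsReflexive(a));        // True
        Console.WriteLine(EqualityContract.IsSymmetric(a, b));     // True
        Console.WriteLine(EqualityContract.IsTransitive(a, b, c)); // True
    }
}
```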


So let's take a look at the following implementation. We need to override the Equals method that all objects inherit from the base Object class.



public override bool Equals(object obj)
{
    // this.Equals(null) should be false
    if (obj == null)
    {
        return false;
    }

    // this == this should be true if the same reference
    if (ReferenceEquals(obj, this))
    {
        return true;
    }

    // an object that is not a Building can never be equal to one
    var other = obj as Building;
    if (other == null)
    {
        return false;
    }

    // we check all properties
    // all should be equal to return true
    // the static string.Equals(a, b) does not throw when a string is null
    return other.MonthBuilt.Equals(this.MonthBuilt) &&
           other.YearBuilt.Equals(this.YearBuilt) &&
           other.IsGovernmentBuilding.Equals(this.IsGovernmentBuilding) &&
           string.Equals(other.BuildingName, this.BuildingName) &&
           string.Equals(other.City, this.City) &&
           string.Equals(other.Country, this.Country);
}
  • As you can see, we first check if the other object is null. A.Equals(null) should always be false.

  • Then we check if other and this are the same reference. If so, we immediately return true.

  • Then, we attempt to cast the object to the same type. If that fails, the other object is not a Building, so the two cannot be equal.

  • At this point, we know the object is of the same type, so we define what we mean by logical equality. In this case we check that all the attributes are equal.


NOTE: Be careful if any of the fields is itself a complex, non-primitive object type: the Equals method of that inner type should also be overridden in a similar fashion to what we are doing here.
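
The note above can be sketched as follows. Address and Site are hypothetical types invented for this example; they are not part of the Building class. Both also override GetHashCode, for the reason explained later in this post.

```csharp
using System;

// Hypothetical inner type: it needs its own Equals override.
class Address
{
    public string City { get; }
    public string Country { get; }

    public Address(string city, string country)
    {
        City = city;
        Country = country;
    }

    public override bool Equals(object obj)
    {
        var other = obj as Address;
        return other != null &&
               string.Equals(City, other.City) &&
               string.Equals(Country, other.Country);
    }

    public override int GetHashCode()
    {
        // Equal addresses have equal fields, so they get equal hash codes.
        unchecked
        {
            return ((City?.GetHashCode() ?? 0) * 31) + (Country?.GetHashCode() ?? 0);
        }
    }
}

// Hypothetical outer type whose equality delegates to Address.Equals.
class Site
{
    public string Name { get; }
    public Address Location { get; }

    public Site(string name, Address location)
    {
        Name = name;
        Location = location;
    }

    public override bool Equals(object obj)
    {
        var other = obj as Site;
        return other != null &&
               string.Equals(Name, other.Name) &&
               // object.Equals(a, b) is null-safe and ends up calling Address.Equals.
               object.Equals(Location, other.Location);
    }

    public override int GetHashCode()
    {
        unchecked
        {
            return ((Name?.GetHashCode() ?? 0) * 31) + (Location?.GetHashCode() ?? 0);
        }
    }
}

static class NestedEqualityDemo
{
    public static void Main()
    {
        var s1 = new Site("BuildingA", new Address("Santa Monica", "USA"));
        var s2 = new Site("BuildingA", new Address("Santa Monica", "USA"));
        var s3 = new Site("BuildingA", new Address("Rome", "Italy"));

        Console.WriteLine(s1.Equals(s2)); // True
        Console.WriteLine(s1.Equals(s3)); // False
    }
}
```

If Address did not override Equals, s1.Equals(s2) would be false, because the two Address instances would be compared by reference.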

Now, if we go back to the earlier example:


 Building building1 = new Building(2009,
                                   10,
                                   "Santa Monica",
                                   "USA",
                                   "BuildingA",
                                   false);
 Building building2 = new Building(2009,
                                   10,
                                   "Santa Monica",
                                   "USA",
                                   "BuildingA",
                                   false);

Console.WriteLine(building1.Equals(building2));

Now it will print True.


It's all great, isn't it?! Not quite.

In fact, if we leave the code as is, we have a fundamental problem.



GetHashCode Should Always Be Overridden when Equals is Overridden


That should be a rule you always keep in mind.

If two objects are defined to be logically equal, then they must have the same hash code.


Hash codes are used in lookups like the ones in HashSet or Dictionary in C#, or HashMaps in Java.

Say you retrieved a Building object and you want to use it as a key in some in-memory hash table-based structure (HashSet, Dictionary, HashMap, etc.). Internally, that object's GetHashCode is called first and returns an integer. Based on that integer, the structure knows which bucket to look in for our object. This is what makes hash tables efficient for lookups, provided we define a good hash function that distributes the keys evenly across the buckets.
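
Here is a simplified sketch of how a hash code can be turned into a bucket index. This is an illustration only, not how HashSet or Dictionary are actually implemented; BucketIndex and the bucket count of 8 are assumptions made for the example.

```csharp
using System;

static class BucketSketch
{
    // Map a hash code to a bucket index in [0, bucketCount).
    // Masking off the sign bit keeps the index non-negative.
    public static int BucketIndex(int hashCode, int bucketCount)
    {
        return (hashCode & 0x7FFFFFFF) % bucketCount;
    }

    public static void Main()
    {
        const int bucketCount = 8; // arbitrary size chosen for illustration

        // int.GetHashCode() returns the value itself, which keeps the output predictable.
        foreach (int key in new[] { 1, 9, 17, 4 })
        {
            Console.WriteLine($"{key} -> bucket {BucketIndex(key.GetHashCode(), bucketCount)}");
        }
        // 1 -> bucket 1
        // 9 -> bucket 1
        // 17 -> bucket 1
        // 4 -> bucket 4
    }
}
```

Keys 1, 9 and 17 share a bucket even though they are different keys, which is why a lookup still has to call Equals on the entries it finds in that bucket.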



GetHashCode must be deterministic for a given object state within a run of the program, so that the same object always hashes to the same bucket in a hash table.

So in the illustration above, if building1 and building2 are logically equal to Building A, then they should hash to the same bucket, index 1.


Two equal objects must have the same hash value, so that we end up looking in the correct bucket when we check whether that object is in our in-memory hash table-based structure.


NOTE: The other way around is not necessarily true. Two non-equal objects may have the same hash code. This follows from the Pigeonhole Principle: there are far more possible objects than there are int hash codes. We can have hash collisions and that's OK. But what must be true is that two objects that are logically equal always hash to the same value. Otherwise, you will be spending a lot of time debugging unexpected behavior.
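
A small sketch shows that collisions are harmless for correctness. GridPoint is a hypothetical type with a deliberately weak hash function, chosen so that a collision is easy to produce.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical type with a deliberately weak hash so collisions are easy to see.
class GridPoint
{
    public int X { get; }
    public int Y { get; }

    public GridPoint(int x, int y) { X = x; Y = y; }

    public override bool Equals(object obj)
    {
        var other = obj as GridPoint;
        return other != null && X == other.X && Y == other.Y;
    }

    // Weak on purpose: (1, 2) and (2, 1) both hash to 3.
    public override int GetHashCode() => unchecked(X + Y);
}

static class CollisionDemo
{
    public static void Main()
    {
        var p = new GridPoint(1, 2);
        var q = new GridPoint(2, 1);

        Console.WriteLine(p.GetHashCode() == q.GetHashCode()); // True: a collision
        Console.WriteLine(p.Equals(q));                        // False: not equal

        var set = new HashSet<GridPoint> { p };
        Console.WriteLine(set.Contains(q));                    // False
        Console.WriteLine(set.Contains(new GridPoint(1, 2)));  // True
    }
}
```

The set lands in the same bucket for p and q, then relies on Equals to tell them apart, so the answers stay correct.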

Here is an example that demonstrates the problem:


Building building1 = new Building(2009, 10, "Santa Monica", "USA", "BuildingA", false);

Building building2 = new Building(2009, 10, "Santa Monica", "USA", "BuildingA", false);

HashSet<Building> buildingSet = new HashSet<Building>();
buildingSet.Add(building1);

Console.WriteLine(buildingSet.Contains(building2));

That program will print False when we actually want it to print True, since a building equal to building2 is already in the set.

Since GetHashCode has not been overridden, the default implementation derives the hash code from the identity of the object rather than its contents, so building2 will almost certainly get a different hash code than building1. In other words, the program will be looking for that Building in the wrong hash bucket.



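Now if you use Visual Studio like I do, it warns you about this. The C# compiler reports warning CS0659, which says that the class overrides Object.Equals(object o) but does not override Object.GetHashCode().


One of the great benefits of IDEs like Visual Studio is that they look out for you and try to prevent such mistakes.


So let's override the GetHashCode method.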


public override int GetHashCode()
{
    return 11;
}

This very simple GetHashCode satisfies the rule: any two equal objects get the same hash code. If you run the same program above, it will print True.

However, this function is not good for another reason. That's because if we use the Building objects in a map or set, they will all hash to the same bucket. That will cause lookup time to become linear as opposed to constant lookup, thus defeating the whole purpose of using a hash table.
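
To get a feel for that cost, here is a rough sketch that counts how many times Equals is called while adding distinct keys to a HashSet. CountingKey and CountEqualsCalls are hypothetical names made up for this experiment, and the exact counts depend on the runtime's HashSet implementation.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical key type that counts how often Equals is called.
class CountingKey
{
    public static int EqualsCalls;

    private readonly int id;
    private readonly bool constantHash;

    public CountingKey(int id, bool constantHash)
    {
        this.id = id;
        this.constantHash = constantHash;
    }

    public override bool Equals(object obj)
    {
        EqualsCalls++;
        var other = obj as CountingKey;
        return other != null && other.id == id;
    }

    // Either every key hashes to 11, or each key hashes to its own id.
    public override int GetHashCode() => constantHash ? 11 : id;
}

static class ConstantHashCost
{
    // Adds `count` distinct keys to a HashSet and returns how many Equals calls that took.
    public static int CountEqualsCalls(int count, bool constantHash)
    {
        CountingKey.EqualsCalls = 0;
        var set = new HashSet<CountingKey>();
        for (int i = 0; i < count; i++)
        {
            set.Add(new CountingKey(i, constantHash));
        }
        return CountingKey.EqualsCalls;
    }

    public static void Main()
    {
        Console.WriteLine($"constant hash: {CountEqualsCalls(500, true)} Equals calls");
        Console.WriteLine($"distinct hash: {CountEqualsCalls(500, false)} Equals calls");
    }
}
```

With the constant hash, every Add has to compare the new key against the keys already sitting in the single shared bucket, so the first number should be far larger than the second.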

You will need to define a better hash function.


So let's define a better one:


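public override int GetHashCode()
{
    // unchecked: let the arithmetic wrap around instead of throwing on overflow
    unchecked
    {
        int hash = 11;
        // ?. and ?? keep a null string from throwing; null contributes 0
        // multiplying by 31 makes the order of the fields matter
        hash = hash * 31 + (this.BuildingName?.GetHashCode() ?? 0);
        hash = hash * 31 + (this.Country?.GetHashCode() ?? 0);
        hash = hash * 31 + (this.City?.GetHashCode() ?? 0);

        return hash;
    }
}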

Good hash functions are not the topic of this post; I'll comment on that in another post. But for the purposes of the topic here, we have now defined a better hash method. You don't need to use all the attributes, as long as every attribute you do use also takes part in Equals and you meet the main requirement:

If A.Equals(B) is true, then A.GetHashCode() must be equal to B.GetHashCode().


Now if you try to run the same code above and check if your building exists in the set, it will return true as it should.
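
Putting the pieces together, here is a compact, self-contained sketch. BuildingKey is a trimmed-down, hypothetical variant of the Building class, and it uses HashCode.Combine to build the hash code, which assumes a runtime that provides System.HashCode (.NET Core 2.1 or later, or .NET 5+).

```csharp
using System;
using System.Collections.Generic;

// Hypothetical trimmed-down variant of Building, used only for this sketch.
class BuildingKey
{
    public string BuildingName { get; }
    public string City { get; }
    public string Country { get; }

    public BuildingKey(string buildingName, string city, string country)
    {
        BuildingName = buildingName;
        City = city;
        Country = country;
    }

    public override bool Equals(object obj)
    {
        var other = obj as BuildingKey;
        return other != null &&
               string.Equals(BuildingName, other.BuildingName) &&
               string.Equals(City, other.City) &&
               string.Equals(Country, other.Country);
    }

    // Uses exactly the fields that Equals compares; HashCode.Combine accepts null values.
    public override int GetHashCode() => HashCode.Combine(BuildingName, City, Country);
}

static class HashSetLookupDemo
{
    public static void Main()
    {
        var building1 = new BuildingKey("BuildingA", "Santa Monica", "USA");
        var building2 = new BuildingKey("BuildingA", "Santa Monica", "USA");

        var buildingSet = new HashSet<BuildingKey> { building1 };

        Console.WriteLine(building1.GetHashCode() == building2.GetHashCode()); // True
        Console.WriteLine(buildingSet.Contains(building2));                    // True
    }
}
```

The properties in this sketch are read-only on purpose: a key whose fields change while it sits in a set would end up filed under a stale hash code.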


Always keep in mind the principles of Equality of objects, and whenever you need to override Equals for an object, you must override GetHashCode correctly as described.


Have fun developing resilient software!

