Skip to main content

GetHashCode Hands-On Session

The following is a hands-on post meant to demonstrate how GetHashCode() and Equals() methods are used by .NET Framework under the hood. For the sake of simplicity I will refer to the popular hashed-base Dictionary type, although any other hash based structure will follow a similar behavior, if not the same one.

After understanding this post you should be able to spot potential problematic behaviors and resolve them, prevent creation of unreachable items in dictionaries and improve CRUD actions performance on hash based structures.

The Theory

GetHashCode() is used to create a unique integer identifier for objects/structs. The hashcode can be used for two purposes:

  • Programmatically, by developers, to distinguish objects/structs form each other (NOTE: Not recommended when the default .NET implementation is used, as it's not guaranteed to preserve the same hash between .NET versions and platforms)
  • Internally, by .NET Framework, when using the object/struct as a key in a hashed based list, i.e. Dictionary. This scenario is the main subject of this article.

Microsoft has supplied both Reference and Value types with two default GetHashCode() implementations. The implementation is designed to provide a generic "good-enough" solution with almost no knowledge of our particular types at runtime. As result the default implementation of GetHashCode() works, but is not optimized. (The default Value.GetHashCode() has an additional issue, discussed later in this post).

Add and Search Mechanism

When using an object/struct as a key in a Dictionary, .NET implementation internally calls the given object/struct's GetHashCode() method and calculates an index from it, based on the length of the list.

Although implementation can vary between .NET versions, the common implementation is similar to the following:

int hash = GetHashCode() % dictionary.Length;
Don't worry, we will review the default implementation later-on.

Hashbased structures are composed by buckets that are used to 'hold' the supplied objects. When trying to insert new items in a Dictionary .NET uses the calculated index and tries to insert the item in that specific index position.

If the calculated index is still not in use (the bucket is empty), the item is placed smoothly in the bucket (O(1) algorithm).

In case the calculated index already 'holds' an object(s), meaning that the bucket is already in use, .NET will still try to add the object to the bucket, but only if the identical key is not already used by that specific bucket. This scenario is called "collision"(O(n) algorithm).

Hash Table Collision

In order to check the existence of identical keys in a bucket .NET compares each existing key in the bucket with the new candidate item by calling the Equals() method on each of the existing items:

bool exist = alreadyInUseKey1.Equals(newKey1);  //* Loops through the entire bucket items list 

If .NET encounters an equivalence between two keys (exist = true) an 'An item with the same key has already been added' exception will be thrown. Otherwise the item will be added with the provided key.

NOTE: Typically, the domain of a hash function (the set of possible keys) is larger than its range (the number of different table indexes), and so it will map several different keys to the same index (called collision).

When searching an item based on its key object, .NET internally retrieves the key's hashcode, recalculates the index and instructs the CPU to retrieve the object from the specific bucket at the index position.

Optimal Implementation

In order to distinguish objects from each other, we need to make use of all, or parts of, its internal values (properties or fields). There are three common golden rules we must follow in order to maximize use of hashed lists:

  1. If two objects are equal (as defined by Equals() method), they must generate the same hash value, Otherwise, their hashcodes can’t be used to find objects in their correspondent buckets.
    The contrary is not required - Multiple non-Equal objects can have the same hashcode (collisions). This is the reason why we need to override Equals() when GetHashCode() has been overridden and vice versa, we need them to be synchronized.
  2. For any object A, A.GetHashCode() must be an instance invariant. No matter what methods are called upon A, A.GetHashCode() must always return the same value. That ensures that an object placed in a bucket will always remain reachable.
  3. To reduce the frequency of hashcode collisions, the hash function should generate a uniform distribution among all integers (32/64 bit) based on all, or part of, the object/struct fields. That’s how you achieve efficiency in a hash-based container.

    Wikipedia: In probability theory and statistics, the discrete uniform distribution is a symmetric probability distribution whereby a finite number of values are equally likely to be observed; every one of n values has equal probability 1/n. Another way of saying "discrete uniform distribution" would be "a known, finite number of outcomes equally likely to happen". A simple example of the discrete uniform distribution is throwing a fair die. The possible values are 1, 2, 3, 4, 5, 6, and each time the die is thrown the probability of a given score is 1/6. If two dice are thrown and their values added, the resulting distribution is no longer uniform since not all sums have equal probability.

In order to use the large number produced by the GetHashCode() as an index, we need to reduce it to the length of the dictionary while preserving the good distribution gained by our GetHashCode().

For that reason, as explained previously, .NET internally calculates the index by modeling it against the length of the list. By the way, .NET preserves the hashcode in order to use it in case the Dictionary capacity is increased at runtime. In this case .NET recalculates all items indexed with the new Length of the dictionary and replaces the items position accordingly.

A common and successful algorithm is to XOR all the return values from GetHashCode() on all fields in a type. If your type contains some mutable fields, exclude those fields from the calculations. At this point it should be clear that a poor hashcode algorithm can cause performance issues and even unreachable items (see samples in "Hands On" section below).

Default .NET Implementation

As mentioned before, mediocre GethashCode() and Equals() implementations where supplied by default by Microsoft:

System.Object doesn't have to follow rule 3, because it uses a simple sequential integer. So when you create reference types that are meant to be hash keys, you should override GetHashCode() to get a better distribution of the hash values across all integers for your specific type.

System.ValueType overrides Objects' implementations and performs a hash calculation based on the object's first Property. Rule 2 and 3 depend on the first property of an object. It must be well distributable and immutable.

NOTE: I have taken the description above, about System.ValueType implementation, from the book "Effective C#". Several tests later, I now understand that it's not always the case. .NET apparently uses a more complex algorithm. In this article we will follow "Effective C#"'s description.

Hands On

The following is an effective illustration demonstrating how small changes can lead to big differences when talking about hash based structures:

Defaul Value Implelemntation

Given a custom value struct-key, where the first field (id) of the key is always different.

    
    struct MyKey
    {
        private int id;
        private string msg;
        public MyKey(int id)
        {
            this.id = id;
            this.msg = "Hello";     
        }
    }

    private static void AddDic()
    {
        Dictionary <MyKey,int> list = new Dictionary<MyKey,int>();            
        for (int i = 0; i < 100; i++)
            list.Add( new MyKey(i), i);

    } 

The code above will take ~8 seconds on my 4 core 8MB RAM machine.

Let's say that time comes and your college made a small refactor:

    
    struct MyKey
    {
        private string msg;//* Now first field in the struct
        private int id;
        public MyKey(int id)
        {
            this.id = id;
            this.msg = "Hello";     
        }
    }

Even though she just changed the order of the fields, now the code will take ~30 seconds, because the first field of the struct is fixed and all the items are inserted in the same bucket.

Custom Equals & GetHashCode

The same is true with custom overridden methods. We need to make sure the hashcode is well distributed, otherwise we will suffer from the same behavior.

Let's consider a similar object-key implementation:


    
    struct MyKey
    {
        private int id;
        public MyKey(int id)
        {
            this.id = id;
        }
              
        public override bool Equals(object obj)
        {
            int int1 = ((MyKey)obj).id;
            bool equ = int1 == id;
            return equ;
        }

        public override int GetHashCode()
        {
            int hash = 1;//* Always same value
            return hash;
        }
    }

The first sample makes use of the above object-key to generate an identical hashcode for all instances.

    
        private static void AddDic()
        {
            Dictionary <MyKey , int> list = new Dictionary<MyKey, int>();

            double beforems = DateTime.Now.TimeOfDay.TotalMilliseconds;
            Console.WriteLine("Befoare Insert: " + beforems);
            for ( int i = 0; i < 100; i++)
            {
                list.Add(new MyKey(i), i);
                Console.WriteLine("----");
            }
            double afterms = DateTime.Now.TimeOfDay.TotalMilliseconds;
            Console.WriteLine("After Insert: " + afterms);

            Console.WriteLine("Total: " + (afterms - beforems));
          
            Console.ReadKey();
        } 

Adding new items into the Dictionary is very expensive, as mentioned before. When an identical hashcode is encountered, .NET tries to determine equality for all the key objects by looping throughout all the items in the specific bucket. This innocent loop of 100 items suffers from extremely low performance just because the hashcode in the custom key is always the same.

Output:

Equals - Internal Object ID:15 External Object ID: 16
Equals - Internal Object ID:14 External Object ID: 16
Equals - Internal Object ID:13 External Object ID: 16
Equals - Internal Object ID:12 External Object ID: 16
Equals - Internal Object ID:11 External Object ID: 16
Equals - Internal Object ID:10 External Object ID: 16
Equals - Internal Object ID:9 External Object ID: 16
Equals - Internal Object ID:8 External Object ID: 16
Equals - Internal Object ID:7 External Object ID: 16
Equals - Internal Object ID:6 External Object ID: 16
Equals - Internal Object ID:5 External Object ID: 16
Equals - Internal Object ID:4 External Object ID: 16
Equals - Internal Object ID:3 External Object ID: 16
Equals - Internal Object ID:2 External Object ID: 16
Equals - Internal Object ID:1 External Object ID: 16
Equals - Internal Object ID:0 External Object ID: 16
----
GetHashCode - Internal Object ID:17 HashCode :1
Equals - Internal Object ID:16 External Object ID: 17
Equals - Internal Object ID:15 External Object ID: 17
Equals - Internal Object ID:14 External Object ID: 17
Equals - Internal Object ID:13 External Object ID: 17
Equals - Internal Object ID:12 External Object ID: 17
Equals - Internal Object ID:11 External Object ID: 17
Equals - Internal Object ID:10 External Object ID: 17
Equals - Internal Object ID:9 External Object ID: 17
Equals - Internal Object ID:8 External Object ID: 17
Equals - Internal Object ID:7 External Object ID: 17
Equals - Internal Object ID:6 External Object ID: 17
Equals - Internal Object ID:5 External Object ID: 17
Equals - Internal Object ID:4 External Object ID: 17
Equals - Internal Object ID:3 External Object ID: 17
Equals - Internal Object ID:2 External Object ID: 17
Equals - Internal Object ID:1 External Object ID: 17
Equals - Internal Object ID:0 External Object ID: 17
----
GetHashCode - Internal Object ID:18 HashCode :1
Equals - Internal Object ID:17 External Object ID: 18
Equals - Internal Object ID:16 External Object ID: 18
Equals - Internal Object ID:15 External Object ID: 18
Equals - Internal Object ID:14 External Object ID: 18
Equals - Internal Object ID:13 External Object ID: 18
Equals - Internal Object ID:12 External Object ID: 18
Equals - Internal Object ID:11 External Object ID: 18
Equals - Internal Object ID:10 External Object ID: 18
Equals - Internal Object ID:9 External Object ID: 18
Equals - Internal Object ID:8 External Object ID: 18
Equals - Internal Object ID:7 External Object ID: 18
Equals - Internal Object ID:6 External Object ID: 18
Equals - Internal Object ID:5 External Object ID: 18
Equals - Internal Object ID:4 External Object ID: 18
Equals - Internal Object ID:3 External Object ID: 18
Equals - Internal Object ID:2 External Object ID: 18
Equals - Internal Object ID:1 External Object ID: 18
Equals - Internal Object ID:0 External Object ID: 18
----

We can see that every new added object (17 and 18) is tested against each existing item in the dictionary.

By changing our implementation (although not the optimal one) to return a distinct value for each item, we will dramatically increase the overall performance.

  
    struct MyKey
    {
        private int id;
        public MyKey(int id)
        {
            this.id = id;
        }
              
        public override bool Equals(object obj)
        {
            int int1 = ((MyKey)obj).id;
            bool equ = int1 == id;
            return equ;
        }

        public override int GetHashCode()
        {
            int hash = id;//* Always different
            return hash;
        }
    }

An additional common mistake can occurs when our GetHashCode() implementation is based on a mutable field, for example, msg field. In this case we can make items become unreachable if the field gets modified at runtime:

  
         private static void ChangeDic()
        {
            Dictionary<MyKeyStruct , int> list = new Dictionary<MyKeyStruct , int>();

            MyKeyStruct one = new MyKeyStruct(1);
            MyKeyStruct oneTwo = new MyKeyStruct(2);
            MyKeyStruct oneThree = new MyKeyStruct(3);
            MyKeyStruct oneFour = new MyKeyStruct(4);
            MyKeyStruct oneFive = new MyKeyStruct(5);
            MyKeyStruct oneSix = new MyKeyStruct(6);

            list.Add(one, 1);
            Console.WriteLine( "----" );
            list.Add(oneTwo, 2);
            Console.WriteLine( "----" );
            list.Add(oneThree, 3);
            Console.WriteLine( "----" );
            list.Add(oneFour, 4);
            Console.WriteLine( "----" );
            list.Add(oneFive, 5);
            Console.WriteLine( "----" );
            list.Add(oneSix, 6);

            one.msg = "x" ;//* Change msg value

            var item = list[one];//* Exception: The given key was not present in the dictionary.

            Console.ReadKey();
        } 

The item '1' is lost somewhere in the hash map because after being stored in a specific bucket, we changed the msg filled, causing GetHashCode() method to produce a new hash, and pointing to a different bucket. Next time 'one' key is used, a new hashcode will lead to a different bucket and '1' item will never be found.

Summary

The default behavior, Object.GetHashCode() works correctly for reference types.
ValueType.GetHashCode() works only if the first field in your struct is immutable.

Both of them do not necessarily generate an efficient distribution hashcode, and their implementation must be based on immutable values.

ValueType.GetHashCode() generates an efficient hash code only when the first field in your struct contains values across a meaningful subset of its inputs.

Comments

isabella said…
I’m really impressed with your blog article, such great & useful knowledge you mentioned here..
Dot Net Training in Chennai
Android Training in Chennai
Rajesh said…
Great post.Thanks for one marvelous posting! I enjoyed reading it;The information was very useful.Keep the good work going on!!

ETL Testing training in chennai| SAP MM training in chennai | Informatica training in chennai
Praylin S said…
This comment has been removed by a blog administrator.
monu said…
Thank you for excellent article.Great information for new guy like antimalware service executable
Chris Hemsworth said…
This comment has been removed by a blog administrator.
w3webschool said…
Such a wonderful article and I feel that it is best to write more on this topic. Thank you so much because i learn a lot of ideas about it. Keep posting...
Digital Marketing Course In Kolkata
Web Design Course In Kolkata
SEO Course In Kolkata
Keerthi SK said…
I was following your blog regularly and this one is very interesting and knowledge attaining. Great effort ahead. you can also reach us for web development company in chennai website design company in chennai
Ali jan said…
this Lightning delivers a new design language which promises a consumer-like experience across every device, browser and operating system. This is in turn useful in making Salesforce a responsive platform. Salesforce training in Chennai
Babu said…
I enjoyed reading your article. Thanks for taking the time to post such a valuable article.
seo content writing tips
language similar to english
salesforce basics
star certification
hacking books
This comment has been removed by the author.
This comment has been removed by the author.
Banumadhu said…
Useful article which was very helpful. also interesting and contains good information.
to know about python training course , use the below link.

Python Training in chennai

Python Course in chennai
rajmohan1140 said…
I learn new information from your article , you are doing a great job . Keep it up

Java Training in Chennai

Java Course in Chennai
technology said…
Whichever way you select your trial group, it is important to choose the people who already embody that high-achieving spirit, and who will achieve even greater results if they are given additional training, focus and direction. Salesforce interview questions
Nice to be visiting your blog again, it has been months for me. Well this article that i've been waited for so long. I need this article to complete my assignment in the college, and it has same topic with your article. Thanks, great share.


เว็บแทงบอล
ufabet
ufa
พวงหรีด
โควิด
บาคาร่า
This article is increasing the interest to learn more about this topic. Continue the sharing your new updates, regularly for my future.


คาสิโนออนไลน์
ufabet
ufa
เว็บบอล
relx
Movie-watching websites that are more than movie-watching websites Because we are the number 1 free movie site in Thailand for a long time, including new movies, Thai movies, Western movies, Asian movies, we have all kinds of ways for you Including new series Full of all stories without interstitial ads to keep annoying anymore. One place sa-movie.com.

Android and IOS operating systems. Watch online movies, Thai movies, Western movies, Asian movies, Cartoon movies, Netflix Movie, Action Movies, Comedy Movies, Crime Movies, Drama Movies, Horror Movies, Adventure Movies, Crash Movies and still have many new movies to watch. You can watch for free anytime, anywhere 24 hours a day at see4k.com.


GangManga read manga, read manga, read manga online for free, fast loading, clear images in HD quality, all titles, anywhere, anytime, on mobile, tablet, computer. Android and IOS operating systems. Read top comics, action dramas, comedy, adventure, horror and manga. New coming every day to watch many more. Can be read for free anytime anywhere 24 hours a day at gangmanga.com..

It is no secret that football is among the most popular and widely watched sports. Everybody who likes football tries to find the best platform for free soccer streaming. So, what are the best free sports streaming sites? We are going to answer this question. On this page, you can find a detailed overview of the most widespread soccer streaming websites. Keep on reading and make the best choice for you live24th.me.
Maradona Jons said…
With special privileges and services, UEFA BET offers opportunities for small capitalists. Together ufa with the best websites that collect the most games With a minimum deposit starting from just 100 baht, you are ready to enjoy the fun with a complete range of betting that is available within the website

ufabet , our one another option We are a direct website, not through an agent, where customers can have great confidence without deception The best of online betting sites is that our Ufa will give you the best price

หาคุณกำลังหาเกมส์ออนไลน์ที่สามารถสร้างรายได้ให้กับคุณ เรามีเกมส์แนะนำ เกมยิงปลา รูปแบบใหม่เล่นง่ายบนมือถือ คาสิโนออนไลน์ บนคอม เล่นได้ทุกอุปกรณ์รองรับทุกเครื่องมือ มีให้เลือกเล่นหลายเกมส์ เล่นได้ทั่วโลกเพราะนี้คือเกมส์ออนไลน์แบบใหม่ เกมยิงปลา

อีกทั้งเรายังให้บริการ เกมสล็อต ยิงปลา แทงบอลออนไลน์ รองรับทุกการใช้งานในอุปกรณ์ต่าง ๆ HTML5 คอมพิวเตอร์ แท็บเล็ต สมาทโฟน คาสิโนออนไลน์ และมือถือทุกรุ่น เล่นได้ตลอด 24ชม. ไม่ต้อง Downloads เกมส์ให้ยุ่งยาก ด้วยระบบที่เสถียรที่สุดในประเทศไทย
Aishwariya said…
wonderful article contains lot of valuable information. Very interesting to read this article.I would like to thank you for the efforts you had made for writing this awesome article.
AWS certification course in Chennai
Maradona Jons said…
อีกทั้งเรายังให้บริการ เกมสล็อต ยิงปลา แทงบอลออนไลน์ รองรับทุกการใช้งานในอุปกรณ์ต่าง ๆ HTML5 คอมพิวเตอร์ แท็บเล็ต สมาทโฟน คาสิโนออนไลน์ และมือถือทุกรุ่น เล่นได้ตลอด 24ชม. ไม่ต้อง Downloads เกมส์ให้ยุ่งยาก ด้วยระบบที่เสถียรที่สุดในประเทศไทย
thomas said…
สตีฟ บรูซ ได้รับรางวัลผู้จัดการทีมยอดเยี่ยม พรีเมียร์ลีก ประจำเดือนเมษายน ผู้จัดการทีม นิวคาสเซิล ยูไนเต็ด นำทีมของเขาเก็บไปได้ 8 แต้มจากการแข่งขันทั้ง 4 นัดโดยไม่เลยแม้แต่เกมเดียวช่วยให้ทีมของเขาการันตีการอยู่รอดบนลีกสูงสุดไปต่อได้อีกฤดูกาล เดอะ แม็กพายส์ ตามตีเสมอ...ufa
Marty Sockolov said…
Baccarat is money making and it's spectacular availability. The best In your case it's found that you will find quite interesting options. And that is thought to be a thing that's rather varied And it's very something that's rather prepared to strike with Probably the most good, as well, is a genuinely good option. Moreover, it's a truly interesting alternative. It's the simplest way which could generate profits. Superbly ready The number of best-earning baccarat will be the accessibility of making by far the most money. As much as achievable is very ideal for you An alternative which could be guaranteed. To a wide variety of supply and performance And find out excellent benefits also..บาคาร่า
ufa
ufabet
แทงบอล
แทงบอล
แทงบอล
Maradona Jons said…
pgslot ซึ่งเกมคาสิโนออนไลน์เกมนี้เป็นเกมที่เรียกว่าเกม สล็อตเอ็กซ์โอ คุณรู้จักเกมส์เอ็กซ์โอหรือไม่ 90% ต้องรู้จักเกมส์เอ็กซ์โออย่างแน่นอนเพราะในตอนนี้เด็กนั้นเราทุกคนมักที่จะเอาก็ได้ขึ้นมา สล็อต เล่นเกมส์เอ็กซ์โอกับเพื่อนเพื่อนแล้วคุณรู้หรือไม่ว่าในปัจจุบันนี้เกมส์เอ็กซ์โอนั้นกลายมาเป็นเกมซะลอสออนไลน์ที่ให้บริการด้วยเว็บคาสิโนออนไลน์คุณสามารถเดิมพันเกมส์เอ็กซ์โอกับเว็บคาสิโนออนไลน์ได้โดยที่จะทำให้คุณนั้นสามารถสร้างกำไรจากการเล่นเกมส์เดิมพันออนไลน์ได้เราแนะนำเกมส์ชนิดนี้ให้คุณได้รู้จักก็เพราะว่าเชื่อว่าทุก

Marty Sockolov said…
Probably the most genuine football betting UFABET that's beyond description Find fun, excitement and excitement with slot video games, hundred totally free recognition, quick withdrawal. If you desire to have fun slots for money No need to deposit a great deal, no minimum, no need to share, squander moment for the reason that UFABET is in fact reduced, given seriously, many great offers are waiting for you. Prepared to ensure pleasurable, regardless of whether it's Joker SlotXo fruit slot, we are able to phone it an internet slot website for you personally especially. Ready to have fun Like the support staff which is going to facilitate slot formulas as well as techniques of actively playing So you will be certain that each minute of fun and pleasure We will be there for one to give your customers the best appearance as well as fulfillment.
บาคาร่า
สล็อต
ufa
แทงบอล
Joseph Vijay said…
I found your blog on Google and read a few of your other posts too, it is very informative and also influencing me to read more of yours. You have done an exceptional job by adding your hardwork and dedication to it. I am a big fan of art and craft. I just added you to my bookmarks and recommending your blog to my social profiles too. Just like the other budding bloggers i have started my blog, You can support me by visiting my site for more paper airplane related information and knowledge, Keep up the great work Look forward to reading more from you in the future. Thanks in anticipation.
how to make a paper airplane that flies far and straight | classic dart paper airplane | how to make a good paper | nakamura lock paper airplane instructions | best paper airplane design for distance and accuracy | best paper airplane design for distance and speed | Boomerang paper airplane instructions | Zazoom internet | Eagle paper airplane | how to make a paper airplane that flies far and straight step by step | 4 aerodynamic forces | Paper airplane templates for distance | the eagle paper airplane | best paper plane in the world | How to make a successful paper airplane

The Best

Design API for Multiple Different Clients

Today I want to talk about common design challenges related to architecture of robust APIs, designed to be consumed by multiple clients with different needs. Our use case is the following: We need to build a N-Tier Web REST/SOAP API that is supposed to read/write data from a DB, perform some processing on that data and expose those methods to our API consumers. In addition we have multiple different API clients each with different needs, meaning we can't just expose a rigid set of functions with a defined group of DTOs (Data Transfer Objects). DTO vs POCO Before start diving I want to explain shortly the difference between these two controversial concepts. DTO Objects that are designed to transfer data between edges (i.e. between processes, functions, server & clients etc'). Typically DTOs will contain only simple properties with no behavior. POCO Objects that are designed to reflect the internal business data model. For example if you have an eCommerce platform

Closures in C# vs JavaScript -
Same But Different

Closure in a Nutshell Closures are a Software phenomenon which exist in several languages, in which methods declared inside other methods (nested methods), capture variables declared inside the outer methods. This behavior makes captured variables available even after the outer method's scope has vanished. The following pseudo-code demonstrates the simplest sample: Main() //* Program starts from here { Closures(); } AgeCalculator() { int myAge = 30; return() => { //* Returns the correct answer although AgeCalculator method Scope should have ordinarily disappear return myAge++; }; } Closures() { Func ageCalculator = AgeCalculator(); //* At this point AgeCalculator scopeid cleared, but the captured values keeps to live Log(ageCalculator()); //* Result: 30 Log(ageCalculator()); //* Result: 31 } JavaScript and C# are two languages that suppo