original article: https://www.artima.com/lejava/articles/equality.html
In Item 8 of Effective Java, Josh Bloch describes the difficulty of preserving the equals contract when subclassing as a “fundamental problem of equivalence relations in object-oriented languages.” Bloch writes:
There is no way to extend an instantiable class and add a value component while preserving the equals contract, unless you are willing to forgo the benefits of object-oriented abstraction.
Chapter 28 of Programming in Scala shows an approach that allows subclasses to extend an instantiable class, add a value component, and nevertheless preserve the equals contract. Although in the technique is described in the book in the context of defining Scala classes, it is also applicable to classes defined in Java. In this article, we present the technique using text adapted from the relevant section of Programming in Scala, but with the code examples translated from Scala into Java.
Common Equality Pitfalls
Class java.lang.Object defines an equals method, which subclasses may override. Unfortunately, it turns out that writing a correct equality method is surprisingly difficult in object-oriented languages. In fact, after studying a large body of Java code, the authors of a 2007 paper concluded that almost all implementations of equals methods are faulty.
This is problematic, because equality is at the basis of many other things. For one, a faulty equality method for a type C might mean that you cannot reliably put an object of type C in a collection. You might have two elements elem1, elem2 of type C that are equal, i.e., “elem1.equals(elem2)” yields true. Nevertheless, with commonly occurring faulty implementations of the equals method, you could still see behavior like the following:
Set<C> hashSet = new java.util.HashSet<C>();hashSet.add(elem1);hashSet.contains(elem2); // returns false!
Here are four common pitfalls that can cause inconsistent behavior when overriding equals:
Defining equals with the wrong signature
Consider adding an equality method to the following class of simple points:
public class Point {private final int x;private final int y;public Point(int x, int y) {this.x = x;this.y = y;}public int getX() {return x;}public int getY() {return y;}// …}A seemingly obvious, but wrong way would be to define it like this:
// An utterly wrong definition of equalspublic boolean equals(Point other) {return (this.getX() other.getX() && this.getY() other.getY());}What’s wrong with this method? At first glance, it seems to work OK:
Point p1 = new Point(1, 2);Point p2 = new Point(1, 2);Point q = new Point(2, 3);System.out.println(p1.equals(p2)); // prints trueSystem.out.println(p1.equals(q)); // prints falseHowever, trouble starts once you start putting points into a collection:
import java.util.HashSet;HashSet<Point> coll = new HashSet<Point>();coll.add(p1);System.out.println(coll.contains(p2)); // prints falseHow can it be that
colldoes not containp2, even thoughp1was added to it, andp1andp2are equal objects? The reason becomes clear in the following interaction, where the precise type of one of the compared points is masked. Definep2aas an alias ofp2, but with typeObjectinstead ofPoint:
Object p2a = p2;Now, were you to repeat the first comparison, but with the alias
p2ainstead ofp2, you would get:
System.out.println(p1.equals(p2a)); // prints falseWhat went wrong? In fact, the version of
equalsgiven previously does not override the standard methodequals, because its type is different. Here is the type of theequalsmethod as it is defined in the root classObject:
public boolean equals(Object other)Because the
equalsmethod inPointtakes aPointinstead of anObjectas an argument, it does not overrideequalsinObject. Instead, it is just an overloaded alternative. Overloading in Java is resolved by the static type of the argument, not the run-time type. So as long as the static type of the argument isPoint, theequalsmethod inPointis called. However, once the static argument is of typeObject, theequalsmethod inObjectis called instead. This method has not been overridden, so it is still implemented by comparing object identity. That’s why the comparison “p1.equals(p2a)” yieldsfalseeven though pointsp1andp2ahave the samexandyvalues. That’s also why thecontainsmethod inHashSetreturnedfalse. Since that method operates on generic sets, it calls the genericequalsmethod inObjectinstead of the overloaded variant inPoint.A better
equalsmethod is the following:
// A better definition, but still not perfect@Override public boolean equals(Object other) {boolean result = false;if (other instanceof Point) {Point that = (Point) other;result = (this.getX() that.getX() && this.getY() that.getY());}return result;}Now
equalshas the correct type. It takes a value of typeObjectas parameter and it yields abooleanresult. The implementation of this method usesinstanceofand a cast. It first tests whether theotherobject is also of typePoint. If it is, it compares the coordinates of the two points and returns the result. Otherwise the result isfalse.
Changing equals without also changing hashCode
If you repeat the comparison of
p1andp2awith the latest definition ofPointdefined previously, you will gettrue, as expected. However, if you repeat theHashSet.containstest, you will probably still getfalse:
Point p1 = new Point(1, 2);Point p2 = new Point(1, 2);HashSet<Point> coll = new HashSet<Point>();coll.add(p1);System.out.println(coll.contains(p2)); // prints false (probably)In fact, this outcome is not 100% certain. You might also get
truefrom the experiment. If you do, you can try with some other points with coordinates 1 and 2. Eventually, you’ll get one which is not contained in the set. What goes wrong here is thatPointredefinedequalswithout also redefininghashCode.Note that the collection in the example above is a
HashSet. This means elements of the collection are put in “hash buckets” determined by their hash code. Thecontainstest first determines a hash bucket to look in and then compares the given elements with all elements in that bucket. Now, the last version of classPointdid redefineequals, but it did not at the same time redefinehashCode. SohashCodeis still what it was in its version in classObject: some transformation of the address of the allocated object. The hash codes ofp1andp2are almost certainly different, even though the fields of both points are the same. Different hash codes mean with high probability different hash buckets in the set. Thecontainstest will look for a matching element in the bucket which corresponds top2’s hash code. In most cases, pointp1will be in another bucket, so it will never be found.p1andp2might also end up by chance in the same hash bucket. In that case the test would returntrue.The problem was that the last implementation of
Pointviolated the contract onhashCodeas defined for classObject:If two objects are equal according to the
equals(Object)method, then calling thehashCodemethod on each of the two objects must produce the same integer result.In fact, it’s well known in Java that
hashCodeandequalsshould always be redefined together. Furthermore,hashCodemay only depend on fields thatequalsdepends on. For thePointclass, the following would be a suitable definition ofhashCode:
public class Point {private final int x;private final int y;public Point(int x, int y) {this.x = x;this.y = y;}public int getX() {return x;}public int getY() {return y;}@Override public boolean equals(Object other) {boolean result = false;if (other instanceof Point) {Point that = (Point) other;result = (this.getX() that.getX() && this.getY() that.getY());}return result;}@Override public int hashCode() { return (41 * (41 + getX()) + getY()); }}This is just one of many possible implementations of
hashCode. Adding the constant41to one integer fieldx, multiplying the result with the prime number41, and adding to that result the other integer fieldygives a reasonable distribution of hash codes at a low cost in running time and code size.Adding
hashCodefixes the problems of equality when defining classes likePoint. However, there are still other trouble spots to watch out for.
Defining equals in terms of mutable fields
Consider the following slight variation of class
Point:
public class Point {private int x; private int y;public Point(int x, int y) {this.x = x;this.y = y;}public int getX() {return x;}public int getY() {return y;}public void setX(int x) { // Problematic this.x = x; } public void setY(int y) { this.y = y; }@Override public boolean equals(Object other) {boolean result = false;if (other instanceof Point) {Point that = (Point) other;result = (this.getX() that.getX() && this.getY() that.getY());}return result;}@Override public int hashCode() {return (41 * (41 + getX()) + getY());}}The only difference is that the fields
xandyare no longer final, and twosetmethods have been added that allow clients to change thexandyvalues. TheequalsandhashCodemethods are now defined in terms of these mutable fields, so their results change when the fields change. This can have strange effects once you put points in collections:
Point p = new Point(1, 2);HashSet<Point> coll = new HashSet<Point>();coll.add(p);System.out.println(coll.contains(p)); // prints trueNow, if you change a field in point
p, does the collection still contain the point? We’ll try it:
p.setX(p.getX() + 1);System.out.println(coll.contains(p)); // prints false (probably)This looks strange. Where did
pgo? More strangeness results if you check whether the iterator of the set containsp:
Iterator<Point> it = coll.iterator();boolean containedP = false;while (it.hasNext()) {Point nextP = it.next();if (nextP.equals(p)) {containedP = true;break;}}System.out.println(containedP); // prints trueSo here’s a set that does not contain
p, yetpis among the elements of the set! What happened, of course, is that after the change to thexfield, the pointpended up in the wrong hash bucket of the setcoll. That is, its original hash bucket no longer corresponded to the new value of its hash code. In a manner of speaking, the pointp“dropped out of sight” in the setcolleven though it still belonged to its elements.The lesson to be drawn from this example is that when
equalsandhashCodedepend on mutable state, it may cause problems for users. If they put such objects into collections, they have to be careful never to modify the depended-on state, and this is tricky. If you need a comparison that takes the current state of an object into account, you should usually name it something else, notequals. Considering the last definition ofPoint, it would have been preferable to omit a redefinition ofhashCodeand to name the comparison methodequalContents, or some other name different fromequals.Pointwould then have inherited the default implementation ofequalsandhashCode. Sopwould have stayed locatable incolleven after the modification to itsxfield.
Failing to define equals as an equivalence relation
The contract of the
equalsmethod inObjectspecifies thatequalsmust implement an equivalence relation on non-nullobjects:
- It is reflexive: for any non-null value
x, the expressionx.equals(x)should returntrue.- It is symmetric: for any non-null values
xandy,x.equals(y)should returntrueif and only ify.equals(x)returnstrue.- It is transitive: for any non-null values
x,y, andz, ifx.equals(y)returnstrueandy.equals(z)returnstrue, thenx.equals(z)should returntrue.- It is consistent: for any non-null values
xandy, multiple invocations ofx.equals(y)should consistently returntrueor consistently returnfalse, provided no information used in equals comparisons on the objects is modified.- For any non-null value
x,x.equals(null)should returnfalse.The definition of
equalsdeveloped so far for classPointsatisfies the contract forequals. However, things become more complicated once subclasses are considered. Say there is a subclassColoredPointofPointthat adds a fieldcolorof typeColor. AssumeColoris defined as an enum:
public enum Color {RED, ORANGE, YELLOW, GREEN, BLUE, INDIGO, VIOLET;}
ColoredPointoverridesequalsto take the newcolorfield into account:
public class ColoredPoint extends Point { // Problem: equals not symmetricprivate final Color color;public ColoredPoint(int x, int y, Color color) {super(x, y);this.color = color;}@Override public boolean equals(Object other) {boolean result = false;if (other instanceof ColoredPoint) {ColoredPoint that = (ColoredPoint) other;result = (this.color.equals(that.color) && super.equals(that));}return result;}}This is what many programmers would likely write. Note that in this case, class
ColoredPointneed not overridehashCode. Because the new definition ofequalsonColoredPointis stricter than the overridden definition inPoint(meaning it equates fewer pairs of objects), the contract forhashCodestays valid. If two colored points are equal, they must have the same coordinates, so their hash codes are guaranteed to be equal as well.Taking the class
ColoredPointby itself, its definition ofequalslooks OK. However, the contract forequalsis broken once points and colored points are mixed. Consider:
Point p = new Point(1, 2);ColoredPoint cp = new ColoredPoint(1, 2, Color.RED);System.out.println(p.equals(cp)); // prints trueSystem.out.println(cp.equals(p)); // prints falseThe comparison “
p equals cp” invokesp’sequalsmethod, which is defined in classPoint. This method only takes into account the coordinates of the two points. Consequently, the comparison yieldstrue. On the other hand, the comparison “cp equals p” invokescp’sequalsmethod, which is defined in classColoredPoint. This method returnsfalse, becausepis not aColoredPoint. So the relation defined byequalsis not symmetric.The loss in symmetry can have unexpected consequences for collections. Here’s an example:
Set<Point> hashSet1 = new java.util.HashSet<Point>();hashSet1.add(p);System.out.println(hashSet1.contains(cp)); // prints falseSet<Point> hashSet2 = new java.util.HashSet<Point>();hashSet2.add(cp);System.out.println(hashSet2.contains(p)); // prints trueSo even though
pandcpare equal, onecontainstest succeeds whereas the other one fails.How can you change the definition of
equalsso that it becomes symmetric? Essentially there are two ways. You can either make the relation more general or more strict. Making it more general means that a pair of two objects,aandb, is taken to be equal if either comparingawithbor comparingbwithayieldstrue. Here’s code that does this:
public class ColoredPoint extends Point { // Problem: equals not transitiveprivate final Color color;public ColoredPoint(int x, int y, Color color) {super(x, y);this.color = color;}@Override public boolean equals(Object other) {boolean result = false;if (other instanceof ColoredPoint) {ColoredPoint that = (ColoredPoint) other;result = (this.color.equals(that.color) && super.equals(that));}else if (other instanceof Point) {Point that = (Point) other;result = that.equals(this);}return result;}}The new definition of
equalsinColoredPointchecks one more case than the old one: If theotherobject is aPointbut not aColoredPoint, the method forwards to theequalsmethod ofPoint. This has the desired effect of makingequalssymmetric. Now, both “cp.equals(p)” and “p.equals(cp)” result intrue. However, the contract forequalsis still broken. Now the problem is that the new relation is no longer transitive! Here’s a sequence of statements that demonstrates this. Define a point and two colored points of different colors, all at the same position:
ColoredPoint redP = new ColoredPoint(1, 2, Color.RED);ColoredPoint blueP = new ColoredPoint(1, 2, Color.BLUE);Taken individually,
redpis equal topandpis equal tobluep:
System.out.println(redP.equals(p)); // prints trueSystem.out.println(p.equals(blueP)); // prints trueHowever, comparing
redPandbluePyieldsfalse:
System.out.println(redP.equals(blueP)); // prints falseHence, the transitivity clause of
equals’s contract is violated.Making the
equalsrelation more general seems to lead to a dead end. We’ll try to make it stricter instead. One way to makeequalsstricter is to always treat objects of different classes as different. That could be achieved by modifying theequalsmethods in classesPointandColoredPoint. In classPoint, you could add an extra comparison that checks whether the run-time class of the otherPointis exactly the same as thisPoint’s class, as follows:
// A technically valid, but unsatisfying, equals methodpublic class Point {private final int x;private final int y;public Point(int x, int y) {this.x = x;this.y = y;}public int getX() {return x;}public int getY() {return y;}@Override public boolean equals(Object other) {boolean result = false;if (other instanceof Point) {Point that = (Point) other;result = (this.getX() that.getX() && this.getY() that.getY()&& this.getClass().equals(that.getClass()));}return result;}@Override public int hashCode() {return (41 * (41 + getX()) + getY());}}You can then revert class
ColoredPoint’s implementation back to the version that previously had violated the symmetry requirement:4
public class ColoredPoint extends Point { // No longer violates symmetry requirementprivate final Color color;public ColoredPoint(int x, int y, Color color) {super(x, y);this.color = color;}@Override public boolean equals(Object other) {boolean result = false;if (other instanceof ColoredPoint) {ColoredPoint that = (ColoredPoint) other;result = (this.color.equals(that.color) && super.equals(that));}return result;}}Here, an instance of class
Pointis considered to be equal to some other instance of the same class only if the objects have the same coordinates and they have the same run-time class, meaning.getClass()on either object returns the same value. The new definitions satisfy symmetry and transitivity because now every comparison between objects of different classes yieldsfalse. So a colored point can never be equal to a point. This convention looks reasonable, but one could argue that the new definition is too strict.Consider the following slightly roundabout way to define a point at coordinates
(1, 2):
Point pAnon = new Point(1, 1) {@Override public int getY() {return 2;}};Is
pAnonequal top? The answer is no because thejava.lang.Classobjects associated withpandpAnonare different. Forpit isPoint, whereas forpAnonit is an anonymous subclass ofPoint. But clearly,pAnonis just another point at coordinates(1, 2). It does not seem reasonable to treat it as being different fromp.The
canEqualmethodSo it seems we are stuck. Is there a sane way to redefine equality on several levels of the class hierarchy while keeping its contract? In fact, there is such a way, but it requires one more method to redefine together with
equalsandhashCode. The idea is that as soon as a class redefinesequals(andhashCode), it should also explicitly state that objects of this class are never equal to objects of some superclass that implement a different equality method. This is achieved by adding a methodcanEqualto every class that redefinesequals. Here’s the method’s signature:
public boolean canEqual(Object other)The method should return
trueif theotherobject is an instance of the class in whichcanEqualis (re)defined,falseotherwise. It is called fromequalsto make sure that the objects are comparable both ways. Here is a new (and final) implementation of classPointalong these lines:
public class Point {private final int x;private final int y;public Point(int x, int y) {this.x = x;this.y = y;}public int getX() {return x;}public int getY() {return y;}@Override public boolean equals(Object other) {boolean result = false;if (other instanceof Point) {Point that = (Point) other;result = (that.canEqual(this) && this.getX() that.getX() && this.getY() that.getY());}return result;}@Override public int hashCode() {return (41 * (41 + getX()) + getY());}public boolean canEqual(Object other) { return (other instanceof Point); }}The
equalsmethod in this version of classPointcontains the additional requirement that the other object can equal this one, as determined by thecanEqualmethod. The implementation ofcanEqualinPointstates that all instances ofPointcan be equal.Here’s the corresponding implementation of
ColoredPoint:
public class ColoredPoint extends Point { // No longer violates symmetry requirementprivate final Color color;public ColoredPoint(int x, int y, Color color) {super(x, y);this.color = color;}@Override public boolean equals(Object other) {boolean result = false;if (other instanceof ColoredPoint) {ColoredPoint that = (ColoredPoint) other;result = (that.canEqual(this) && this.color.equals(that.color) && super.equals(that));}return result;}@Override public int hashCode() {return (41 * super.hashCode() + color.hashCode());}@Override public boolean canEqual(Object other) { return (other instanceof ColoredPoint); }}It can be shown that the new definition of
PointandColoredPointkeeps the contract ofequals. Equality is symmetric and transitive. Comparing aPointto aColoredPointalways yieldsfalse. Indeed, for any pointpand colored pointcp, “p.equals(cp)” will returnfalsebecause “cp.canEqual(p)” will returnfalse. The reverse comparison, “cp.equals(p)”, will also returnfalse, becausepis not aColoredPoint, so the firstinstanceofcheck in the body ofequalsinColoredPointwill fail.On the other hand, instances of different subclasses of
Pointcan be equal, as long as none of the classes redefines the equality method. For instance, with the new class definitions, the comparison ofpandpAnonwould yield true. Here are some examples:
Point p = new Point(1, 2);ColoredPoint cp = new ColoredPoint(1, 2, Color.INDIGO);Point pAnon = new Point(1, 1) {@Override public int getY() {return 2;}};Set<Point> coll = new java.util.HashSet<Point>();coll.add(p);System.out.println(coll.contains(p)); // prints trueSystem.out.println(coll.contains(cp)); // prints falseSystem.out.println(coll.contains(pAnon)); // prints trueThese examples demonstrate that if a superclass
equalsimplementation defines and callscanEqual, then programmers who implement subclasses can decide whether or not their subclasses may be equal to instances of the superclass. BecauseColoredPointoverridescanEqual, for example, a colored point may never be equal to a plain-old point. But because the anonymous subclass referenced frompAnondoes not overridecanEqual, its instance can be equal to aPointinstance.One potential criticism of the
canEqualapproach is that it violates the Liskov Substitution Principle (LSP). For example, the technique of implementingequalsby comparing run-time classes, which led to the inability to define a subclass whose instances can equal instances of the superclass, has been described as a violation of the LSP.5 The reasoning is that the LSP states you should be able to use (substitute) a subclass instance where a superclass instance is required. In the previous example, however, “coll.contains(cp)” returnedfalseeven thoughcp’sxandyvalues matched those of the point in the collection. Thus it may seem like a violation of the LSP, because you can’t use aColoredPointhere where aPointis expected. We believe this is the wrong interpretation, though, because the LSP doesn’t require that a subclass behaves identically to its superclass, just that it behaves in a way that fulfills the contract of its superclass.The problem with writing an
equalsmethod that compares run-time classes is not that it violates the LSP, but that it doesn’t give you a way to create a subclass whose instances can equal superclass instances. For example, had we used the run-time class technique in the previous example, “coll.contains(pAnon)” would have returnedfalse, and that’s not what we wanted. By contrast, we really did want “coll.contains(cp)” to returnfalse, because by overridingequalsinColoredPoint, we were basically saying that an indigo-colored point at coordinates (1, 2) is not the same thing as an uncolored point at (1, 2). Thus, in the previous example we were able to pass two differentPointsubclass instances to the collection’scontainsmethod, and we got back two different answers, both correct.