Notes on Equality Comparison in Java

TL;DR

🞿 These are personal notes that I made while studying Java, so the content below should not be taken as an authoritative technical reference.

🞿 I will spend about 70% of this note building up a basic understanding of how Java manages values in memory... to explain a difference that could be summarized in a single sentence:

== checks same value for primitives and same object for references; equals() checks whatever the class defines as equality.

The Context #

In Java, we have two ways to compare values for equality: the equals() method and the == operator.

So what's the difference between using == and equals() in Java? Turns out there's a lot.
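Before going into the mechanics, here is a minimal sketch of the observable difference. The class name EqualityBasics and its two helper methods are made up for this note; only String, == and equals() come from Java itself.

```java
class EqualityBasics {
    // == on references: true only when both refer to the very same object.
    static boolean sameReference(String a, String b) {
        return a == b;
    }

    // equals(): whatever the class defines; String compares the characters.
    static boolean sameContent(String a, String b) {
        return a.equals(b);
    }

    public static void main(String[] args) {
        int x = 5;
        int y = 5;
        System.out.println(x == y);                // true: same primitive value

        // `new` forces two distinct String objects with the same characters.
        String a = new String("hello");
        String b = new String("hello");
        System.out.println(sameReference(a, b));   // false: two different objects
        System.out.println(sameContent(a, b));     // true: same characters

        String c = a;                              // copies the reference, not the object
        System.out.println(sameReference(a, c));   // true: one object, two variables
    }
}
```

The strings are created with new on purpose: string literals are interned, so comparing two identical literals with == would happen to return true and hide the difference.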

The Mechanics #

To understand how primitive values and objects are compared for equality in Java, I think we must first roughly understand how they are managed in memory.

For context, Java is a statically and strongly typed language. Static here means that every variable and expression has a type that is known at compile time, and strong here means that types limit which values a variable can hold and which operations are allowed on those values. See Chapter 4 of the Java Language Specification for more detail on the type system.

Java Type System #

Java is object-oriented, but not everything is an object since there are primitive values. In other words, there are two kinds of types in Java: primitive types and reference types [3].

It is clear that a primitive value such as an integer 5 is simply represented as ...000101 in binary format (bits) in memory.

However, the JVM Spec (section 2.3) only guarantees that the semantics (logic/math) of int values are 32-bit. The specification grants JVM implementations such as HotSpot, OpenJ9, etc. freedom in how they map the abstract machine to physical hardware.

Meanwhile, values of reference types (classes, arrays, interfaces; String is a class too) are, roughly speaking, just addresses pointing to somewhere in the heap space, hence the name reference.

Most simply, reference type values are values that the JVM can use to find the actual object at runtime.
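The 32-bit semantics can be observed from Java code regardless of how a JVM stores the value physically. A small sketch, where the class IntSemantics and its helpers are illustrative names and the Integer methods are standard library calls:

```java
class IntSemantics {
    // Binary digits of an int, without leading zeros.
    static String bits(int value) {
        return Integer.toBinaryString(value);
    }

    // int arithmetic wraps around at 32 bits (two's complement).
    static int increment(int value) {
        return value + 1;
    }

    public static void main(String[] args) {
        System.out.println(Integer.SIZE);   // 32
        System.out.println(bits(5));        // 101
        // Overflow wraps from the largest int to the smallest one.
        System.out.println(increment(Integer.MAX_VALUE) == Integer.MIN_VALUE); // true
    }
}
```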

[Figure: Person p = new Person(); On the stack (main frame), the variable p: Person holds the value 0xA1B2. On the heap, the Person object @0xA1B2 consists of the JVM header/metadata followed by the fields name = null and age = 0. Source: ngnhng.github.io]

Less simply, a reference is a value that identifies an object; how it is represented (pointer, handle (pointers to pointers), compressed pointer, etc.) is JVM dependent.
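That a reference is a value identifying an object can be seen from plain Java code, without knowing how the JVM represents it. A minimal sketch, assuming a hypothetical Person class with name and age fields like the one in the example above:

```java
class ReferenceDemo {
    // Hypothetical class used only for illustration.
    static class Person {
        String name;
        int age;
    }

    public static void main(String[] args) {
        Person p = new Person();
        Person q = p;                 // copies the reference value; no new object is created

        q.age = 30;                   // changes the one object both variables refer to
        System.out.println(p.age);    // 30
        System.out.println(p == q);   // true: same object

        Person r = new Person();      // a second, separate object
        System.out.println(p == r);   // false: different objects
    }
}
```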

In the HotSpot implementation, a reference is a bit pattern representing a virtual memory address (a pointer, or a compressed offset decoded into a pointer) that indicates the starting byte of the JVM Object Header on the Heap.

[Figure: Person p = new Person(); in HotSpot. On the stack (main frame), the variable p: Person holds the value 0xA1B2, which is the exact start address of the object in the heap address space (virtual memory). The object is laid out as: Header: Mark Word (8 bytes; locks, hash code, GC age bits), then Header: Klass Pointer (4 bytes; points to the metadata of the Person class), then Data: the int field age, with all bits zero.]

Q&A

Q: Is Java's reference a pointer?

A: Short answer, no.

There are multiple JVM implementations, and in many of them the GC can move objects (compacting collectors, generational collectors) to reduce fragmentation and improve locality. So we cannot expect the address behind a reference to stay the same over time.

This means Java references are GC-aware and are not addresses/pointers that programmers can directly control.

Q&A

Q: What's in the JVM header/metadata?

A: See JVM.

Pass-by-value #

Now that we know something about how Java manages its objects, we can look at why Java is actually pass-by-value and not pass-by-reference.

[Figure: Person p = new Person(); f(p); with the parameter drawn as an alias. On the stack, the main frame holds p: Person with value = 0xA1B2, and the f frame holds a parameter that is an alias of p. On the heap, the Person object @0xA1B2 consists of the JVM header/metadata followed by name = null and age = 0.]

[Figure: Person p = new Person(); f(p); with the parameter drawn as a copy. On the stack, the main frame holds p: Person with value = 0xA1B2, and the f frame holds its own p: Person with value = 0xA1B2 (a copy of the address bits). On the heap, the Person object @0xA1B2 consists of the JVM header/metadata followed by name = null and age = 0.]

So in short, Java is pass-by-value, and the values passed are reference values for reference types.
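The consequence can be checked with a short experiment. This is only a sketch: the Person class and the methods birthday and replace are invented for this note.

```java
class PassByValueDemo {
    // Hypothetical class used only for illustration.
    static class Person {
        String name;
        int age;
    }

    // Mutates the object that the copied reference points to,
    // so the caller sees the change.
    static void birthday(Person person) {
        person.age++;
    }

    // Reassigns only the local copy of the reference;
    // the caller's variable still refers to the original object.
    static void replace(Person person) {
        person = new Person();
        person.age = 99;
    }

    public static void main(String[] args) {
        Person p = new Person();

        birthday(p);
        System.out.println(p.age);   // 1

        replace(p);
        System.out.println(p.age);   // still 1, not 99
    }
}
```

If Java were pass-by-reference, the assignment inside replace would also change p in main, and the second line printed would be 99.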

Conclusion on Equality Comparison #

Now that we have clarified the concept of references in Java, we can finally produce a clearer answer on how == and .equals() differ:

Comparison | Primitive type | Reference type
== | Compares the values | Compares the reference values (which could be heap addresses)
.equals() | Not available: primitive types do not have methods | Depends on the class's implementation. For example, the String class's implementation compares the underlying content, while the default inherited from Object behaves like ==.

An example of a more complex equals() implementation can be seen in the HashMap class, which we will discuss in another note.
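To make "whatever the class defines" concrete, here is a hedged sketch of a class that defines its own equality. The Person class and its fields are assumptions for illustration, not taken from any library. It overrides hashCode() together with equals(), as the Object.equals() documentation asks.

```java
import java.util.Objects;

class EqualsOverrideDemo {
    // Hypothetical value-like class: two Persons are equal
    // when their name and age are equal.
    static final class Person {
        private final String name;
        private final int age;

        Person(String name, int age) {
            this.name = name;
            this.age = age;
        }

        @Override
        public boolean equals(Object other) {
            if (this == other) return true;                // same object
            if (!(other instanceof Person)) return false;  // null or another type
            Person that = (Person) other;
            return age == that.age && Objects.equals(name, that.name);
        }

        @Override
        public int hashCode() {
            return Objects.hash(name, age);                // consistent with equals()
        }
    }

    public static void main(String[] args) {
        Person a = new Person("Ann", 30);
        Person b = new Person("Ann", 30);

        System.out.println(a == b);        // false: two distinct objects
        System.out.println(a.equals(b));   // true: same name and age
    }
}
```

Without the override, a.equals(b) would fall back to Object.equals() and return false here, because that default compares references.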

References #

  1. Java Language Specification
  2. Java Platform, Standard Edition & Java Development Kit Version 25 API Specification
  3. Java Language Specification, Chapter 4: Types, Values, and Variables