Why

In [1]:
0.1 > 1//10
Out[1]:
true

Simon Byrne - Julia Computing

Each finite floating point number corresponds to a single real number

  • Just not maybe the one you think it is
In [2]:
@printf "%.60f" 0.1
0.100000000000000005551115123125782702118158340454101562500000

Why not just use promote?

Most languages do!

In [3]:
(x::Real,y::Real) = (promote(x,y)...)
(x::T, y::T) where {T<:Real} = x == y
Out[3]:
≃ (generic function with 2 methods)
In [4]:
0.1  1//10  0.1f0
Out[4]:
true
In [5]:
1//10  0.1  0.1f0
Out[5]:
false

is not a transitive relation:

$$ a \simeq b \quad \text{and} \quad b \simeq c \quad \nRightarrow \quad a \simeq c $$
In [6]:
0.1  1//10
Out[6]:
true
In [7]:
1//10  0.1f0
Out[7]:
true
In [8]:
0.1  0.1f0
Out[8]:
false

We can do better.

UInt32 UInt64 Int32 Int64 BigInt Float32 Float64 BigFloat Rational Irrational
UInt32
UInt64
Int32
Int64
BigInt
Float32
Float64
BigFloat
Rational
Irrational

In [28]:
sum(1:10)
Out[28]:
55

Intrinsics

These are defined by LLVM, and typically reduce to native CPU instruction.

<(x::Float64, y::Float64) = lt_float(x, y)
In [29]:
@code_llvm 1.0 < 1.0
define i8 @"jlsys_<_60112"(double, double) #0 !dbg !5 {
top:
  %2 = fcmp olt double %0, %1
  %3 = zext i1 %2 to i8
  ret i8 %3
}
In [30]:
@code_native 1.0 < 1.0
	.section	__TEXT,__text,regular,pure_instructions
Filename: float.jl
	pushl	%ebp
	decl	%eax
	movl	%esp, %ebp
Source line: 432
	ucomisd	%xmm0, %xmm1
	seta	%al
	popl	%ebp
	retl
Source line: 432
	nop
	nop
	nop

UInt32 UInt64 Int32 Int64 BigInt Float32 Float64 BigFloat Rational Irrational
UInt32
UInt64
Int32
Int64
BigInt
Float32
Float64
BigFloat
Rational
Irrational

Conversions

Certain conversions can be done exactly (without changing the underlying numeric value):

  • UInt32 $\rightarrow$ UInt64
  • Int32 $\rightarrow$ Int64
  • UInt32 $\rightarrow$ Float64
  • Int32 $\rightarrow$ Float64
  • Float32 $\rightarrow$ Float64

We can use this to fill in some gaps.

UInt32 UInt64 Int32 Int64 BigInt Float32 Float64 BigFloat Rational Irrational
UInt32
UInt64
Int32
Int64
BigInt
Float32
Float64
BigFloat
Rational
Irrational

Library functions

BigInt and BigFloat are wrappers around external libraries (GMP and MPFR), which provide various comparison operators.

UInt32 UInt64 Int32 Int64 BigInt Float32 Float64 BigFloat Rational Irrational
UInt32
UInt64
Int32
Int64
BigInt
Float32
Float64
BigFloat
Rational
Irrational

Unsigned vs Signed

<( x::BitSigned,   y::BitUnsigned) = (x <  0) | (unsigned(x) <  y)
<( x::BitUnsigned, y::BitSigned  ) = (y >= 0) & (x <  unsigned(y))
  • check if the signed value is less than zero
  • unsigned simply reinterprets a signed as an unsigned integer

UInt32 UInt64 Int32 Int64 BigInt Float32 Float64 BigFloat Rational Irrational
UInt32
UInt64
Int32
Int64
BigInt
Float32
Float64
BigFloat
Rational
Irrational

Int64/UInt64 vs Float64

function <(x::Float64, y::Union{UInt64, Int64})
    fy = Float64(y)
    (x < fy) | ((x == fy) & (fy < Float64(typemax(y))) & (unsafe_trunc(typeof(y),fy) < y))
end
  1. Convert the integer to a Float64
  2. Compare that to the float
    • If !=, then we're done!
    • If ==, there may have been rounding in step 1.
  3. Check if the float is 2.0^64 / 2.0^63
    • these are the only potential values which can't be safely converted back to an integer
  4. Convert back to an integer, and do an integer comparison.
  • Though complicated, actually not that bad in terms of performance
    • Can be fully pipelined
    • Compiler is pretty goood at eliding unnecessary computations when one value is a constant.
  • Original to Julia as far as I'm aware (61fe3b9c3 was the initial commit).

UInt32 UInt64 Int32 Int64 BigInt Float32 Float64 BigFloat Rational Irrational
UInt32
UInt64
Int32
Int64
BigInt
Float32
Float64
BigFloat
Rational
Irrational

Rational vs Integer/Rational

Cross-multiply the denominator:

<(x::Rational, y::Integer ) = x.num < widemul(x.den,y)
<(x::Rational, y::Rational) = widemul(x.num,y.den) < widemul(x.den,y.num)
  • denominator is constrained by the constructor to be positive
    • don't need to check signs
  • widemul returns an integer of a sufficiently wide type so that there is no overflow

UInt32 UInt64 Int32 Int64 BigInt Float32 Float64 BigFloat Rational Irrational
UInt32
UInt64
Int32
Int64
BigInt
Float32
Float64
BigFloat
Rational
Irrational

Rational vs floating point

function <(x::Rational, y::AbstractFloat)
    (isnan(x) || isnan(y)) && return false

    xn, xp, xd = decompose(x)
    yn, yp, yd = decompose(y)

    if xd < 0
        xn = -xn
        xd = -xd
    end
    if yd < 0
        yn = -yn
        yd = -yd
    end

    xc, yc = widemul(xn,yd), widemul(yn,xd)
    xs, ys = sign(xc), sign(yc)

    if xs != ys
        return xs < ys
    elseif xs == 0
        # both are zero or ±Inf
        return xn < yn
    end

    xb, yb = ndigits0z(xc,2) + xp, ndigits0z(yc,2) + yp

    if xb == yb
        xc, yc = promote(xc,yc)
        if xp > yp
            xc = (xc<<(xp-yp))
        else
            yc = (yc<<(yp-xp))
        end
        return xc < yc
    else
        return xc > 0 ? xb < yb : yb < xb
    end
end
  1. Check for NaNs
  2. Cross multiply rational with significand of float
  3. Compare based on signs
    • if different, we're done
  4. Compare based on magnitude of leading digit
    • if different, we're done
  5. Compare based on values of cross multiplication

UInt32 UInt64 Int32 Int64 BigInt Float32 Float64 BigFloat Rational Irrational
UInt32
UInt64
Int32
Int64
BigInt
Float32
Float64
BigFloat
Rational
Irrational

Irrationals

Julia provides various constants as a parametric type

In [12]:
pi
Out[12]:
π = 3.1415926535897...
In [13]:
Float64(pi) < pi
Out[13]:
true
In [14]:
typeof(pi)
Out[14]:
Irrational{:π}

Irrational vs Float64

<(x::Irrational, y::Float64) = Float64(x,RoundUp) <= y
<(x::Float64, y::Irrational) = x <= Float64(y,RoundDown)
  • Exploits the fact that Irrational must occur between two Float64s (since a Float64 is always rational).
  • Float64(x, RoundUp) gives the smallest Float64 which is greater than or equal to x.

UInt32 UInt64 Int32 Int64 BigInt Float32 Float64 BigFloat Rational Irrational
UInt32
UInt64
Int32
Int64
BigInt
Float32
Float64
BigFloat
Rational
Irrational

Irrational vs BigFloat

<(x::Irrational, y::BigFloat) = setprecision(precision(y)+32) do
    big(x) < y
end
  • Assumes that an extra 32 bits is sufficient to distinguish an irrational.
  • We really should use a loop and keep adding bits until not equal.

UInt32 UInt64 Int32 Int64 BigInt Float32 Float64 BigFloat Rational Irrational
UInt32
UInt64
Int32
Int64
BigInt
Float32
Float64
BigFloat
Rational
Irrational

Irrational vs Rational

function <(x::AbstractIrrational, y::Rational{T}) where T
    T <: Unsigned && x < 0.0 && return true
    rx = rationalize(T, x)
    if lessrational(rx, x)
        return rx < y
    else
        return rx <= y
    end
end
  • rational(T, x::Irrational) gives the closest rational number to x that fits whose coefficients fit within type T

    • Computed using the convergents and semi-convergents (calculated from the truncated continued fraction of x).
  • lessrational(rx,x) is an @pure version of rx < big(x)

    • Will be evaluated to a constant at compile-time
  • Can't be used for Rational{BigInt}

Irrational vs Rational{BigInt}

<(x::AbstractIrrational, y::Rational{BigInt}) = big(x) < y

Not strictly correct:

In [15]:
r = 45471447111470790535029367847216232831674172166049053744846518889742361808273//
    14474011154664524427946373126085988481658748083205070504932198000989141204992
Out[15]:
45471447111470790535029367847216232831674172166049053744846518889742361808273//14474011154664524427946373126085988481658748083205070504932198000989141204992
In [16]:
r < pi
Out[16]:
false
In [17]:
setprecision(BigFloat, 1024) do
    r < pi
end
Out[17]:
true

Possible solution

Use the fact that the convergents alternate between being smaller and larger: work your way down the continued fraction until you can say for sure.

UInt32 UInt64 Int32 Int64 BigInt Float32 Float64 BigFloat Rational Irrational
UInt32
UInt64
Int32
Int64
BigInt
Float32
Float64
BigFloat
Rational
Irrational

Irrational vs Irrational

In [18]:
# julia 0.6
pi < pi
< not defined for Irrational{:π}

Stacktrace:
 [1] <(::Irrational{:π}, ::Irrational{:π}) at ./promotion.jl:350

Per Rutquist fixed this a month ago for 0.7/1.0 (#27797):

<(::Irrational{s}, ::Irrational{s}) where {s} = false
function <(x::AbstractIrrational, y::AbstractIrrational)
    Float64(x) != Float64(y) || throw(MethodError(<, (x, y)))
    return Float64(x) < Float64(y)
end

UInt32 UInt64 Int32 Int64 BigInt Float32 Float64 BigFloat Rational Irrational
UInt32
UInt64
Int32
Int64
BigInt
Float32
Float64
BigFloat
Rational
Irrational

Extending

Multiple dispatch allows easily extending to user-defined types.

Still work to do:

In [27]:
using DecFP
d64"1e100" == 1e100
Out[27]:
true
In [24]:
BigInt(1e100)
Out[24]:
10000000000000000159028911097599180468360808563945281389781327557747838772170381060813469985856815104

Epilogue

  • While preparing this talk on Tuesday, I found an issue with the categorisation of the unicode operator $\eqsim$.
  • The fix is the only breaking change between 0.7 and 1.0 (#28511).