API reference

LSHFunction API

LSHFunctions.LSHFunction — Type

LSHFunction(similarity, args...; kws...)

Construct the default LSHFunction subtype that corresponds to the similarity function similarity.

Arguments

similarity: the similarity function you want to use. Can be any of the following:
- cossim
- inner_prod
- jaccard
- ℓ1
- ℓ2
args...: arguments to pass on to the default LSHFunction constructor corresponding to similarity.
kws...: keyword parameters to pass on to the default LSHFunction constructor corresponding to similarity.

Returns

Returns an instance of a subtype of LSHFunctions.LSHFunction that hashes the similarity function similarity.

Examples

In the snippet below, we construct LSHFunctions.SimHash (the default hash function corresponding to cosine similarity) using LSHFunction():

julia> hashfn = LSHFunction(cossim);

julia> typeof(hashfn) <: LSHFunctions.SimHash <: LSHFunction
true

We can provide arguments and keyword parameters corresponding to the hash function that we construct:

julia> hashfn = LSHFunction(inner_prod, 100; dtype=Float64, maxnorm=10);

julia> n_hashes(hashfn) == 100 &&
       typeof(hashfn) <: SignALSH{Float64} &&
       hashfn.maxnorm == 10
true
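
The forwarding described above can be pictured with a small toy model built on Julia's multiple dispatch. This is a minimal sketch, not the package's implementation: toy_cossim, toy_inner_prod, ToySimHash, ToyALSH, default_constructor, and make_lshfunction are all hypothetical names, and the toy types only record their constructor arguments.

```julia
using LinearAlgebra: dot, norm

# Hypothetical stand-ins for two similarity functions.
toy_cossim(x, y) = dot(x, y) / (norm(x) * norm(y))
toy_inner_prod(x, y) = dot(x, y)

# Hypothetical hash-function types that only record their constructor arguments.
struct ToySimHash
    n_hashes::Int
    dtype::Type
end
struct ToyALSH
    n_hashes::Int
    maxnorm::Float64
end

toy_simhash(n_hashes::Integer = 1; dtype::Type = Float32) = ToySimHash(n_hashes, dtype)
toy_alsh(n_hashes::Integer = 1; maxnorm::Real = 1.0) = ToyALSH(n_hashes, maxnorm)

# Dispatch on the similarity function's type to pick its default constructor.
default_constructor(::typeof(toy_cossim)) = toy_simhash
default_constructor(::typeof(toy_inner_prod)) = toy_alsh
default_constructor(sim) = throw(ArgumentError("no default hash function for $(sim)"))

# Forward positional and keyword arguments unchanged to the chosen constructor.
make_lshfunction(sim, args...; kws...) = default_constructor(sim)(args...; kws...)
```

With this design, supporting a new similarity function only requires one more default_constructor method.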

Hash functions

LSHFunctions.SimHash — Type

SimHash(n_hashes::Integer = 1;
        dtype::Type = Float32,
        resize_pow2::Bool = false)

Creates a locality-sensitive hash function for cosine similarity.

Arguments

n_hashes::Integer (default: 1): the number of hash functions to generate.

Keyword parameters

dtype::Type (default: Float32): the data type to use in the LSHFunctions.SimHash internals. For performance reasons you should pick dtype to match the type of the data you're hashing.
resize_pow2::Bool (default: false): affects the way in which the returned LSHFunctions.SimHash resizes to hash inputs of different sizes. If you think you'll be hashing inputs of many different sizes, it's more efficient to set resize_pow2 = true.

Examples

Construct a hash function by calling SimHash with the number of hash functions you want to generate:

julia> hashfn = SimHash(24);

julia> n_hashes(hashfn) == 24 &&
       similarity(hashfn) == cossim
true

You can then call hashfn(x) in order to compute hashes:

julia> hashfn = SimHash(32);

julia> x = randn(30);

julia> hashes = hashfn(x);
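
The idea behind SimHash is the random-hyperplane technique of Charikar (2002): each hash bit records which side of a random hyperplane through the origin the input lies on, so two inputs at angle $\theta$ receive the same bit with probability $1 - \theta/\pi$. The sketch below illustrates that technique under simplifying assumptions. ToySimHashFn and toy_simhash_fn are hypothetical names, and unlike LSHFunctions.SimHash this version fixes the input dimension dim up front instead of resizing.

```julia
using Random: MersenneTwister

# Each row of W is the normal vector of one random hyperplane through the origin.
struct ToySimHashFn{T<:AbstractFloat}
    W::Matrix{T}
end

# Draw n_hashes Gaussian hyperplanes for inputs of length dim.
# A fixed default seed keeps the example reproducible.
function toy_simhash_fn(n_hashes::Integer, dim::Integer;
                        dtype::Type{<:AbstractFloat} = Float32,
                        rng = MersenneTwister(0))
    return ToySimHashFn(randn(rng, dtype, n_hashes, dim))
end

# Bit i is true when x lies on the nonnegative side of hyperplane i.
(h::ToySimHashFn)(x::AbstractVector) = h.W * x .>= 0

h = toy_simhash_fn(32, 30)
hashes = h(randn(30))   # 32-element BitVector
```

Because only the sign of each projection is kept, rescaling an input by a positive constant leaves its hashes unchanged, which matches the scale invariance of cosine similarity.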

References

Moses S. Charikar. Similarity estimation techniques from rounding algorithms. In Proceedings of the Thirty-Fourth Annual ACM Symposium on Theory of Computing, STOC '02, pages 380–388, New York, NY, USA, 2002. Association for Computing Machinery. 10.1145/509907.509965.

Similarity functions

LSHFunctions.L1 — Method

Lp(x::AbstractVector, y::AbstractVector, p::Real=2)
L1(x::AbstractVector, y::AbstractVector)
L2(x::AbstractVector, y::AbstractVector)

Computes the $\ell^p$ distance between a pair of vectors $x$ and $y$. Identical to ℓp(x,y,p), ℓ1(x,y), and ℓ2(x,y), respectively.

See also: ℓp
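
For vectors, the distance reduces to $\left(\sum_i |x_i - y_i|^p\right)^{1/p}$. The following is a minimal sketch of that formula, assuming $p \ge 1$. The names lp_distance, l1_distance, and l2_distance are hypothetical and are not the package's Lp, L1, and L2.

```julia
# ℓ^p distance (Σ_i |x_i - y_i|^p)^(1/p) between two equal-length vectors, for p ≥ 1.
function lp_distance(x::AbstractVector, y::AbstractVector, p::Real = 2)
    length(x) == length(y) ||
        throw(DimensionMismatch("inputs have lengths $(length(x)) and $(length(y))"))
    return sum(abs.(x .- y) .^ p)^(1 / p)
end

l1_distance(x, y) = lp_distance(x, y, 1)
l2_distance(x, y) = lp_distance(x, y, 2)
```

The explicit length check matters because broadcasting would otherwise silently expand a length-1 vector against a longer one.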

Lp(f, g, interval::LSHFunctions.RealInterval, p)
L1(f, g, interval::LSHFunctions.RealInterval)
L2(f, g, interval::LSHFunctions.RealInterval)

Computes the $L^p$ distance between two functions, given by

$L^p(f,g) \coloneqq \|f - g\|_p = \left(\int_a^b \left|f(x) - g(x)\right|^p \hspace{0.15cm} dx\right)^{1/p}$
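
When the integral has no convenient closed form, it can be approximated numerically. The sketch below uses the composite midpoint rule on $[a,b]$. This quadrature choice and the helper name lp_distance_midpoint are assumptions for illustration, not a description of how LSHFunctions evaluates the integral.

```julia
# Composite midpoint rule for (∫_a^b |f(x) - g(x)|^p dx)^(1/p) using n subintervals.
function lp_distance_midpoint(f, g, a::Real, b::Real, p::Real; n::Integer = 10_000)
    h = (b - a) / n                       # width of each subinterval
    midpoint(i) = a + (i - 0.5) * h       # center of the i-th subinterval
    total = sum(abs(f(midpoint(i)) - g(midpoint(i)))^p for i in 1:n)
    return (h * total)^(1 / p)
end

f(x) = x^2 + 1
g(x) = 2x
lp_distance_midpoint(f, g, 0, 1, 2)   # close to 5^(-1/2) ≈ 0.4472
```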

Examples

Below we compute the $L^1$, $L^2$, and $L^3$ distances between $f(x) = x^2 + 1$ and $g(x) = 2x$ over the interval $[0,1]$. The distances are computed by evaluating the integral

$\left(\int_0^1 \left|f(x) - g(x)\right|^p \hspace{0.15cm}dx\right)^{1/p} = \left(\int_0^1 \left|x^2 - 2x + 1\right|^p \hspace{0.15cm}dx\right)^{1/p} = \left(\int_0^1 (x - 1)^{2p} \hspace{0.15cm}dx\right)^{1/p}$

for $p = 1$, $p = 2$, and $p = 3$.
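
Carrying the last integral through in closed form, for the integer values of $p$ used here, gives the reference values in the examples below:

$\int_0^1 (x - 1)^{2p} \hspace{0.15cm}dx = \left[\frac{(x - 1)^{2p+1}}{2p+1}\right]_0^1 = \frac{1}{2p+1}, \quad\text{so}\quad L^p(f,g) = (2p+1)^{-1/p}$

This yields $L^1(f,g) = 3^{-1}$, $L^2(f,g) = 5^{-1/2}$, and $L^3(f,g) = 7^{-1/3}$.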

julia> f(x) = x^2 + 1; g(x) = 2x;

julia> interval = @interval(0 ≤ x ≤ 1);

julia> Lp(f, g, interval, 1) ≈ L1(f, g, interval) ≈ 3^(-1)
true

julia> Lp(f, g, interval, 2) ≈ L2(f, g, interval) ≈ 5^(-1/2)
true

julia> Lp(f, g, interval, 3) ≈ 7^(-1/3)
true

See also: ℓp

LSHFunctions.L1_norm — Method

Lp_norm(x::AbstractVector, p::Real = 2)
L1_norm(x::AbstractVector)
L2_norm(x::AbstractVector)

Compute the $\ell^p$ norm of a vector $x$. Identical to ℓp_norm(x,p), ℓ1_norm(x), and ℓ2_norm(x), respectively.

See also: ℓp_norm
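
The vector norm is $\left(\sum_i |x_i|^p\right)^{1/p}$, which is also what LinearAlgebra.norm(x, p) computes. The following minimal sketch uses a hypothetical lp_norm helper and checks it against the standard library. It also shows that the $\ell^p$ distance between x and y is simply the norm of x - y.

```julia
using LinearAlgebra: norm

# ℓ^p norm (Σ_i |x_i|^p)^(1/p) of a vector, for p ≥ 1.
lp_norm(x::AbstractVector, p::Real = 2) = sum(abs.(x) .^ p)^(1 / p)

x, y = [3.0, -4.0], [1.0, 1.0]
lp_norm(x)           # 5.0, same as norm(x)
lp_norm(x, 1)        # 7.0, same as norm(x, 1)
lp_norm(x - y, 2)    # ℓ^2 distance between x and y
```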

Lp_norm(f, interval::LSHFunctions.RealInterval, p::Real=2)
L1_norm(f, interval::LSHFunctions.RealInterval)
L2_norm(f, interval::LSHFunctions.RealInterval)

Computes the $L^p$ function-space norm of a function $f$, which is given by the equation

$\|f\|_p = \left(\int_a^b \left|f(x)\right|^p \hspace{0.15cm} dx\right)^{1/p}$

L1_norm(f, interval) is the same as Lp_norm(f, interval, 1), and L2_norm(f, interval) is the same as Lp_norm(f, interval, 2).

Examples

julia> f(x) = x;

julia> interval = @interval(0 ≤ x ≤ 1);

julia> Lp_norm(f, interval, 1) ≈ L1_norm(f, interval) ≈ 2^(-1/1)
true

julia> Lp_norm(f, interval, 2) ≈ L2_norm(f, interval) ≈ 3^(-1/2)
true

julia> Lp_norm(f, interval, 3) ≈ 4^(-1/3)
true
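
The expected values in these examples can be checked by hand. For $f(x) = x$ on $[0,1]$, we have $|f(x)|^p = x^p$, so

$\|f\|_p = \left(\int_0^1 x^p \hspace{0.15cm} dx\right)^{1/p} = \left(\frac{1}{p+1}\right)^{1/p} = (p+1)^{-1/p}$

which equals $2^{-1}$, $3^{-1/2}$, and $4^{-1/3}$ for $p = 1$, $p = 2$, and $p = 3$, respectively.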

LSHFunctions.L2 — Method

Lp(x::AbstractVector, y::AbstractVector, p::Real=2)
L1(x::AbstractVector, y::AbstractVector)
L2(x::AbstractVector, y::AbstractVector)

Computes the $\ell^p$ distance between a pair of vectors $x$ and $y$. Identical to ℓp(x,y,p), ℓ1(x,y), and ℓ2(x,y), respectively.

See also: ℓp

Lp(f, g, interval::LSHFunctions.RealInterval, p)
L1(f, g, interval::LSHFunctions.RealInterval)
L2(f, g, interval::LSHFunctions.RealInterval)

Computes the $L^p$ distance between two functions, given by

$L^p(f,g) \coloneqq \|f - g\|_p = \left(\int_a^b \left|f(x) - g(x)\right|^p \hspace{0.15cm} dx\right)^{1/p}$

Examples

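Below we compute the $L^1$, $L^2$, and $L^3$ distances between $f(x) = x^2 + 1$ and $g(x) = 2x$ over the interval $[0,1]$. The distances are computed by evaluating the integral

$\left(\int_0^1 \left|f(x) - g(x)\right|^p \hspace{0.15cm}dx\right)^{1/p} = \left(\int_0^1 \left|x^2 - 2x + 1\right|^p \hspace{0.15cm}dx\right)^{1/p} = \left(\int_0^1 (x - 1)^{2p} \hspace{0.15cm}dx\right)^{1/p}$

for $p = 1$, $p = 2$, and $p = 3$.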

julia> f(x) = x^2 + 1; g(x) = 2x;

julia> interval = @interval(0 ≤ x ≤ 1);

julia> Lp(f, g, interval, 1) ≈ L1(f, g, interval) ≈ 3^(-1)
true

julia> Lp(f, g, interval, 2) ≈ L2(f, g, interval) ≈ 5^(-1/2)
true

julia> Lp(f, g, interval, 3) ≈ 7^(-1/3)
true

See also: ℓp

LSHFunctions.L2_norm — Method

Lp_norm(x::AbstractVector, p::Real = 2)
L1_norm(x::AbstractVector)
L2_norm(x::AbstractVector)

Compute the $\ell^p$ norm of a vector $x$. Identical to ℓp_norm(x,p), ℓ1_norm(x), and ℓ2_norm(x), respectively.

See also: ℓp_norm

Lp_norm(f, interval::LSHFunctions.RealInterval, p::Real=2)
L1_norm(f, interval::LSHFunctions.RealInterval)
L2_norm(f, interval::LSHFunctions.RealInterval)

Computes the $L^p$ function-space norm of a function $f$, which is given by the equation

$\|f\|_p = \left(\int_a^b \left|f(x)\right|^p \hspace{0.15cm} dx\right)^{1/p}$

L1_norm(f, interval) is the same as Lp_norm(f, interval, 1), and L2_norm(f, interval) is the same as Lp_norm(f, interval, 2).

Examples

julia> f(x) = x;

julia> interval = @interval(0 ≤ x ≤ 1);

julia> Lp_norm(f, interval, 1) ≈ L1_norm(f, interval) ≈ 2^(-1/1)
true

julia> Lp_norm(f, interval, 2) ≈ L2_norm(f, interval) ≈ 3^(-1/2)
true

julia> Lp_norm(f, interval, 3) ≈ 4^(-1/3)
true

LSHFunctions.Lp — Function

Lp(f, g, interval::LSHFunctions.RealInterval, p)
L1(f, g, interval::LSHFunctions.RealInterval)
L2(f, g, interval::LSHFunctions.RealInterval)

Computes the $L^p$ distance between two functions, given by

$L^p(f,g) \coloneqq \|f - g\|_p = \left(\int_a^b \left|f(x) - g(x)\right|^p \hspace{0.15cm} dx\right)^{1/p}$

Examples

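Below we compute the $L^1$, $L^2$, and $L^3$ distances between $f(x) = x^2 + 1$ and $g(x) = 2x$ over the interval $[0,1]$. The distances are computed by evaluating the integral

$\left(\int_0^1 \left|f(x) - g(x)\right|^p \hspace{0.15cm}dx\right)^{1/p} = \left(\int_0^1 \left|x^2 - 2x + 1\right|^p \hspace{0.15cm}dx\right)^{1/p} = \left(\int_0^1 (x - 1)^{2p} \hspace{0.15cm}dx\right)^{1/p}$

for $p = 1$, $p = 2$, and $p = 3$.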

julia> f(x) = x^2 + 1; g(x) = 2x;

julia> interval = @interval(0 ≤ x ≤ 1);

julia> Lp(f, g, interval, 1) ≈ L1(f, g, interval) ≈ 3^(-1)
true

julia> Lp(f, g, interval, 2) ≈ L2(f, g, interval) ≈ 5^(-1/2)
true

julia> Lp(f, g, interval, 3) ≈ 7^(-1/3)
true

See also: ℓp

LSHFunctions.Lp_norm — Function

Lp_norm(f, interval::LSHFunctions.RealInterval, p::Real=2)
L1_norm(f, interval::LSHFunctions.RealInterval)
L2_norm(f, interval::LSHFunctions.RealInterval)

Computes the $L^p$ function-space norm of a function $f$, which is given by the equation

$\|f\|_p = \left(\int_a^b \left|f(x)\right|^p \hspace{0.15cm} dx\right)^{1/p}$

L1_norm(f, interval) is the same as Lp_norm(f, interval, 1), and L2_norm(f, interval) is the same as Lp_norm(f, interval, 2).

Examples

julia> f(x) = x;

julia> interval = @interval(0 ≤ x ≤ 1);

julia> Lp_norm(f, interval, 1) ≈ L1_norm(f, interval) ≈ 2^(-1/1)
true

julia> Lp_norm(f, interval, 2) ≈ L2_norm(f, interval) ≈ 3^(-1/2)
true

julia> Lp_norm(f, interval, 3) ≈ 4^(-1/3)
true

LSHFunctions.Lp_norm — Function

Lp_norm(x::AbstractVector, p::Real = 2)
L1_norm(x::AbstractVector)
L2_norm(x::AbstractVector)

Compute the $\ell^p$ norm of a vector $x$. Identical to ℓp_norm(x,p), ℓ1_norm(x), and ℓ2_norm(x), respectively.

See also: ℓp_norm

LSHFunctions.cossim — Method

cossim(x,y)

Computes the cosine similarity between two inputs $x$ and $y$. Cosine similarity is defined as

$\text{cossim}(x,y) = \frac{\left\langle x,y\right\rangle}{\|x\|\cdot\|y\|}$

where $\left\langle\cdot,\cdot\right\rangle$ is an inner product (e.g. the dot product) and $\|\cdot\|$ is its derived norm. For real inputs, this ratio is the cosine of the angle $\theta$ between $x$ and $y$. When $x$ and $y$ have a small angle between them, cossim(x,y) is high (close to $1$). When they point in nearly opposite directions, cossim(x,y) is low (close to $-1$).
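
For real vectors under the dot product, this means $\theta = \arccos(\text{cossim}(x,y))$. The following is a minimal sketch that uses only LinearAlgebra. toy_cossim and angle_between are hypothetical names, and the clamp guards against rounding pushing the ratio slightly outside $[-1,1]$.

```julia
using LinearAlgebra: dot, norm

# Cosine similarity of two real vectors under the dot product.
toy_cossim(x, y) = dot(x, y) / (norm(x) * norm(y))

# Angle between x and y in radians; clamp keeps rounding error from pushing
# the ratio slightly outside acos's domain [-1, 1].
angle_between(x, y) = acos(clamp(toy_cossim(x, y), -1.0, 1.0))

angle_between([1.0, 0.0], [1.0, 1.0])   # ≈ π/4
```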

Arguments

x and y: two inputs for which dot(x,y), norm(x), and norm(y) are defined.

Examples

julia> using LinearAlgebra: dot, norm;

julia> x, y = rand(4), rand(4);

julia> cossim(x,y) ≈ dot(x,y) / (norm(x) * norm(y))
true

julia> z = rand(5);

julia> cossim(x,z)
ERROR: DimensionMismatch("dot product arguments have lengths 4 and 5")

Hashing in function spaces

LSHFunctions.MonteCarloHash — Type

MonteCarloHash(sim, ω, args...; volume=1.0, n_samples=1024, kws...)

Samples a hash function from an LSH family for the similarity sim defined over the function space $L^p_{\mu}(\Omega)$. sim may be one of the following:

- L1
- L2
- cossim

Given an input function $f\in L^p_{\mu}(\Omega)$, MonteCarloHash works by sampling $f$ at some randomly-selected points in $\Omega$, and then hashing those samples.
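
For cossim, the sample-then-hash idea can be sketched by fixing n_samples points drawn with ω and then applying random-hyperplane hashing to the vector of function values. This is a hypothetical illustration (ToyMonteCarloHash and toy_montecarlo_hash are not package names), not the package's construction. Because cosine similarity ignores positive rescaling, a volume factor would not change these hashes, so this sketch omits it.

```julia
using Random: MersenneTwister

# Hypothetical Monte Carlo hash for cosine similarity of functions on Ω.
struct ToyMonteCarloHash
    sample_points::Vector{Float64}   # fixed points drawn once with ω
    W::Matrix{Float64}               # n_hashes × n_samples Gaussian hyperplanes
end

function toy_montecarlo_hash(ω, n_hashes::Integer;
                             n_samples::Integer = 1024,
                             rng = MersenneTwister(0))
    points = Float64[ω() for _ in 1:n_samples]
    return ToyMonteCarloHash(points, randn(rng, n_hashes, n_samples))
end

# Sample f at the stored points, then apply random-hyperplane hashing to the samples.
(h::ToyMonteCarloHash)(f) = h.W * f.(h.sample_points) .>= 0

μ() = 2 * rand() - 1               # samples a random point from [-1, 1]
hashfn = toy_montecarlo_hash(μ, 50)
hashfn(cos)                        # 50-element BitVector
```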

Arguments

sim: the similarity statistic you want to hash on.
ω: a function that takes no inputs and samples a single point from $\Omega$. Alternatively, it can be viewed as a random variable with probability measure

$\frac{\mu}{\text{vol}_{\mu}(\Omega)} = \frac{\mu}{\int_{\Omega} d\mu}$

args...: arguments to pass on when building the LSHFunction instance underlying the returned MonteCarloHash struct.
volume::Real (default: 1.0): the volume of the space $\Omega$, defined as

$\text{vol}_{\mu}(\Omega) = \int_{\Omega} d\mu$

n_samples::Integer (default: 1024): the number of points at which each function hashed by the MonteCarloHash is sampled. Larger values of n_samples approximate the input function more closely and are therefore more likely to achieve the desired collision probabilities, at the cost of more function evaluations per hash.
kws...: keyword arguments to pass on when building the LSHFunction instance underlying the returned MonteCarloHash struct.
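
The volume parameter connects expectations under ω to integrals against μ. Since ω has distribution $\mu/\text{vol}_{\mu}(\Omega)$, it follows that $\int_{\Omega} g \, d\mu = \text{vol}_{\mu}(\Omega)\,\mathbb{E}[g(\omega)]$. The sketch below uses that identity to estimate an $L^2_{\mu}$ norm from samples. mc_l2_norm is a hypothetical helper, and this sketch does not show how MonteCarloHash uses volume internally. Increasing n_samples reduces the variance of the estimate, which is the same trade-off described for n_samples above.

```julia
using Random: MersenneTwister

# Monte Carlo estimate of ‖f‖₂ = (∫_Ω |f|² dμ)^(1/2) using
# ∫_Ω g dμ = vol_μ(Ω) · E[g(ω)], with ω drawn from μ / vol_μ(Ω).
function mc_l2_norm(f, ω; volume::Real = 1.0, n_samples::Integer = 1024)
    mean_sq = sum(abs2(f(ω())) for _ in 1:n_samples) / n_samples
    return sqrt(volume * mean_sq)
end

rng = MersenneTwister(1)
ω_unit() = rand(rng)               # uniform on [0, 1], so vol = 1
ω_sym() = 2 * rand(rng) - 1        # uniform on [-1, 1], so vol = 2

mc_l2_norm(x -> x, ω_unit; n_samples = 100_000)               # close to 1/√3 ≈ 0.577
mc_l2_norm(x -> x, ω_sym; volume = 2.0, n_samples = 100_000)  # close to √(2/3) ≈ 0.816
```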

Examples

Create a hash function for cosine similarity for functions in $L^2([-1,1])$:

julia> μ() = 2*rand()-1;   # μ samples a random point from [-1,1]

julia> hashfn = MonteCarloHash(cossim, μ, 50; volume=2.0);

julia> n_hashes(hashfn)
50

julia> similarity(hashfn) == cossim
true

julia> hashtype(hashfn)
Bool

Create a hash function for $L^2$ distance in the function space $L^2([0,2\pi])$. Hash the functions f(x) = cos(x) and g(x) = x/(2π) using the returned MonteCarloHash.

julia> μ() = 2π * rand(); # μ samples a random point from [0,2π]

julia> hashfn = MonteCarloHash(L2, μ, 3; volume=2π);

julia> hashfn(cos)
3-element Array{Int32,1}:
 -1
  3
  0

julia> hashfn(x -> x/(2π))
3-element Array{Int32,1}:
 -1
 -2
 -1

Create a hash function with a different number of sample points.

julia> μ() = rand();  # Samples a random point from [0,1]

julia> hashfn = MonteCarloHash(cossim, μ; volume=1.0, n_samples=512);

julia> length(hashfn.sample_points)
512

Miscellaneous

LSHFunctions.@interval — Macro

@interval(expr)

Construct a new LSHFunctions.RealInterval representing an interval on the real line from an expression such as

0 ≤ x < 1

The returned expression constructs an LSHFunctions.RealInterval encoding the lower and upper bounds on the interval, as well as whether each end is open or closed.

Examples

You can construct an interval using the following syntax:

julia> interval = @interval(0 ≤ x < 1);

There are usually multiple ways of constructing the same interval. For instance, each of the expressions below is an equivalent way of constructing the interval [-1,1].

julia> @interval(-1 ≤  x ≤  1) ==
       @interval(-1 <= x <= 1) ==
       @interval(-1 ≤  y ≤  1) ==
       @interval( 1 ≥  x ≥ -1)
true

You can even create intervals with Inf or -Inf at the endpoints, e.g. @interval(-Inf < x < Inf).

There are two primary operations you can run on an interval: testing for membership and intersection. You can test whether or not x is in an interval using x ∈ interval, as shown below.

julia> interval = @interval(0 ≤ x < 1);

julia> 0 ∈ interval && 1 ∉ interval
true

julia> 0 in interval    # This syntax also works
true

You can also intersect two intervals using the ∩ operator (or by using intersect(interval_1, interval_2)).

julia> @interval(0 ≤ x < 1) ∩ @interval(1/2 < x ≤ 1) == @interval(1/2 < x < 1)
true
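
Membership and intersection only need the four fields listed under RealInterval in the private interface. The sketch below is a hypothetical stand-in (ToyInterval is not a package type) that shows one way to implement both operations by extending Base.in and Base.intersect, which also makes the ∈, ∉, and ∩ operators work.

```julia
# Hypothetical interval type with the same four fields as RealInterval.
struct ToyInterval{T<:Real}
    lower::T
    upper::T
    closed_below::Bool
    closed_above::Bool
end

# Allow mixed endpoint types such as (1/2, 1) by promoting them to a common type.
ToyInterval(lower::Real, upper::Real, closed_below::Bool, closed_above::Bool) =
    ToyInterval(promote(lower, upper)..., closed_below, closed_above)

# x ∈ I: compare x to each endpoint, strictly or not depending on whether that end is closed.
function Base.in(x::Real, I::ToyInterval)
    above_lower = I.closed_below ? x >= I.lower : x > I.lower
    below_upper = I.closed_above ? x <= I.upper : x < I.upper
    return above_lower && below_upper
end

# A ∩ B: keep the tighter bound on each side; when both bounds are equal, that end is
# closed only if it is closed in both intervals. Empty results are not special-cased.
function Base.intersect(A::ToyInterval, B::ToyInterval)
    if A.lower == B.lower
        lower, closed_below = A.lower, A.closed_below && B.closed_below
    elseif A.lower > B.lower
        lower, closed_below = A.lower, A.closed_below
    else
        lower, closed_below = B.lower, B.closed_below
    end
    if A.upper == B.upper
        upper, closed_above = A.upper, A.closed_above && B.closed_above
    elseif A.upper < B.upper
        upper, closed_above = A.upper, A.closed_above
    else
        upper, closed_above = B.upper, B.closed_above
    end
    return ToyInterval(lower, upper, closed_below, closed_above)
end

iv = ToyInterval(0, 1, true, false)   # [0, 1)
0 ∈ iv && 1 ∉ iv                      # true
```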

Private interface

LSHFunctions.RealInterval — Type

struct RealInterval{T<:Real}

Encodes an interval of the real line, such as [-1,1] or [0,Inf).

Fields

lower::T: lower bound on the interval.
upper::T: upper bound on the interval.
closed_below::Bool: whether or not the interval is closed below.
closed_above::Bool: whether or not the interval is closed above.

Examples

The following snippet constructs a RealInterval representing the interval [0,1):

julia> interval = LSHFunctions.RealInterval(0, 1, true, false);

It's generally easier to construct an interval using the @interval macro. Check out the documentation for @interval for more information.