cosine_kernel

hypercoil.functional.kernel.cosine_kernel(X0: Tensor, X1: Tensor | None = None, theta: Tensor | None = None) → Tensor

Parameterised cosine kernel between input tensors.

For tensors \(X_0\) and \(X_1\) containing features in column vectors, the parameterised cosine kernel is

\(K_{\theta}(X_0, X_1) = \frac{X_0^\intercal \theta X_1}{\|X_0\|_\theta \|X_1\|_\theta}\)

where the parameterised norm vector

\(\|A\|_{\theta;i} = \sqrt{A_i^\intercal \theta A_i}\)

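is the elementwise square root of the vector of quadratic forms, with \(A_i\) denoting the \(i\)-th observation in \(A\). The division in the kernel is applied entrywise, so each entry of the numerator is divided by the product of the two corresponding norms.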
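As a rough illustration of the formula above, the following NumPy sketch computes the kernel for dense two-dimensional inputs stored with one observation per row. It is not hypercoil's implementation: the name `cosine_kernel_ref`, the use of NumPy, and the (N, M) orientation of the result are assumptions made for this example only.

```python
import numpy as np


def cosine_kernel_ref(X0, X1=None, theta=None):
    """Reference sketch of the parameterised cosine kernel (dense, 2-D only).

    X0: (N, P); X1: (M, P) or None; theta: (P, P), (P,) or None.
    Returns an (N, M) array. This axis order is an assumption of the sketch.
    """
    X0 = np.asarray(X0, dtype=float)
    X1 = X0 if X1 is None else np.asarray(X1, dtype=float)
    P = X0.shape[-1]
    if theta is None:
        theta = np.eye(P)             # identity: unparameterised kernel
    else:
        theta = np.asarray(theta, dtype=float)
        if theta.ndim == 1:
            theta = np.diag(theta)    # vector read as a diagonal matrix
    # Numerator: bilinear form between every row of X0 and every row of X1.
    num = X0 @ theta @ X1.T
    # Parameterised norms: square roots of the quadratic forms x^T theta x.
    norm0 = np.sqrt(np.einsum("np,pq,nq->n", X0, theta, X0))
    norm1 = np.sqrt(np.einsum("mp,pq,mq->m", X1, theta, X1))
    # Entrywise division by the outer product of the two norm vectors.
    return num / np.outer(norm0, norm1)


X = np.array([[1.0, 0.0], [1.0, 1.0], [0.0, 2.0]])
K = cosine_kernel_ref(X)                            # self-kernel, theta = identity
K_diag = cosine_kernel_ref(X, theta=[1.0, 4.0])     # diagonal theta
```

With the identity parameter this reduces to ordinary cosine similarity, so every observation has a kernel value of 1 with itself and orthogonal observations have a value of 0.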

Note

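Although the formula above is written for features in column vectors, the inputs to this function are assumed to store each observation as a row vector of features, so a dense input has shape \((N, P)\) as listed under Dimension below. This differs from the convention frequently used in the literature, but it has the benefit of direct compatibility with the top-k sparse tensor format.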

Dimension:
X0 : \((*, N, P)\) or \((N, P, *)\)

N denotes number of observations, P denotes number of features, * denotes any number of additional dimensions. If the input is dense, then the last dimensions should be N and P; if it is sparse, then the first dimensions should be N and P.

X1 : \((*, M, P)\) or \((M, P, *)\)

M denotes number of observations.

theta : \((*, P, P)\) or \((*, P)\)

As above.

Output : \((*, M, N)\) or \((M, N, *)\)

As above.
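The dense layout listed above can be illustrated with a small batched NumPy example. This is a sketch under stated assumptions rather than a call into hypercoil: the sizes `B`, `N` and `P` are hypothetical, `X1` is omitted (so M equals N), and `theta` is given in its vector form and treated as a diagonal.

```python
import numpy as np

rng = np.random.default_rng(0)
B, N, P = 4, 5, 3                          # hypothetical sizes for illustration
X0 = rng.normal(size=(B, N, P))            # dense layout: (*, N, P)
theta = np.abs(rng.normal(size=P)) + 0.1   # (P,): positive diagonal parameter

# A diagonal theta rescales each feature, so X theta X^T = (X * theta) X^T.
num = (X0 * theta) @ np.swapaxes(X0, -1, -2)                      # (B, N, N)
# Parameterised norm of every observation in every batch element.
norm = np.sqrt(np.einsum("...np,p,...np->...n", X0, theta, X0))   # (B, N)
# Divide each entry by the product of the two matching norms.
K = num / (norm[..., :, None] * norm[..., None, :])               # (B, N, N)

print(K.shape)
```

The leading batch dimension passes through unchanged, and only the trailing two axes are consumed by the kernel computation.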

Parameters:
X0 : tensor

A feature tensor.

X1 : tensor or None

Second feature tensor. If not explicitly provided, the kernel of X0 with itself is computed.

theta : tensor or None

Kernel parameter (generally a representation of a positive definite matrix). If not provided, defaults to identity (an unparameterised kernel). If the last two dimensions are the same size, they are used as a matrix parameter; if they are not, the final axis is instead used as the diagonal of the matrix.

gamma : float or None (default None)

Scaling coefficient. If not explicitly specified, this is automatically set to \(\frac{1}{P}\).
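The shape rule given for theta above (square trailing dimensions are read as a matrix, anything else as a diagonal) can be sketched as follows. The helper name `as_matrix_parameter` is hypothetical and the code is a NumPy illustration of the rule as documented, not the library's own logic.

```python
import numpy as np


def as_matrix_parameter(theta, P):
    """Hypothetical helper: expand theta into a (..., P, P) matrix parameter."""
    if theta is None:
        return np.eye(P)                   # default: identity (unparameterised)
    theta = np.asarray(theta, dtype=float)
    if theta.ndim >= 2 and theta.shape[-1] == theta.shape[-2]:
        return theta                       # last two dims equal: use as matrix
    # Otherwise place the final axis on the diagonal of a square matrix.
    return theta[..., :, None] * np.eye(theta.shape[-1])


full = as_matrix_parameter(np.array([[2.0, 0.5], [0.5, 1.0]]), P=2)
diag = as_matrix_parameter([1.0, 2.0, 3.0], P=3)
batched = as_matrix_parameter(np.ones((2, 3)), P=3)   # two diagonal parameters
```

One consequence of the rule as stated is that a stack of exactly P diagonal vectors, with shape (P, P), would be read as a single matrix rather than as P separate diagonals.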

Returns:
tensor

Kernel Gram matrix.