RFFSpatialRelationLocationEncoder Documentation

Overview

The RFFSpatialRelationLocationEncoder is designed to process spatial relations between locations using Random Fourier Features (RFF) adapted for spatial encoding. It utilizes the RFFSpatialRelationPositionEncoder to transform spatial coordinates into a high-dimensional space, enhancing the model’s ability to capture and interpret spatial relationships across various scales.

Features

  • Position Encoding (self.position_encoder): Utilizes RFFSpatialRelationPositionEncoder for transforming spatial differences into frequency-based representations using Random Fourier Features.

  • Feed-Forward Neural Network (self.ffn): Processes the RFF-based data through a multi-layer neural network to generate final spatial embeddings.

Configuration Parameters

  • spa_embed_dim: Dimensionality of the output spatial embeddings.

  • coord_dim: Dimensionality of the coordinate space (typically 2D).

  • frequency_num: Number of frequency components used in the positional encoding.

  • rbf_kernel_size: Size of the RBF kernel used in the generation of direction vectors.

  • extent: The extent of the coordinate space (optional).

  • device: Computation device used (e.g., ‘cuda’ for GPU acceleration).

  • ffn_act: Activation function for the neural network layers.

  • ffn_num_hidden_layers: Number of hidden layers in the neural network.

  • ffn_dropout_rate: Dropout rate to prevent overfitting during training.

  • ffn_hidden_dim: Dimension of each hidden layer within the network.

  • ffn_use_layernormalize: Whether to use layer normalization.

  • ffn_skip_connection: Whether to include skip connections within the network layers.

  • ffn_context_str: Context string for debugging and detailed logging within the network.

Methods

forward(coords)

  • Purpose: Processes input coordinates through the encoder to produce spatial embeddings.

  • Parameters:

    • coords (List or np.ndarray): Coordinates to process, formatted as (batch_size, num_context_pt, coord_dim).

  • Returns:

    • sprenc (Tensor): The final spatial relation embeddings, shaped (batch_size, num_context_pt, spa_embed_dim).

RFFSpatialRelationPositionEncoder

Overview

The RFFSpatialRelationPositionEncoder leverages Random Fourier Features (RFF) to encode spatial coordinates into high-dimensional representations. This method is based on the paper “Random Features for Large-Scale Kernel Machines” and is particularly effective for approximating kernel functions.

Features

  • Random Fourier Feature Encoding: Transforms spatial data into a frequency-based representation, capturing inherent spatial frequencies and patterns effectively.

  • Adaptable to Different Spatial Extents: Can normalize input coordinates based on the provided spatial extent.

Theory

Random Fourier Feature (RFF) Encoding

Random Fourier Features provide an approximation to shift-invariant kernel functions by mapping the input data into a randomized low-dimensional feature space. The key idea is to use random projections to approximate the kernel function.

Gaussian RBF Kernel Approximation

The Gaussian RBF kernel is defined as: $K(x, y) = \exp\left(-\frac{|x - y|^2}{2\sigma^2}\right)$ where $|x - y|$ is the Euclidean distance between points $x$ and $y$, and$\sigma$ is the kernel size (bandwidth).

Random Fourier Features

Using Bochner’s theorem, any shift-invariant kernel can be represented as the Fourier transform of a probability measure. For the Gaussian RBF kernel, the transformation is given by: $z(x) = \sqrt{\frac{2}{D}} \cos(\omega^T x + b)$ where$ \omega$ is drawn from a Gaussian distribution, $b$ is drawn from a uniform distribution, and $D$ is the dimension of the feature space.

Formulas

  1. Generate Direction and Shift Vectors:

    • Direction vector $\omega$: $\omega \sim \mathcal{N}(0, \sigma^2 I)$

    • Shift vector $b$:

    $b \sim \text{Uniform}(0, 2\pi)$

  2. Random Fourier Feature Transformation:

    $z(x) = \sqrt{\frac{2}{D}} \cos(\omega^T x + b)$

Implementation

generate_direction_vector()

  • Purpose: Generates the direction (omega) and shift (b) vectors used in the RFF transformation.

  • Returns:

    • dirvec: Direction vectors.

    • shift: Shift vectors.

make_output_embeds(coords)

  • Purpose: Converts input coordinates into RFF-based high-dimensional embeddings.

  • Parameters:

    • coords: Input coordinates.

  • Returns:

    • High-dimensional embeddings representing the input data in the RFF feature space.

Usage Example

# Initialize the encoder
encoder = RFFSpatialRelationLocationEncoder(
    spa_embed_dim=64,
    coord_dim=2,
    frequency_num=16,
    rbf_kernel_size=1.0,
    extent=None,
    device="cuda",
    ffn_act="relu",
    ffn_num_hidden_layers=1,
    ffn_dropout_rate=0.5,
    ffn_hidden_dim=256,
    ffn_use_layernormalize=True,
    ffn_skip_connection=True,
    ffn_context_str="RFFSpatialRelationEncoder"
)

coords = np.array([[34.0522, -118.2437], [40.7128, -74.0060]])  # Example coordinate data
embeddings = encoder.forward(coords)