SphereMixScaleSpatialRelationLocationEncoder Documentation

Overview

The SphereMixScaleSpatialRelationLocationEncoder is engineered to encode spatial relationships between locations using advanced position encoding techniques. It integrates the SphereMixScaleSpatialRelationPositionEncoder for initial encoding and processes the results through a multi-layer feed-forward neural network to produce high-dimensional spatial embeddings.

Features

  • Position Encoding (self.position_encoder): Utilizes the SphereMixScaleSpatialRelationPositionEncoder to encode spatial differences (deltaX, deltaY) using geometrically scaled sinusoidal functions.

  • Feed-Forward Neural Network (self.ffn): Transforms position-encoded data through several neural network layers to produce high-dimensional spatial embeddings.

Configuration Parameters

  • spa_embed_dim: The dimensionality of the output spatial embeddings.

  • coord_dim: The dimensionality of the coordinate space, typically 2D.

  • device: Specifies the computation device, e.g., ‘cuda’.

  • frequency_num: Number of frequency components used in positional encoding.

  • max_radius: The largest spatial context radius the model can handle.

  • min_radius: The minimum radius, ensuring detailed capture at smaller scales.

  • freq_init: Initialization method for frequency calculation, set to ‘geometric’.

  • ffn_act: Activation function used in the MLP layers.

  • ffn_num_hidden_layers: Number of layers in the feed-forward network.

  • ffn_dropout_rate: Dropout rate for regularization within the MLP.

  • ffn_hidden_dim: Dimension of each hidden layer within the MLP.

  • ffn_use_layernormalize: Boolean to enable normalization within the MLP.

  • ffn_skip_connection: Enables skip connections within the MLP, potentially enhancing learning.

  • ffn_context_str: Context string for debugging and detailed logging within the network.

Methods

forward(coords)

Processes input coordinates through the location encoder to generate final spatial embeddings.

  • Parameters:

    • coords (List or np.ndarray): Coordinates to process, formatted as (batch_size, num_context_pt, coord_dim).

  • Returns:

    • sprenc (Tensor): Spatial relation embeddings with a shape of (batch_size, num_context_pt, spa_embed_dim).

SphereMixScaleSpatialRelationPositionEncoder

Overview

Transforms spatial coordinates into high-dimensional encoded formats using sinusoidal functions scaled across multiple frequencies, enhancing the model’s capability to discern spatial nuances.

Assumptions for Grid-Structured Data

Spatial Regularity

Grid data often comes in regular, evenly spaced intervals, such as pixels in images or cells in raster GIS data.

Two-Dimensional Structure

Most grid data is two-dimensional, requiring simultaneous encoding of both dimensions to capture spatial relationships effectively.

Formula Development

Base Sinusoidal Encoding

For each coordinate component $x$ and $y$, apply sinusoidal functions across multiple scales:

$E(x, y) = \bigoplus{}^{L-1}_{i=0} \left[ \sin(\omega_i x), \cos(\omega_i x), \sin(\omega_i y), \cos(\omega_i y) \right]$

Where:

  • $\bigoplus$ denotes vector concatenation.

  • $L$ is the number of different frequencies used.

  • $\omega_i$ are the scaled frequencies.

Frequency Scaling

Given the grid structure, frequency scaling might be adapted based on typical distances or resolutions encountered in grid data:

$\omega_i = \pi \cdot \left(\frac{2^i}{\text{cell size}}\right)$

This scaling method aligns the frequency increments with the spatial resolution of grid cells, allowing the encoder to capture variations within and between cells.

Enhanced Spatial Encoding

To account for the two-dimensional nature of grid data and potentially the interactions between grid cells, the encoding can be expanded to include mixed terms that combine $x$ and $y$ coordinates:

$E_{\text{enhanced}}(x, y) = E(x, y) \oplus \left[\sin(\omega_i x) \cdot \cos(\omega_i y), \cos(\omega_i x) \cdot \sin(\omega_i y)\right]$

These mixed terms help to model cross-dimensional spatial interactions, which are critical in grid-like structures where horizontal and vertical relationships might influence the spatial analysis.

Output Dimensionality

The output dimensionality, considering the enhanced encoding, becomes:

$\text{Output Dim} = 4L + 2L = 6L$

Where $4L$ comes from the original sinusoidal terms for $x$ and $y$, and $2L$ from the mixed terms added for cross-dimensional interactions.

Features

  • Geometric Frequency Scaling: Employs a geometric progression of frequencies for sinusoidal encoding, capturing a broad range of spatial details.

  • Configurable Parameters: Supports adjustments in encoding dimensions, frequency range, and computational resources.

Configuration Parameters

  • coord_dim: The dimensionality of the space being encoded.

  • frequency_num: The number of frequencies used for encoding.

  • device: Specifies the computational device.

Methods

cal_elementwise_angle(coord, cur_freq)

Calculates the angle for sinusoidal encoding based on the coordinate and the current frequency.

  • Parameters:

    • coord: The deltaX or deltaY.

    • cur_freq: The frequency index.

  • Returns:

    • The calculated angle for the sinusoidal transformation.

cal_coord_embed(coords_tuple)

Converts a batch of coordinates into sinusoidally-encoded vectors.

  • Parameters:

    • coords_tuple: Tuple of spatial differences.

  • Returns:

    • High-dimensional vector representing the encoded spatial relationships.

Usage Example

encoder = SphereMixScaleSpatialRelationLocationEncoder(
    spa_embed_dim=64,
    coord_dim=2,
    device="cuda",
    frequency_num=16,
    max_radius=10000,
    min_radius=10,
    freq_init="geometric",
    ffn_act="relu",
    ffn_num_hidden_layers=1,
    ffn_dropout_rate=0.5,
    ffn_hidden_dim=256,
    ffn_use_layernormalize=True,
    ffn_skip_connection=True,
    ffn_context_str="SphereMixScaleSpatialRelationEncoder"
)

coords = np.array([[34.0522, -118.2437], [40.7128, -74.0060]])  # Example coordinate data
embeddings = encoder.forward(coords)