Module deeporigin.src.utilities.alignments

Functions

def calculate_bounding_box(structure_coord, padding=0.0)
def calculate_bounding_box(structure_coord, padding=0.0):
    """
    Calculate the bounding box for a set of 3D coordinates.

    This function computes the minimum and maximum coordinates, dimensions, and center
    of a bounding box that encloses all points in the given coordinate set.

    Args:
        structure_coord (numpy.ndarray): Array of coordinates with shape (N, 3) where N is 
            the number of points.
        padding (float, optional): Distance added to every side of the box along each axis,
            so each dimension grows by 2 * padding. Defaults to 0.0.

    Returns:
        tuple: A tuple containing:
            - min_coords (numpy.ndarray): Minimum coordinates of the bounding box (x,y,z)
            - max_coords (numpy.ndarray): Maximum coordinates of the bounding box (x,y,z)
            - dimensions (numpy.ndarray): Dimensions of the bounding box (width,height,depth)
            - center (numpy.ndarray): Center coordinates of the bounding box (x,y,z)
    """
    min_coords = np.min(structure_coord, axis=0) - padding
    max_coords = np.max(structure_coord, axis=0) + padding
    dimensions = max_coords - min_coords
    center = 0.5 * (max_coords + min_coords)

    return min_coords, max_coords, dimensions, center

Calculate the bounding box for a set of 3D coordinates.

This function computes the minimum and maximum coordinates, dimensions, and center of a bounding box that encloses all points in the given coordinate set.

Args

structure_coord : numpy.ndarray
Array of coordinates with shape (N, 3) where N is the number of points.
padding : float, optional
Distance added to every side of the box along each axis, so each dimension grows by 2 * padding. Defaults to 0.0.

Returns

tuple
A tuple containing:
  • min_coords (numpy.ndarray): Minimum coordinates of the bounding box (x, y, z)
  • max_coords (numpy.ndarray): Maximum coordinates of the bounding box (x, y, z)
  • dimensions (numpy.ndarray): Dimensions of the bounding box (width, height, depth)
  • center (numpy.ndarray): Center coordinates of the bounding box (x, y, z)

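A minimal usage sketch follows. To keep it runnable without the deeporigin package, it restates the four NumPy lines of calculate_bounding_box inline rather than importing it; the coordinates are made up for illustration.

```python
import numpy as np

def calculate_bounding_box(structure_coord, padding=0.0):
    # Same arithmetic as the documented function: per-axis extrema, padded on both sides
    min_coords = np.min(structure_coord, axis=0) - padding
    max_coords = np.max(structure_coord, axis=0) + padding
    dimensions = max_coords - min_coords
    center = 0.5 * (max_coords + min_coords)
    return min_coords, max_coords, dimensions, center

# Three illustrative points, one row per point (shape (3, 3))
coords = np.array([[0.0, 0.0, 0.0],
                   [2.0, 4.0, 1.0],
                   [1.0, -2.0, 3.0]])

min_c, max_c, dims, center = calculate_bounding_box(coords, padding=1.0)
# min_c == [-1, -3, -1], max_c == [3, 5, 4]
# dims == [4, 8, 5], center == [1, 1, 1.5]
```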
def calculate_fixed_bounding_box(structure_coord, box_size=24.0)
def calculate_fixed_bounding_box(structure_coord, box_size=24.0):
    """
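    Calculate a fixed-size cubic bounding box centered on the centroid of a molecular structure.

    The centroid is the mean of the coordinates. Because the edge length is fixed, the box may
    not enclose every point of a large or asymmetric structure.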

    Args:
        structure_coord (numpy.ndarray): Array of shape (N, 3) containing 3D coordinates of the structure points
        box_size (float, optional): Length of each side of the cubic bounding box. Defaults to 24.0.

    Returns:
        tuple: A tuple containing:
            - min_coords (numpy.ndarray): Array of shape (3,) containing minimum x, y, z coordinates of the box
            - max_coords (numpy.ndarray): Array of shape (3,) containing maximum x, y, z coordinates of the box 
            - dimensions (numpy.ndarray): Array of shape (3,) containing the dimensions of the box (equal in all directions)
            - center (numpy.ndarray): Array of shape (3,) containing the coordinates of the box center (structure centroid)
    """
    # Calculate the centroid of the structure
    center = np.mean(structure_coord, axis=0)

    # Calculate half the box size
    half_size = box_size / 2.0

    # Define min and max coordinates based on the center and half_size
    min_coords = center - half_size
    max_coords = center + half_size

    # Dimensions of the box
    dimensions = max_coords - min_coords

    return min_coords, max_coords, dimensions, center

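Calculate a fixed-size cubic bounding box centered on the centroid of a molecular structure.

The centroid is the mean of the coordinates. Because the edge length is fixed, the box may not enclose every point of a large or asymmetric structure.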

Args

structure_coord : numpy.ndarray
Array of shape (N, 3) containing 3D coordinates of the structure points
box_size : float, optional
Length of each side of the cubic bounding box. Defaults to 24.0.

Returns

tuple
A tuple containing:
  • min_coords (numpy.ndarray): Array of shape (3,) containing minimum x, y, z coordinates of the box
  • max_coords (numpy.ndarray): Array of shape (3,) containing maximum x, y, z coordinates of the box
  • dimensions (numpy.ndarray): Array of shape (3,) containing the dimensions of the box (equal in all directions)
  • center (numpy.ndarray): Array of shape (3,) containing the coordinates of the box center (structure centroid)

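Because the box is centered on the centroid rather than on the midpoint of the extrema, and its edge length is fixed, containment has to be checked rather than assumed. The sketch below restates the function inline so it runs on its own; encloses is a hypothetical helper, and the coordinates (a compact cluster plus one distant point) are made up.

```python
import numpy as np

def calculate_fixed_bounding_box(structure_coord, box_size=24.0):
    # Cube of edge box_size centered on the centroid (mean position), as documented
    center = np.mean(structure_coord, axis=0)
    half_size = box_size / 2.0
    min_coords = center - half_size
    max_coords = center + half_size
    return min_coords, max_coords, max_coords - min_coords, center

def encloses(coords, min_coords, max_coords):
    # Hypothetical helper: True only if every point lies inside the box (inclusive bounds)
    return bool(np.all((coords >= min_coords) & (coords <= max_coords)))

coords = np.array([[0.0, 0.0, 0.0],
                   [1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0],
                   [15.0, 0.0, 0.0]])

mn, mx, dims, center = calculate_fixed_bounding_box(coords, box_size=24.0)
fits_24 = encloses(coords, mn, mx)        # centroid x = 4, box spans x in [-8, 16]

mn20, mx20, _, _ = calculate_fixed_bounding_box(coords, box_size=20.0)
fits_20 = encloses(coords, mn20, mx20)    # box spans x in [-6, 14], so x = 15 is outside
```

The 20.0 case matches the default box_size of create_bounding_box, which calls this function when around_ligand is False.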
def create_bounding_box(ligand, padding=0.0, output_file=None, around_ligand=False, box_size=20.0)
def create_bounding_box(ligand, padding=0.0, output_file=None, around_ligand=False, box_size=20.0):
    """
    Create an axis-aligned bounding box for a ligand and optionally save it as a structure file.

    Args:
        ligand (Structure): Molecular structure object whose `coordinates` attribute holds an
            (N, 3) array of 3D coordinates.
        padding (float, optional): Space added on every side of the ligand's extent. Used only
            when around_ligand is True. Defaults to 0.0.
        output_file (str, optional): Path to save the bounding box as a structure file. Defaults to None.
        around_ligand (bool, optional): If True, fits the box to the ligand's extent plus padding
            (see calculate_bounding_box). If False, creates a cube of edge box_size centered on the
            ligand centroid (see calculate_fixed_bounding_box). Defaults to False.
        box_size (float, optional): Edge length of the fixed cubic box, used only when around_ligand
            is False. Defaults to 20.0.

    Returns:
        dict: A dictionary containing:
            - min_coords (numpy.ndarray): Minimum x,y,z coordinates
            - max_coords (numpy.ndarray): Maximum x,y,z coordinates
            - dimensions (numpy.ndarray): Box dimensions
            - center (numpy.ndarray): Box center coordinates
            - atom_array (AtomArray): Structure array of box vertices (only if output_file is specified)
    """
    structure_coord = ligand.coordinates
    if around_ligand:
        min_coords, max_coords, dimensions, center = calculate_bounding_box(structure_coord, padding)
    else:
        min_coords, max_coords, dimensions, center = calculate_fixed_bounding_box(structure_coord, box_size)

    result = {
        'min_coords': min_coords,
        'max_coords': max_coords,
        'dimensions': dimensions,
        'center': center
    }

    if output_file:
        atom_array = create_bounding_box_atoms(min_coords, max_coords, dimensions)
        struc.io.save_structure(output_file, atom_array)
        result['atom_array'] = atom_array
        print(f"Bounding box atoms saved to {output_file}")

    return result

Create an axis-aligned bounding box for a ligand and optionally save it as a structure file.

Args

ligand : Structure
Molecular structure object whose coordinates attribute holds an (N, 3) array of 3D coordinates.
padding : float, optional
Space added on every side of the ligand's extent. Used only when around_ligand is True. Defaults to 0.0.
output_file : str, optional
Path to save the bounding box as a structure file. Defaults to None.
around_ligand : bool, optional
If True, fits the box to the ligand's extent plus padding (see calculate_bounding_box()). If False, creates a cube of edge box_size centered on the ligand centroid (see calculate_fixed_bounding_box()). Defaults to False.
box_size : float, optional
Edge length of the fixed cubic box, used only when around_ligand is False. Defaults to 20.0.

Returns

dict
A dictionary containing:
  • min_coords (numpy.ndarray): Minimum x, y, z coordinates
  • max_coords (numpy.ndarray): Maximum x, y, z coordinates
  • dimensions (numpy.ndarray): Box dimensions
  • center (numpy.ndarray): Box center coordinates
  • atom_array (AtomArray): Structure array of box vertices (only if output_file is specified)

def create_bounding_box_atoms(min_coords, max_coords, dimensions)
def create_bounding_box_atoms(min_coords, max_coords, dimensions):
    """
    Create a bounding box represented by atoms at its corners.

    This function places eight dummy atoms with element XE (xenon) at the corners of an
    axis-aligned box defined by its minimum and maximum coordinates along with its dimensions.

    Args:
        min_coords: List or array of 3 coordinates [x,y,z] representing the minimum corner of the box
        max_coords: List or array of 3 coordinates [x,y,z] representing the maximum corner of the box
        dimensions: List or array of 3 values [dx,dy,dz] representing the dimensions of the box

    Returns:
        struc.AtomArray: An array of 8 atoms representing the corners of the bounding box.
            Each atom has the following properties:
            - chain_id: "A"
            - res_id: 1
            - res_name: "BOX"
            - atom_name: "XE1" through "XE8"
            - element: "XE"

    Note:
        The atoms are placed at the eight corners of the box, with XE1 at min_coords
        and XE5 at max_coords. The remaining atoms are placed at intermediate corners
        formed by adding or subtracting the dimensions from these points.
    """
    atoms = []
    atom_index = 1

    # Create an atom at the minimum corner
    atom = struc.Atom(min_coords, chain_id="A", res_id=1, res_name="BOX", atom_name=f"XE{atom_index}", element="XE")
    atoms.append(atom)
    atom_index += 1

    # Create atoms at the corners formed by adding dimensions to min_coords
    for i in range(3):
        coord = min_coords.copy()
        coord[i] += dimensions[i]
        atom = struc.Atom(coord, chain_id="A", res_id=1, res_name="BOX", atom_name=f"XE{atom_index}", element="XE")
        atoms.append(atom)
        atom_index += 1

    # Create an atom at the maximum corner
    atom = struc.Atom(max_coords, chain_id="A", res_id=1, res_name="BOX", atom_name=f"XE{atom_index}", element="XE")
    atoms.append(atom)
    atom_index += 1

    # Create atoms at the corners formed by subtracting dimensions from max_coords
    for i in range(3):
        coord = max_coords.copy()
        coord[i] -= dimensions[i]
        atom = struc.Atom(coord, chain_id="A", res_id=1, res_name="BOX", atom_name=f"XE{atom_index}", element="XE")
        atoms.append(atom)
        atom_index += 1

    # Convert the list of atoms to an AtomArray
    atom_array = struc.array(atoms)
    return atom_array

Create a bounding box represented by atoms at its corners.

This function places eight dummy atoms with element XE (xenon) at the corners of an axis-aligned box defined by its minimum and maximum coordinates along with its dimensions.

Args

min_coords
List or array of 3 coordinates [x,y,z] representing the minimum corner of the box
max_coords
List or array of 3 coordinates [x,y,z] representing the maximum corner of the box
dimensions
List or array of 3 values [dx,dy,dz] representing the dimensions of the box

Returns

struc.AtomArray
An array of 8 atoms representing the corners of the bounding box. Each atom has the following properties:
  • chain_id: "A"
  • res_id: 1
  • res_name: "BOX"
  • atom_name: "XE1" through "XE8"
  • element: "XE"

Note

The atoms are placed at the eight corners of the box, with XE1 at min_coords and XE5 at max_coords. The remaining atoms are placed at intermediate corners formed by adding or subtracting the dimensions from these points.
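The corner ordering can be checked without biotite. This sketch reproduces only the coordinate arithmetic of create_bounding_box_atoms with plain NumPy (box_corners is a hypothetical name) and confirms that the eight points are exactly the eight distinct corners of the box.

```python
import itertools

import numpy as np

def box_corners(min_coords, max_coords, dimensions):
    # Hypothetical helper, same order as XE1..XE8: min corner, min + one edge per axis,
    # then max corner, max - one edge per axis
    min_coords = np.asarray(min_coords, dtype=float)
    max_coords = np.asarray(max_coords, dtype=float)
    corners = [min_coords]
    for i in range(3):
        c = min_coords.copy()
        c[i] += dimensions[i]
        corners.append(c)
    corners.append(max_coords)
    for i in range(3):
        c = max_coords.copy()
        c[i] -= dimensions[i]
        corners.append(c)
    return np.array(corners)

mn, mx = np.array([0.0, 0.0, 0.0]), np.array([1.0, 2.0, 3.0])
corners = box_corners(mn, mx, mx - mn)

# Every (min or max) choice per axis, i.e. the 8 true corners of the box
expected = {tuple(p) for p in itertools.product(*zip(mn, mx))}
is_full_corner_set = {tuple(c) for c in corners} == expected
```

The ordering only yields true corners when dimensions equals max_coords - min_coords; other values would place points off the corners of the [min, max] box.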

def save_bounding_box(center, box_size, output_file)
def save_bounding_box(center, box_size, output_file):
    """
    Save a bounding box structure to a file.

    Args:
        center (array-like): The (x, y, z) coordinates of the box center.
        box_size (array-like): The dimensions (width, height, depth) of the box.
        output_file (str): The path where the structure file will be saved.

    Returns:
        None: The function saves the structure to a file but does not return any value.

    Notes:
        The function uses `calculate_box_min_max` to determine the box boundaries and
        `create_bounding_box_atoms` to generate the atomic structure representation, passing
        box_size as the box dimensions. The output is saved with biotite's
        structure.io.save_structure().
    """

    min_coords, max_coords = calculate_box_min_max(center, box_size)
    atom_array = create_bounding_box_atoms(min_coords, max_coords, box_size)
    struc.io.save_structure(output_file, atom_array)

Save a bounding box structure to a file.

Args

center : array-like
The (x, y, z) coordinates of the box center.
box_size : array-like
The dimensions (width, height, depth) of the box.
output_file : str
The path where the structure file will be saved.

Returns

None
The function saves the structure to a file but does not return any value.

Notes

The function uses calculate_box_min_max() to determine the box boundaries and create_bounding_box_atoms() to generate the atomic structure representation, passing box_size as the box dimensions. The output is saved with biotite's structure.io.save_structure().
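calculate_box_min_max is not documented on this page, so the sketch below supplies a hypothetical version that assumes the box is centered on center with per-axis edge lengths box_size. Under that assumption, max_coords - min_coords equals box_size, which is the consistency save_bounding_box relies on when it hands box_size to create_bounding_box_atoms as the dimensions.

```python
import numpy as np

def calculate_box_min_max(center, box_size):
    # ASSUMPTION: hypothetical re-creation; the real helper is not shown on this page.
    # Assumes a box centered on `center` with per-axis edge lengths `box_size`.
    center = np.asarray(center, dtype=float)
    half = np.asarray(box_size, dtype=float) / 2.0
    return center - half, center + half

center = [10.0, -5.0, 2.5]
box_size = [20.0, 20.0, 16.0]
min_c, max_c = calculate_box_min_max(center, box_size)

# save_bounding_box passes box_size as `dimensions`, so it must equal max - min
consistent = bool(np.allclose(max_c - min_c, box_size))
```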

Classes

class StructureAligner
class StructureAligner:
    """A class for aligning and transforming structural coordinates using Principal Component Analysis (PCA).

    This class provides functionality to:
    1. Calculate PCA components from input coordinates
    2. Align structures using calculated PCA components
    3. Restore structures from PCA-transformed coordinates
    4. Track PCA fitting state

    The aligner enforces a right-handed set of principal axes. Errors raised while fitting,
    aligning, or restoring (including None input) are logged via DEFAULT_LOGGER and re-raised;
    calling align_structure() or restore_structure() before calculate_pca() raises ValueError.

    Attributes:
        pca (Optional[PCA]): Principal Component Analysis object from scikit-learn.
        _components_fixed (bool): Flag indicating whether PCA components are fixed.

    Example:
        >>> aligner = StructureAligner()
        >>> aligner.calculate_pca(initial_coords)
        >>> aligned_coords = aligner.align_structure(target_coords)
        >>> restored_coords = aligner.restore_structure(aligned_coords)

    Note:
        - The PCA model must be fitted using calculate_pca() before alignment operations.
        - Coordinate arrays should be 2D with shape (N, 3); scikit-learn's PCA needs N >= 3
          to fit three components.
    """
    def __init__(self):
        """Initialize an aligner with no fitted PCA model.

        Attributes:
            pca (Optional[PCA]): Principal Component Analysis object from scikit-learn, initially set to None.
            _components_fixed (bool): Flag indicating whether PCA components are fixed, initially set to False.
        """
        self.pca: Optional[PCA] = None
        self._components_fixed: bool = False

    def calculate_pca(self, coords: np.ndarray) -> None:
        """
        Calculates the Principal Component Analysis (PCA) for given coordinates.

        This method performs PCA on the input coordinates to find the principal axes of variation.
        It ensures a right-handed coordinate system: if the scalar triple product of the three
        components is negative, the third component's direction is flipped.

        Args:
            coords (np.ndarray): Input coordinates array for PCA calculation.

        Raises:
            ValueError: If coords is None.
            Exception: If PCA calculation fails for any other reason.

        Notes:
            - Sets self.pca with the calculated PCA object
            - Stores the components ensuring a right-handed coordinate system
            - Sets self._components_fixed flag to True upon successful calculation

        Example:
            >>> instance.calculate_pca(coordinates_array)
        """
        try:
            if coords is None:
                raise ValueError("Coordinates are None.")

            self.pca = PCA(n_components=3)
            self.pca.fit(coords)

            components = self.pca.components_
            if np.dot(np.cross(components[0], components[1]), components[2]) < 0:
                # Negate the third principal axis (row 2 of components_) so the frame is right-handed
                self.pca.components_[2] *= -1

            self._components_fixed = True
            DEFAULT_LOGGER.log_info("PCA components calculated and stored.")

        except Exception as e:
            DEFAULT_LOGGER.log_error(f"PCA calculation failed: {str(e)}")
            raise

    @property
    def is_fitted(self) -> bool:
        """
        Check if the PCA transformer has been fitted with data.

        Returns:
            bool: True if the PCA model has been created and its components fixed by
                  calculate_pca(), False otherwise.
        """
        return self.pca is not None and self._components_fixed

    def align_structure(self, coords: np.ndarray) -> np.ndarray:
        """
        Aligns structural coordinates using pre-calculated PCA components.

        Args:
            coords (np.ndarray): Input coordinates to be aligned, shape (N, 3). Must have the same
                number of columns as the data used to fit the PCA model.

        Returns:
            np.ndarray: The coordinates expressed relative to the mean of the fitted data and
                rotated onto its principal axes (the PCA frame).

        Raises:
            ValueError: If PCA components haven't been calculated (is_fitted=False) or if input coordinates are None
            Exception: If alignment process fails for any other reason

        Notes:
            The PCA model must be fitted by calling calculate_pca() before using this method.
        """

        if not self.is_fitted:
            raise ValueError("PCA components haven't been calculated. Call calculate_pca first.")

        try:
            if coords is None:
                raise ValueError("Coordinates are None.")

            # Transform coordinates
            aligned_coords = self.pca.transform(coords)
            DEFAULT_LOGGER.log_info("Coordinates aligned using PCA.")

            return aligned_coords

        except Exception as e:
            DEFAULT_LOGGER.log_error(f"Alignment failed: {str(e)}")
            raise

    def restore_structure(self, coords: np.ndarray) -> np.ndarray:
        """
        Restores the original structural coordinates from PCA-transformed coordinates.

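        This method applies the inverse PCA transformation, mapping coordinates from the PCA
        frame (centered and rotated onto the principal axes) back to the original structural
        frame. Because all three components are kept, the round trip is lossless up to
        floating-point error.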

        Args:
            coords (np.ndarray): The coordinates in PCA space to be restored to original structure space.

        Returns:
            np.ndarray: The restored coordinates in the original structural space.

        Raises:
            ValueError: If PCA components haven't been calculated (is_fitted=False) or if coords is None.
            Exception: If restoration process fails for any other reason.

        Note:
            The method requires that calculate_pca() has been called previously to fit the PCA model.
        """
        if not self.is_fitted:
            raise ValueError("PCA components haven't been calculated. Call calculate_pca first.")

        try:
            if coords is None:
                raise ValueError("Coordinates are None.")

            # Inverse transform coordinates
            restored_coords = self.pca.inverse_transform(coords)
            DEFAULT_LOGGER.log_info("Coordinates restored from PCA space.")

            return restored_coords

        except Exception as e:
            DEFAULT_LOGGER.log_error(f"Restoration failed: {str(e)}")
            raise

A class for aligning and transforming structural coordinates using Principal Component Analysis (PCA).

This class provides functionality to:

1. Calculate PCA components from input coordinates
2. Align structures using the calculated PCA components
3. Restore structures from PCA-transformed coordinates
4. Track the PCA fitting state

The aligner enforces a right-handed set of principal axes. Errors raised while fitting, aligning, or restoring (including None input) are logged via DEFAULT_LOGGER and re-raised; calling align_structure() or restore_structure() before calculate_pca() raises ValueError.

Attributes

pca : Optional[PCA]
Principal Component Analysis object from scikit-learn.
_components_fixed : bool
Flag indicating whether PCA components are fixed.

Example

>>> aligner = StructureAligner()
>>> aligner.calculate_pca(initial_coords)
>>> aligned_coords = aligner.align_structure(target_coords)
>>> restored_coords = aligner.restore_structure(aligned_coords)

Note

  • The PCA model must be fitted using calculate_pca() before alignment operations.
  • Coordinate arrays should be 2D with shape (N, 3); scikit-learn's PCA needs N >= 3 to fit three components.
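The workflow above can be sketched without scikit-learn. NumpyAligner below is a hypothetical stand-in, not the class's implementation: it replaces sklearn.decomposition.PCA with NumPy's SVD (center, project onto the principal axes, map back). It fits on one made-up structure and maps a second one into the same frame, which is what fixing the components allows. Because the mapping is a rigid motion, distances between the two structures are preserved.

```python
import numpy as np

class NumpyAligner:
    """Hypothetical NumPy stand-in for StructureAligner (SVD instead of sklearn PCA)."""

    def __init__(self):
        self.mean_ = None
        self.components_ = None

    @property
    def is_fitted(self):
        return self.components_ is not None

    def calculate_pca(self, coords):
        coords = np.asarray(coords, dtype=float)
        self.mean_ = coords.mean(axis=0)
        # Rows of vt are the principal axes, ordered by decreasing variance
        _, _, vt = np.linalg.svd(coords - self.mean_, full_matrices=False)
        if np.dot(np.cross(vt[0], vt[1]), vt[2]) < 0:
            vt[2] *= -1  # flip the third axis so the frame is right-handed
        self.components_ = vt

    def align_structure(self, coords):
        if not self.is_fitted:
            raise ValueError("Call calculate_pca first.")
        return (np.asarray(coords, dtype=float) - self.mean_) @ self.components_.T

    def restore_structure(self, coords):
        if not self.is_fitted:
            raise ValueError("Call calculate_pca first.")
        return np.asarray(coords, dtype=float) @ self.components_ + self.mean_

rng = np.random.default_rng(0)
ligand = rng.normal(size=(20, 3)) * [4.0, 2.0, 1.0]  # made-up elongated cloud
pocket = rng.normal(size=(50, 3)) * 6.0               # made-up second structure

aligner = NumpyAligner()
aligner.calculate_pca(ligand)  # the frame is defined by the ligand only
lig_aligned = aligner.align_structure(ligand)
pocket_aligned = aligner.align_structure(pocket)

# Rigid transform: all ligand-pocket distances are unchanged
d_before = np.linalg.norm(ligand[:, None] - pocket[None], axis=-1)
d_after = np.linalg.norm(lig_aligned[:, None] - pocket_aligned[None], axis=-1)
```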

Initialize an aligner with no fitted PCA model.

Attributes

pca : Optional[PCA]
Principal Component Analysis object from scikit-learn, initially set to None.
_components_fixed : bool
Flag indicating whether PCA components are fixed, initially set to False.

Instance variables

prop is_fitted : bool
@property
def is_fitted(self) -> bool:
    """
    Check if the PCA transformer has been fitted with data.

    Returns:
        bool: True if the PCA model has been created and its components fixed by
              calculate_pca(), False otherwise.
    """
    return self.pca is not None and self._components_fixed

Check if the PCA transformer has been fitted with data.

Returns

bool
True if the PCA model has been created and its components fixed by calculate_pca(), False otherwise.

Methods

def align_structure(self, coords: numpy.ndarray) ‑> numpy.ndarray
def align_structure(self, coords: np.ndarray) -> np.ndarray:
    """
    Aligns structural coordinates using pre-calculated PCA components.

    Args:
        coords (np.ndarray): Input coordinates to be aligned, shape (N, 3). Must have the same
            number of columns as the data used to fit the PCA model.

    Returns:
        np.ndarray: The coordinates expressed relative to the mean of the fitted data and
            rotated onto its principal axes (the PCA frame).

    Raises:
        ValueError: If PCA components haven't been calculated (is_fitted=False) or if input coordinates are None
        Exception: If alignment process fails for any other reason

    Notes:
        The PCA model must be fitted by calling calculate_pca() before using this method.
    """

    if not self.is_fitted:
        raise ValueError("PCA components haven't been calculated. Call calculate_pca first.")

    try:
        if coords is None:
            raise ValueError("Coordinates are None.")

        # Transform coordinates
        aligned_coords = self.pca.transform(coords)
        DEFAULT_LOGGER.log_info("Coordinates aligned using PCA.")

        return aligned_coords

    except Exception as e:
        DEFAULT_LOGGER.log_error(f"Alignment failed: {str(e)}")
        raise

Aligns structural coordinates using pre-calculated PCA components.

Args

coords : np.ndarray
Input coordinates to be aligned, shape (N, 3). Must have the same number of columns as the data used to fit the PCA model.

Returns

np.ndarray
The coordinates expressed relative to the mean of the fitted data and rotated onto its principal axes (the PCA frame).

Raises

ValueError
If PCA components haven't been calculated (is_fitted=False) or if input coordinates are None
Exception
If alignment process fails for any other reason

Notes

The PCA model must be fitted by calling calculate_pca() before using this method.
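What "aligned" means can be checked numerically: for the structure the model was fitted on, the transformed coordinates are centered at the origin and their covariance matrix is diagonal, with variances in decreasing order. The sketch uses NumPy's SVD in place of scikit-learn's PCA.transform (an assumption so it runs without scikit-learn); for the fitted data the two should agree up to the sign of each axis. The coordinates are made up.

```python
import numpy as np

rng = np.random.default_rng(1)
mixing = np.array([[3.0, 1.0, 0.0],
                   [0.0, 1.5, 0.5],
                   [0.0, 0.0, 0.5]])
coords = rng.normal(size=(100, 3)) @ mixing  # made-up anisotropic point cloud

mean = coords.mean(axis=0)
_, _, components = np.linalg.svd(coords - mean, full_matrices=False)
aligned = (coords - mean) @ components.T     # centering plus projection onto the axes

cov = np.cov(aligned, rowvar=False)
variances = np.diag(cov)
off_diagonal = cov - np.diag(variances)      # should be numerically zero
```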

def calculate_pca(self, coords: numpy.ndarray) ‑> None
def calculate_pca(self, coords: np.ndarray) -> None:
    """
    Calculates the Principal Component Analysis (PCA) for given coordinates.

    This method performs PCA on the input coordinates to find the principal axes of variation.
    It ensures a right-handed coordinate system: if the scalar triple product of the three
    components is negative, the third component's direction is flipped.

    Args:
        coords (np.ndarray): Input coordinates array for PCA calculation.

    Raises:
        ValueError: If coords is None.
        Exception: If PCA calculation fails for any other reason.

    Notes:
        - Sets self.pca with the calculated PCA object
        - Stores the components ensuring a right-handed coordinate system
        - Sets self._components_fixed flag to True upon successful calculation

    Example:
        >>> instance.calculate_pca(coordinates_array)
    """
    try:
        if coords is None:
            raise ValueError("Coordinates are None.")

        self.pca = PCA(n_components=3)
        self.pca.fit(coords)

        components = self.pca.components_
        if np.dot(np.cross(components[0], components[1]), components[2]) < 0:
            # Negate the third principal axis (row 2 of components_) so the frame is right-handed
            self.pca.components_[2] *= -1

        self._components_fixed = True
        DEFAULT_LOGGER.log_info("PCA components calculated and stored.")

    except Exception as e:
        DEFAULT_LOGGER.log_error(f"PCA calculation failed: {str(e)}")
        raise

Calculates the Principal Component Analysis (PCA) for given coordinates.

This method performs PCA on the input coordinates to find the principal axes of variation. It ensures a right-handed coordinate system: if the scalar triple product of the three components is negative, the third component's direction is flipped.

Args

coords : np.ndarray
Input coordinates array for PCA calculation.

Raises

ValueError
If coords is None.
Exception
If PCA calculation fails for any other reason.

Notes

  • Sets self.pca with the calculated PCA object
  • Stores the components ensuring a right-handed coordinate system
  • Sets self._components_fixed flag to True upon successful calculation

Example

>>> instance.calculate_pca(coordinates_array)
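The handedness check uses the scalar triple product (c0 × c1) · c2, which equals the determinant of the 3×3 component matrix: +1 for a right-handed frame, -1 for a left-handed one. The NumPy sketch below, on made-up coordinates, contrasts negating the third component (a row) with negating the z coordinate of every component (a column, which is what multiplying the whole matrix by [1, 1, -1] does). Both make the determinant +1, but only the row flip keeps the rows as principal axes, i.e. keeps the projected covariance diagonal. is_principal is a hypothetical helper.

```python
import numpy as np

rng = np.random.default_rng(2)
coords = rng.normal(size=(200, 3)) * [5.0, 2.0, 1.0]
coords = coords @ np.linalg.qr(rng.normal(size=(3, 3)))[0]  # random orientation
centered = coords - coords.mean(axis=0)

_, _, comps = np.linalg.svd(centered, full_matrices=False)
if np.dot(np.cross(comps[0], comps[1]), comps[2]) > 0:
    comps[2] *= -1  # axis signs are arbitrary; force a left-handed start for the demo

def is_principal(c):
    # Hypothetical helper: rows are principal axes iff the projected covariance is diagonal
    cov = np.cov(centered @ c.T, rowvar=False)
    return bool(np.allclose(cov - np.diag(np.diag(cov)), 0.0, atol=1e-8))

row_flip = comps.copy()
row_flip[2] *= -1                               # negate the third component vector
col_flip = comps * np.array([1.0, 1.0, -1.0])   # negate the z coordinate of every component
```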
def restore_structure(self, coords: numpy.ndarray) ‑> numpy.ndarray
def restore_structure(self, coords: np.ndarray) -> np.ndarray:
    """
    Restores the original structural coordinates from PCA-transformed coordinates.

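    This method applies the inverse PCA transformation, mapping coordinates from the PCA
    frame (centered and rotated onto the principal axes) back to the original structural
    frame. Because all three components are kept, the round trip is lossless up to
    floating-point error.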

    Args:
        coords (np.ndarray): The coordinates in PCA space to be restored to original structure space.

    Returns:
        np.ndarray: The restored coordinates in the original structural space.

    Raises:
        ValueError: If PCA components haven't been calculated (is_fitted=False) or if coords is None.
        Exception: If restoration process fails for any other reason.

    Note:
        The method requires that calculate_pca() has been called previously to fit the PCA model.
    """
    if not self.is_fitted:
        raise ValueError("PCA components haven't been calculated. Call calculate_pca first.")

    try:
        if coords is None:
            raise ValueError("Coordinates are None.")

        # Inverse transform coordinates
        restored_coords = self.pca.inverse_transform(coords)
        DEFAULT_LOGGER.log_info("Coordinates restored from PCA space.")

        return restored_coords

    except Exception as e:
        DEFAULT_LOGGER.log_error(f"Restoration failed: {str(e)}")
        raise

Restores the original structural coordinates from PCA-transformed coordinates.

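This method applies the inverse PCA transformation, mapping coordinates from the PCA frame (centered and rotated onto the principal axes) back to the original structural frame. Because all three components are kept, the round trip is lossless up to floating-point error.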

Args

coords : np.ndarray
The coordinates in PCA space to be restored to original structure space.

Returns

np.ndarray
The restored coordinates in the original structural space.

Raises

ValueError
If PCA components haven't been calculated (is_fitted=False) or if coords is None.
Exception
If restoration process fails for any other reason.

Note

The method requires that calculate_pca() has been called previously to fit the PCA model.
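With whitening off, scikit-learn's PCA.inverse_transform computes Y @ components_ + mean_. Since calculate_pca keeps all three components of 3D coordinates, that recovers the input up to floating-point error. The NumPy sketch below (SVD standing in for scikit-learn, made-up coordinates) checks this, and for contrast shows that keeping only two components would make the round trip lossy.

```python
import numpy as np

rng = np.random.default_rng(3)
coords = rng.normal(size=(30, 3)) * [3.0, 2.0, 1.0]  # made-up structure

mean = coords.mean(axis=0)
_, _, components = np.linalg.svd(coords - mean, full_matrices=False)

# Full three-component round trip: into the PCA frame and back
aligned = (coords - mean) @ components.T
restored = aligned @ components + mean
max_error_full = np.abs(restored - coords).max()

# For contrast: keeping only two components discards the third axis
top2 = components[:2]
restored_2 = ((coords - mean) @ top2.T) @ top2 + mean
max_error_2 = np.abs(restored_2 - coords).max()
```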