Module deeporigin.src.structures.internal_structures

Functions

def mol_from_block(block_type, block, sanitize=True, remove_hs=False)
def mol_from_block(block_type, block, sanitize=True, remove_hs=False):
    """
    Converts a molecular block string into a molecular object.

    Args:
        block_type (str): Type of molecular block format (e.g., 'mol', 'sdf', 'pdb')
        block (str): String containing the molecular block data
        sanitize (bool, optional): Whether to sanitize the molecule during parsing. Defaults to True.
        remove_hs (bool, optional): Whether to remove hydrogens from the molecule. Defaults to False.

    Returns:
        Molecule: A Molecule object parsed from the block

    Notes:
        Writes the block to a temporary file whose path comes from tempfile.mktemp();
        this function does not delete the file afterwards.
    """
    temp_file_path = tempfile.mktemp()
    with open(temp_file_path, "w") as f:
        f.write(block)
    return mol_from_file(block_type, temp_file_path, sanitize=sanitize, remove_hs=remove_hs)

Converts a molecular block string into a molecular object.

Args

block_type : str
Type of molecular block format (e.g., 'mol', 'sdf', 'pdb')
block : str
String containing the molecular block data
sanitize : bool, optional
Whether to sanitize the molecule during parsing. Defaults to True.
remove_hs : bool, optional
Whether to remove hydrogens from the molecule. Defaults to False.

Returns

Molecule
A Molecule object parsed from the block

Notes

Writes the block to a temporary file whose path comes from tempfile.mktemp(); this function does not delete the file afterwards.
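
The write-then-parse pattern in the notes can be sketched with the standard library alone. The helper name read_via_temp_file and its reader callback are hypothetical and not part of this module. The source listing above gets a path from tempfile.mktemp() and leaves the file in place, whereas this sketch removes the file explicitly once the reader returns.

```python
import os
import tempfile


def read_via_temp_file(block, reader, suffix=".mol"):
    """Write `block` to a temporary file, pass its path to `reader`, then delete it.

    `reader` is a hypothetical stand-in for a parser such as mol_from_file.
    """
    # delete=False lets us close the file before handing its path to the reader,
    # which keeps the sketch portable to platforms that lock open files.
    with tempfile.NamedTemporaryFile("w", suffix=suffix, delete=False) as f:
        f.write(block)
        path = f.name
    try:
        return reader(path)
    finally:
        os.remove(path)  # explicit cleanup once parsing has finished
```

Any callable that accepts a file path can serve as reader, so the same helper works for a real parser or for a test double that simply reads the text back.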

def mol_from_file(file_type, file_path, sanitize=True, remove_hs=False)
def mol_from_file(file_type, file_path, sanitize=True, remove_hs=False):
    """
    Reads a molecular structure file and converts it into a Molecule object.

    This function supports various file formats and can either directly read RDKit-supported formats
    or convert other formats to MOL format before reading.

    Args:
        file_type (str): The type/format of the input file (e.g., 'mol', 'mol2', 'pdb', 'xyz', 'sdf')
        file_path (str): The path to the molecular structure file
        sanitize (bool, optional): Whether to sanitize the molecule during reading. Defaults to True
        remove_hs (bool, optional): Whether to remove hydrogens from the molecule. Defaults to False

    Returns:
        Molecule: A Molecule object containing the molecular structure

    Raises:
        ValueError: If the file format is invalid, file path is incorrect, or molecule sanitization fails

    Notes:
        - For RDKit-supported formats (MOL, MOL2, PDB, XYZ, SDF), the file is read directly
        - For other formats, the file is first converted to MOL format using a temporary file
        - The XYZ format requires special handling for sanitization and hydrogen removal
    """
    if file_type in RDKIT_SUPPORTED_INPUT_TYPES:
        mol_rdk = None

        if file_type == FileFormat.MOL.value:
            mol_rdk = Chem.MolFromMolFile(file_path, sanitize, remove_hs)
        elif file_type == FileFormat.MOL2.value:
            mol_rdk = Chem.MolFromMol2File(file_path, sanitize, remove_hs)
        elif file_type == FileFormat.PDB.value:
            mol_rdk = Chem.MolFromPDBFile(file_path, sanitize, remove_hs)
        elif file_type == FileFormat.XYZ.value:
            mol_rdk = Chem.MolFromXYZFile(file_path)
            if sanitize:
                Chem.SanitizeMol(mol_rdk)
            if remove_hs:
                mol_rdk = Chem.RemoveHs(mol_rdk)
        elif file_type == FileFormat.SDF.value:
            mol_rdk = next(iter(Chem.SDMolSupplier(file_path, sanitize, remove_hs)))

        if mol_rdk is None:
            raise ValueError("Invalid file format or file path or failed to sanitize the molecule")

        return Molecule(mol_rdk)
    else:
        temp_mol_file_path = tempfile.mktemp()
        convert_file(file_type, file_path, FileFormat.MOL.value, temp_mol_file_path)
        return mol_from_file(FileFormat.MOL.value, temp_mol_file_path, sanitize=sanitize, remove_hs=remove_hs)

Reads a molecular structure file and converts it into a Molecule object.

This function supports various file formats and can either directly read RDKit-supported formats or convert other formats to MOL format before reading.

Args

file_type : str
The type/format of the input file (e.g., 'mol', 'mol2', 'pdb', 'xyz', 'sdf')
file_path : str
The path to the molecular structure file
sanitize : bool, optional
Whether to sanitize the molecule during reading. Defaults to True
remove_hs : bool, optional
Whether to remove hydrogens from the molecule. Defaults to False

Returns

Molecule
A Molecule object containing the molecular structure

Raises

ValueError
If the file format is invalid, file path is incorrect, or molecule sanitization fails

Notes

  • For RDKit-supported formats (MOL, MOL2, PDB, XYZ, SDF), the file is read directly
  • For other formats, the file is first converted to MOL format using a temporary file
  • The XYZ format requires special handling for sanitization and hydrogen removal
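
The branching described in the notes can be sketched without RDKit. The enum below mirrors FileFormat from this module. The contents of RDKIT_SUPPORTED_INPUT_TYPES are an assumption inferred from the notes, since the source references the constant but does not show its definition. plan_read is a hypothetical helper that only reports which path would be taken.

```python
from enum import Enum


class FileFormat(Enum):
    MOL = "mol"
    MOL2 = "mol2"
    PDB = "pdb"
    PDBQT = "pdbqt"
    XYZ = "xyz"
    SDF = "sdf"


# Assumption: inferred from the notes above, not copied from the module.
RDKIT_SUPPORTED_INPUT_TYPES = {
    FileFormat.MOL.value,
    FileFormat.MOL2.value,
    FileFormat.PDB.value,
    FileFormat.XYZ.value,
    FileFormat.SDF.value,
}


def plan_read(file_type):
    """Return the steps mol_from_file would take for `file_type`."""
    if file_type in RDKIT_SUPPORTED_INPUT_TYPES:
        return ["read " + file_type]
    # Any other format is converted to MOL first, then read as MOL.
    return ["convert " + file_type + " -> mol", "read mol"]


print(plan_read("sdf"))    # ['read sdf']
print(plan_read("pdbqt"))  # ['convert pdbqt -> mol', 'read mol']
```

Under this assumption PDBQT is the one FileFormat member that takes the conversion path. Note also that the SDF branch in the source reads only the first record of the file.
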
def mol_from_smiles(smiles, sanitize=True)
def mol_from_smiles(smiles, sanitize=True):
    """
    Convert SMILES string to a Molecule object.

    Args:
        smiles (str): SMILES string representation of a molecule
        sanitize (bool, optional): Whether to sanitize the molecule. Defaults to True.

    Returns:
        Molecule: A Molecule object created from the SMILES string

    Raises:
        ValueError: If the SMILES string is invalid

    Example:
        >>> mol = mol_from_smiles("CC(=O)O")  # creates an acetic acid molecule
    """
    mol_rdk = Chem.MolFromSmiles(smiles, sanitize=sanitize)
    if mol_rdk is None:
        raise ValueError("Invalid SMILES string")

    return Molecule(mol_rdk, add_coords=False)

Convert SMILES string to a Molecule object.

Args

smiles : str
SMILES string representation of a molecule
sanitize : bool, optional
Whether to sanitize the molecule. Defaults to True.

Returns

Molecule
A Molecule object created from the SMILES string

Raises

ValueError
If the SMILES string is invalid

Example

>>> mol = mol_from_smiles("CC(=O)O")  # creates an acetic acid molecule

Classes

class FileFormat (value, names=None, *, module=None, qualname=None, type=None, start=1)
class FileFormat(Enum):
    MOL = "mol"
    MOL2 = "mol2"

    PDB = "pdb"
    PDBQT = "pdbqt"

    XYZ = "xyz"
    SDF = "sdf"

An enumeration of the molecular file formats this module recognizes.

Ancestors

  • enum.Enum

Class variables

var MOL
var MOL2
var PDB
var PDBQT
var SDF
var XYZ
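
Because the functions in this module compare a plain string against the enum values, it can help to validate user input against FileFormat before calling them. The sketch below re-declares the enum so that it runs on its own. parse_format and its normalization rules (trimming, dropping a leading dot, lowercasing) are hypothetical and not part of the module.

```python
from enum import Enum


class FileFormat(Enum):
    MOL = "mol"
    MOL2 = "mol2"
    PDB = "pdb"
    PDBQT = "pdbqt"
    XYZ = "xyz"
    SDF = "sdf"


def parse_format(text):
    """Map user input such as '.PDB' or ' sdf ' to a FileFormat member."""
    # Hypothetical normalization: trim whitespace, drop a leading dot, lowercase.
    key = text.strip().lstrip(".").lower()
    return FileFormat(key)  # Enum lookup by value raises ValueError if unknown


print(parse_format(".PDB").value)  # pdb
```
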
class Molecule (mol_rdk, name=None, add_coords=False, seed=None)
class Molecule:
    def __init__(self, mol_rdk, name=None, add_coords=False, seed=None):
        self.m = self.process_mol(mol_rdk)
        self.n = mol_rdk.GetNumAtoms()
        self.formula = rdMolDescriptors.CalcMolFormula(self.m)
        self.smiles = Chem.MolToSmiles(Chem.RemoveHs(self.m), canonical=True)

        self.name = name
        self.contains_boron = any(atom.GetSymbol() == "B" for atom in self.m.GetAtoms())

        if self.name is not None:
            self.m.SetProp("_Name", name)

        if not self.m.GetConformers():
            AllChem.Compute2DCoords(self.m)

        self.set_conformer_id()
        if add_coords:
            seed = random.randint(0, 2**16 - 1) if seed is None else seed
            self.embed(add_hs=False, seed=seed)

        self.m.SetProp("initial_smiles", Chem.MolToSmiles(Chem.RemoveHs(mol_rdk)))

    def process_mol(self, mol):
        """
        Process a molecular structure by removing salts and kekulizing.

        This method takes an RDKit molecule, removes salt components, and performs
        kekulization to obtain a standardized molecular representation.

        Args:
            mol (rdkit.Chem.rdchem.Mol): Input RDKit molecule object.

        Returns:
            rdkit.Chem.rdchem.Mol: Processed molecule with salts removed and kekulized structure.

        Raises:
            ValueError: If salt removal fails or kekulization cannot be performed.
        """
        remover = SaltRemover.SaltRemover()

        stripped_mol = remover.StripMol(mol)
        if stripped_mol is None:
            raise ValueError("Salt removal failed.")

        try:
            Chem.Kekulize(stripped_mol, clearAromaticFlags=False)
        except Chem.KekulizeException:
            raise ValueError("Kekulization failed.")

        return stripped_mol

    def molblock(self):
        """
        Returns the MDL MOL block string representation of the molecule.

        This method converts the internal RDKit molecule object to a MOL block format,
        which is a widely used text-based chemical file format that describes the
        structure of a molecule.

        Returns:
            str: A string containing the MOL block representation of the molecule.
                The MOL block includes atomic coordinates, bonds, and other structural
                information in a standardized format.

        Note:
            This method uses RDKit's MolToMolBlock function for the conversion.
        """
        return Chem.MolToMolBlock(self.m)

    def species(self):
        """
        Returns a list of atomic symbols for all atoms in the molecule.

        Returns:
            list of str: A list of atomic symbols (e.g. ['C', 'H', 'O', ...])
        """

        return [a.GetSymbol() for a in self.m.GetAtoms()]

    def coords(self, i: int = 0):
        """
        Get the atomic coordinates for a specified conformer.

        Args:
            i (int, optional): The index of the conformer. Defaults to 0.

        Returns:
            numpy.ndarray: Array of atomic coordinates (N x 3) where N is number of atoms.
        """
        conf = self.conformer(i)
        return conf.GetPositions()

    @classmethod
    def from_smiles_or_name(cls, smiles=None, name=None, add_coords=False, seed=None):
        """
        Creates a molecular structure from either a SMILES string or compound name.

        Args:
            smiles (str, optional): SMILES representation of the molecule
            name (str, optional): Common name or identifier of the compound
            add_coords (bool, optional): Whether to add 3D coordinates. Defaults to False.
            seed (int, optional): Random seed for coordinate generation

        Returns:
            Molecule: Instance of Molecule class

        Raises:
            ValueError: If no compound is found for the given name
            AssertionError: If both SMILES and name are None

        Notes:
            If name is provided but SMILES is None, the function will attempt to fetch
            the compound information and extract its SMILES representation.
        """
        if smiles is None and name is not None:
            compounds = get_compounds(name, "name")
            if not compounds:
                raise ValueError(f"No compound found for identifier: {name}")

            compound = compounds[0]
            smiles = compound.isomeric_smiles

        assert smiles is not None

        mol_rdk = Chem.MolFromSmiles(smiles)

        return cls(mol_rdk, name=name, add_coords=add_coords, seed=seed)

    def copy(self):
        """
        Creates a deep copy of the molecule instance.

        Returns:
            Molecule: A new Molecule object with the same structure and name as the original.
        """
        return Molecule(Chem.Mol(self.m), name=self.name)

    def _draw(self):
        """
        Draw a 2D representation of the molecular structure.

        This method generates a 2D visualization of the molecule using RDKit's drawing capabilities.
        The molecule is first copied, hydrogens are removed, and 2D coordinates are computed.
        The drawing is rendered as a PNG image and encoded in base64 format for HTML display.

        Returns:
            str: HTML img tag containing the base64-encoded PNG image of the molecule.
                 Returns empty string if drawing fails.

        Raises:
            Exception: Any error during the drawing process is caught and results in returning
                      an empty string.

        Note:
            Requires RDKit libraries: Chem, AllChem, rdMolDraw2D
        """
        try:
            mol = self.copy()
            m = Chem.RemoveHs(mol.m)
            AllChem.Compute2DCoords(m)

            drawer = rdMolDraw2D.MolDraw2DCairo(300, 300)
            drawer.DrawMolecule(m)
            drawer.FinishDrawing()

            b64_encoded_png = base64.b64encode(drawer.GetDrawingText())
            html_img = '<img src="data:image/png;base64,' + b64_encoded_png.decode("utf-8") + '">'

            return html_img
        except Exception as e:
            return ""

    def draw(self):
        """
        Creates a 2D coordinate representation of the molecule for visualization.
        This method attempts to compute 2D coordinates for the molecule structure
        and removes hydrogen atoms for cleaner visualization. If computation fails,
        returns the original molecule object.

        Returns:
            rdkit.Chem.rdchem.Mol: A molecule object with 2D coordinates computed and
            hydrogens removed. If computation fails, returns original molecule object.

        Raises:
            Exception: Catches any exceptions during 2D coordinate computation and
            returns original molecule as fallback.
        """
        try:
            mol = self.copy()
            AllChem.Compute2DCoords(mol.m)

            return Chem.RemoveHs(mol.m)
        except Exception as e:
            return self.m

    def conformer(self, i=0):
        """
        Returns a specific conformer from the molecule.

        Args:
            i (int, optional): The index of the conformer to retrieve. Defaults to 0.

        Returns:
            rdkit.Chem.rdchem.Conformer: The conformer object at the specified index.

        Raises:
            ValueError: If the conformer index is out of range.
        """
        return self.m.GetConformer(i)

    def conformer_id(self):
        """
        Gets the ID of the current conformer of the molecule.

        Args:
            None

        Returns:
            int: The unique identifier of the current conformer.
        """
        return self.m.GetConformer().GetId()

    def set_conformer_id(self, i=0):
        """
        Sets the ID of the current conformer in the molecule.

        Args:
            i (int, optional): The ID to set for the conformer. Defaults to 0.

        Returns:
            None
        """
        self.m.GetConformer().SetId(i)

    def embed(self, add_hs=True, seed=-1):
        """
        Generate 3D coordinates for the molecule using distance geometry and optimize them.

        Args:
            add_hs (bool, optional): Whether to add hydrogens before embedding. Defaults to True.
            seed (int, optional): Random seed for reproducibility of the embedding. Defaults to -1.
                    If -1, a random seed will be used.

        Returns:
            None: The molecule is modified in place with 3D coordinates added to a conformer.
             If embedding fails, the molecule will remain unchanged.
        """
        if add_hs:
            self.add_hydrogens()

        AllChem.EmbedMolecule(self.m, randomSeed=seed)
        self.set_conformer_id(0)

    def add_hydrogens(self, add_coords=True):
        """
        Add hydrogen atoms to the molecule.

        Args:
            add_coords (bool, optional): If True, generates 3D coordinates for added hydrogens.
            Defaults to True.

        Returns:
            None: The molecule is modified in-place with added hydrogens.

        Notes:
            - Hydrogens are added according to standard valence rules
            - Modifies the molecule structure directly
        """
        self.m = Chem.AddHs(self.m, addCoords=add_coords)

    def assign_bond_order_from_smiles(self, smiles):
        """
        Assigns bond orders to the molecule based on a SMILES template.

        Args:
            smiles (str): SMILES string representing the template molecule structure

        Raises:
            Exception: If bond order assignment fails even after removing hydrogens

        Notes:
            The method updates both the molecule (self.m) and stores the SMILES string (self.smiles)
            internally after successful assignment.
        """
        template = Chem.MolFromSmiles(smiles)
        try:
            self.m = AllChem.AssignBondOrdersFromTemplate(template, self.m)
        except Exception:
            self.m = Chem.RemoveHs(self.m)
            self.m = AllChem.AssignBondOrdersFromTemplate(template, self.m)

        self.smiles = smiles

Static methods

def from_smiles_or_name(smiles=None, name=None, add_coords=False, seed=None)

Creates a molecular structure from either a SMILES string or compound name.

Args

smiles : str, optional
SMILES representation of the molecule
name : str, optional
Common name or identifier of the compound
add_coords : bool, optional
Whether to add 3D coordinates. Defaults to False.
seed : int, optional
Random seed for coordinate generation

Returns

Molecule
Instance of Molecule class

Raises

ValueError
If no compound is found for the given name
AssertionError
If both SMILES and name are None

Notes

If name is provided but SMILES is None, the function looks the name up with get_compounds(name, "name") and uses the isomeric SMILES of the first match.
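
The precedence between smiles and name can be sketched in isolation. resolve_smiles and its lookup parameter are hypothetical. lookup stands in for the compound search performed by the real method and is injected here so that the sketch runs without network access.

```python
def resolve_smiles(smiles=None, name=None, lookup=None):
    """Return a SMILES string, consulting `lookup` only when `smiles` is missing.

    `lookup` is a hypothetical callable mapping a name to a list of SMILES
    candidates.
    """
    if smiles is None and name is not None:
        candidates = lookup(name) if lookup is not None else []
        if not candidates:
            raise ValueError(f"No compound found for identifier: {name}")
        smiles = candidates[0]  # the first match wins, as in the source
    assert smiles is not None, "either smiles or name must be provided"
    return smiles


known = {"ethanol": ["CCO"]}
print(resolve_smiles(name="ethanol", lookup=lambda n: known.get(n, [])))  # CCO
print(resolve_smiles(smiles="c1ccccc1", name="benzene"))  # c1ccccc1
```

An explicit smiles argument always takes precedence, and in that case name is used only as a label.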

Methods

def add_hydrogens(self, add_coords=True)
def add_hydrogens(self, add_coords=True):
    """
    Add hydrogen atoms to the molecule.

    Args:
        add_coords (bool, optional): If True, generates 3D coordinates for added hydrogens.
        Defaults to True.

    Returns:
        None: The molecule is modified in-place with added hydrogens.

    Notes:
        - Hydrogens are added according to standard valence rules
        - Modifies the molecule structure directly
    """
    self.m = Chem.AddHs(self.m, addCoords=add_coords)

Add hydrogen atoms to the molecule.

Args

add_coords : bool, optional
If True, generates 3D coordinates for added hydrogens. Defaults to True.

Returns

None
The molecule is modified in-place with added hydrogens.

Notes

  • Hydrogens are added according to standard valence rules
  • Modifies the molecule structure directly

def assign_bond_order_from_smiles(self, smiles)
def assign_bond_order_from_smiles(self, smiles):
    """
    Assigns bond orders to the molecule based on a SMILES template.

    Args:
        smiles (str): SMILES string representing the template molecule structure

    Raises:
        Exception: If bond order assignment fails even after removing hydrogens

    Notes:
        The method updates both the molecule (self.m) and stores the SMILES string (self.smiles)
        internally after successful assignment.
    """
    template = Chem.MolFromSmiles(smiles)
    try:
        self.m = AllChem.AssignBondOrdersFromTemplate(template, self.m)
    except Exception:
        self.m = Chem.RemoveHs(self.m)
        self.m = AllChem.AssignBondOrdersFromTemplate(template, self.m)

    self.smiles = smiles

Assigns bond orders to the molecule based on a SMILES template.

Args

smiles : str
SMILES string representing the template molecule structure

Raises

Exception
If bond order assignment fails even after removing hydrogens

Notes

The method updates both the molecule (self.m) and stores the SMILES string (self.smiles) internally after successful assignment.

def conformer(self, i=0)
def conformer(self, i=0):
    """
    Returns a specific conformer from the molecule.

    Args:
        i (int, optional): The index of the conformer to retrieve. Defaults to 0.

    Returns:
        rdkit.Chem.rdchem.Conformer: The conformer object at the specified index.

    Raises:
        ValueError: If the conformer index is out of range.
    """
    return self.m.GetConformer(i)

Returns a specific conformer from the molecule.

Args

i : int, optional
The index of the conformer to retrieve. Defaults to 0.

Returns

rdkit.Chem.rdchem.Conformer
The conformer object at the specified index.

Raises

ValueError
If the conformer index is out of range.

def conformer_id(self)
def conformer_id(self):
    """
    Gets the ID of the current conformer of the molecule.

    Args:
        None

    Returns:
        int: The unique identifier of the current conformer.
    """
    return self.m.GetConformer().GetId()

Gets the ID of the current conformer of the molecule.

Args

None

Returns

int
The unique identifier of the current conformer.

def coords(self, i: int = 0)
def coords(self, i: int = 0):
    """
    Get the atomic coordinates for a specified conformer.

    Args:
        i (int, optional): The index of the conformer. Defaults to 0.

    Returns:
        numpy.ndarray: Array of atomic coordinates (N x 3) where N is number of atoms.
    """
    conf = self.conformer(i)
    return conf.GetPositions()

Get the atomic coordinates for a specified conformer.

Args

i : int, optional
The index of the conformer. Defaults to 0.

Returns

numpy.ndarray
Array of atomic coordinates (N x 3) where N is number of atoms.

def copy(self)
def copy(self):
    """
    Creates a deep copy of the molecule instance.

    Returns:
        Molecule: A new Molecule object with the same structure and name as the original.
    """
    return Molecule(Chem.Mol(self.m), name=self.name)

Creates a deep copy of the molecule instance.

Returns

Molecule
A new Molecule object with the same structure and name as the original.

def draw(self)
def draw(self):
    """
    Creates a 2D coordinate representation of the molecule for visualization.
    This method attempts to compute 2D coordinates for the molecule structure
    and removes hydrogen atoms for cleaner visualization. If computation fails,
    returns the original molecule object.

    Returns:
        rdkit.Chem.rdchem.Mol: A molecule object with 2D coordinates computed and
        hydrogens removed. If computation fails, returns original molecule object.

    Raises:
        Exception: Catches any exceptions during 2D coordinate computation and
        returns original molecule as fallback.
    """
    try:
        mol = self.copy()
        AllChem.Compute2DCoords(mol.m)

        return Chem.RemoveHs(mol.m)
    except Exception as e:
        return self.m

Creates a 2D coordinate representation of the molecule for visualization. This method attempts to compute 2D coordinates for the molecule structure and removes hydrogen atoms for cleaner visualization. If computation fails, returns the original molecule object.

Returns

rdkit.Chem.rdchem.Mol
A molecule object with 2D coordinates computed and hydrogens removed. If computation fails, returns the original molecule object.

Raises

Exception
Not propagated: any exception during 2D coordinate computation is caught, and the original molecule is returned as a fallback.

def embed(self, add_hs=True, seed=-1)
def embed(self, add_hs=True, seed=-1):
    """
    Generate 3D coordinates for the molecule using distance geometry and optimize them.

    Args:
        add_hs (bool, optional): Whether to add hydrogens before embedding. Defaults to True.
        seed (int, optional): Random seed for reproducibility of the embedding. Defaults to -1.
                If -1, a random seed will be used.

    Returns:
        None: The molecule is modified in place with 3D coordinates added to a conformer.
         If embedding fails, the molecule will remain unchanged.
    """
    if add_hs:
        self.add_hydrogens()

    AllChem.EmbedMolecule(self.m, randomSeed=seed)
    self.set_conformer_id(0)

Generate 3D coordinates for the molecule using distance geometry and optimize them.

Args

add_hs : bool, optional
Whether to add hydrogens before embedding. Defaults to True.
seed : int, optional
Random seed for reproducibility of the embedding. Defaults to -1. If -1, a random seed will be used.

Returns

None
The molecule is modified in place with 3D coordinates added to a conformer. If embedding fails, the molecule will remain unchanged.
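
The class uses two seed conventions: embed treats -1 as a request for a random seed, while the constructor (see the class source above) draws a random 16-bit seed whenever seed is None. A minimal sketch of the constructor's rule, with resolve_seed as a hypothetical helper name:

```python
import random


def resolve_seed(seed=None):
    """Return `seed` unchanged, or a random 16-bit value when it is None."""
    # Testing `is None` (rather than truthiness) keeps an explicit seed of 0 intact.
    return random.randint(0, 2**16 - 1) if seed is None else seed
```

Passing the resolved value on makes a run reproducible, provided the caller records the seed that was drawn.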

def molblock(self)
def molblock(self):
    """
    Returns the MDL MOL block string representation of the molecule.

    This method converts the internal RDKit molecule object to a MOL block format,
    which is a widely used text-based chemical file format that describes the
    structure of a molecule.

    Returns:
        str: A string containing the MOL block representation of the molecule.
            The MOL block includes atomic coordinates, bonds, and other structural
            information in a standardized format.

    Note:
        This method uses RDKit's MolToMolBlock function for the conversion.
    """
    return Chem.MolToMolBlock(self.m)

Returns the MDL MOL block string representation of the molecule.

This method converts the internal RDKit molecule object to a MOL block format, which is a widely used text-based chemical file format that describes the structure of a molecule.

Returns

str
A string containing the MOL block representation of the molecule. The MOL block includes atomic coordinates, bonds, and other structural information in a standardized format.

Note

This method uses RDKit's MolToMolBlock function for the conversion.
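
To illustrate what the returned string contains, the sketch below reads the counts line and element symbols from a MOL block using only the standard library. It assumes the V2000 flavour of the format, and the example block is hand-written for illustration rather than produced by molblock(). parse_counts is a hypothetical helper.

```python
def parse_counts(molblock):
    """Return (atom_count, bond_count, symbols) from a V2000 MOL block."""
    lines = molblock.splitlines()
    counts = lines[3]  # three header lines precede the counts line
    n_atoms = int(counts[0:3])  # fixed-width fields, three characters each
    n_bonds = int(counts[3:6])
    # In each atom line the element symbol follows the x, y, z coordinates.
    symbols = [line.split()[3] for line in lines[4:4 + n_atoms]]
    return n_atoms, n_bonds, symbols


# Hand-written block: a carbon and an oxygen joined by a single bond.
example = "\n".join([
    "",
    "  sketch",
    "",
    "  2  1  0  0  0  0  0  0  0  0999 V2000",
    "    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "    1.4000    0.0000    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0",
    "  1  2  1  0",
    "M  END",
])

print(parse_counts(example))  # (2, 1, ['C', 'O'])
```

Splitting atom lines on whitespace is a simplification that holds for well-spaced coordinates; a strict reader would slice the fixed-width columns instead.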

def process_mol(self, mol)
def process_mol(self, mol):
    """
    Process a molecular structure by removing salts and kekulizing.

    This method takes an RDKit molecule, removes salt components, and performs
    kekulization to obtain a standardized molecular representation.

    Args:
        mol (rdkit.Chem.rdchem.Mol): Input RDKit molecule object.

    Returns:
        rdkit.Chem.rdchem.Mol: Processed molecule with salts removed and kekulized structure.

    Raises:
        ValueError: If salt removal fails or kekulization cannot be performed.
    """
    remover = SaltRemover.SaltRemover()

    stripped_mol = remover.StripMol(mol)
    if stripped_mol is None:
        raise ValueError("Salt removal failed.")

    try:
        Chem.Kekulize(stripped_mol, clearAromaticFlags=False)
    except Chem.KekulizeException:
        raise ValueError("Kekulization failed.")

    return stripped_mol

Process a molecular structure by removing salts and kekulizing.

This method takes an RDKit molecule, removes salt components, and performs kekulization to obtain a standardized molecular representation.

Args

mol : rdkit.Chem.rdchem.Mol
Input RDKit molecule object.

Returns

rdkit.Chem.rdchem.Mol
Processed molecule with salts removed and kekulized structure.

Raises

ValueError
If salt removal fails or kekulization cannot be performed.

def set_conformer_id(self, i=0)
def set_conformer_id(self, i=0):
    """
    Sets the ID of the current conformer in the molecule.

    Args:
        i (int, optional): The ID to set for the conformer. Defaults to 0.

    Returns:
        None
    """
    self.m.GetConformer().SetId(i)

Sets the ID of the current conformer in the molecule.

Args

i : int, optional
The ID to set for the conformer. Defaults to 0.

Returns

None

def species(self)
def species(self):
    """
    Returns a list of atomic symbols for all atoms in the molecule.

    Returns:
        list of str: A list of atomic symbols (e.g. ['C', 'H', 'O', ...])
    """

    return [a.GetSymbol() for a in self.m.GetAtoms()]

Returns a list of atomic symbols for all atoms in the molecule.

Returns

list of str
A list of atomic symbols (e.g. ['C', 'H', 'O', …])
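
A list in this shape is easy to summarize with the standard library. The sketch below tallies element counts from a hand-written symbol list. element_counts is a hypothetical helper, and the input is illustrative rather than the output of a real species() call.

```python
from collections import Counter


def element_counts(symbols):
    """Count how often each element symbol occurs in a species()-style list."""
    return Counter(symbols)


# Symbols as they might appear for methane with explicit hydrogens.
counts = element_counts(["C", "H", "H", "H", "H"])
print(counts["C"], counts["H"])  # 1 4
```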