Module deeporigin.src.structures.internal_structures
Functions
def mol_from_block(block_type, block, sanitize=True, remove_hs=False)
def mol_from_block(block_type, block, sanitize=True, remove_hs=False):
    """
    Converts a molecular block string into a molecular object.

    Args:
        block_type (str): Type of molecular block format (e.g., 'mol', 'sdf', 'pdb')
        block (str): String containing the molecular block data
        sanitize (bool, optional): Whether to sanitize the molecule during parsing. Defaults to True.
        remove_hs (bool, optional): Whether to remove hydrogens from the molecule. Defaults to False.

    Returns:
        Molecule: A Molecule object parsed from the block

    Notes:
        Creates a temporary file which is automatically cleaned up by the system
    """
    temp_file_path = tempfile.mktemp()
    with open(temp_file_path, "w") as f:
        f.write(block)
    return mol_from_file(block_type, temp_file_path, sanitize=sanitize, remove_hs=remove_hs)
Converts a molecular block string into a molecular object.
Args
block_type (str): Type of molecular block format (e.g., 'mol', 'sdf', 'pdb')
block (str): String containing the molecular block data
sanitize (bool, optional): Whether to sanitize the molecule during parsing. Defaults to True.
remove_hs (bool, optional): Whether to remove hydrogens from the molecule. Defaults to False.
Returns
Molecule: A Molecule object parsed from the block
Notes
The block is written to a temporary file whose path comes from tempfile.mktemp(). The function does not delete this file itself, so cleanup is left to the operating system's temporary-directory policy.
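Since the note above concerns temporary-file handling, the following is a minimal sketch of one way to write a block to a temporary file and remove it explicitly once parsing is done. It is not the module's implementation: `read_block_via_tempfile`, `first_line`, and the `reader` callback are hypothetical names, with `reader` standing in for a path-based parser such as mol_from_file.

```python
import os
import tempfile


def read_block_via_tempfile(block, reader, suffix=".mol"):
    """Write `block` to a temp file, pass its path to `reader`, then delete the file.

    `reader` is a hypothetical callable that takes a file path; it stands in
    for whatever parser consumes the file.
    """
    # mkstemp creates the file and returns an open descriptor, which avoids
    # the race condition that tempfile.mktemp() is documented to have.
    fd, path = tempfile.mkstemp(suffix=suffix)
    try:
        with os.fdopen(fd, "w") as handle:
            handle.write(block)
        return reader(path)
    finally:
        # Explicit cleanup, whether or not the reader raised.
        os.remove(path)


def first_line(path):
    """Toy reader used for illustration: return the first line of the file."""
    with open(path) as handle:
        return handle.readline().rstrip("\n")


title = read_block_via_tempfile("water\n  sketch\n\n", first_line)
```

The `try`/`finally` keeps the cleanup next to the creation, so a parsing error does not leave the file behind.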
def mol_from_file(file_type, file_path, sanitize=True, remove_hs=False)
def mol_from_file(file_type, file_path, sanitize=True, remove_hs=False):
    """
    Reads a molecular structure file and converts it into a Molecule object.

    This function supports various file formats and can either directly read RDKit-supported formats
    or convert other formats to MOL format before reading.

    Args:
        file_type (str): The type/format of the input file (e.g., 'mol', 'mol2', 'pdb', 'xyz', 'sdf')
        file_path (str): The path to the molecular structure file
        sanitize (bool, optional): Whether to sanitize the molecule during reading. Defaults to True
        remove_hs (bool, optional): Whether to remove hydrogens from the molecule. Defaults to False

    Returns:
        Molecule: A Molecule object containing the molecular structure

    Raises:
        ValueError: If the file format is invalid, file path is incorrect, or molecule sanitization fails

    Notes:
        - For RDKit-supported formats (MOL, MOL2, PDB, XYZ, SDF), the file is read directly
        - For other formats, the file is first converted to MOL format using a temporary file
        - The XYZ format requires special handling for sanitization and hydrogen removal
    """
    if file_type in RDKIT_SUPPORTED_INPUT_TYPES:
        mol_rdk = None
        if file_type == FileFormat.MOL.value:
            mol_rdk = Chem.MolFromMolFile(file_path, sanitize, remove_hs)
        elif file_type == FileFormat.MOL2.value:
            mol_rdk = Chem.MolFromMol2File(file_path, sanitize, remove_hs)
        elif file_type == FileFormat.PDB.value:
            mol_rdk = Chem.MolFromPDBFile(file_path, sanitize, remove_hs)
        elif file_type == FileFormat.XYZ.value:
            mol_rdk = Chem.MolFromXYZFile(file_path)
            if sanitize:
                Chem.SanitizeMol(mol_rdk)
            if remove_hs:
                mol_rdk = Chem.RemoveHs(mol_rdk)
        elif file_type == FileFormat.SDF.value:
            mol_rdk = next(iter(Chem.SDMolSupplier(file_path, sanitize, remove_hs)))

        if mol_rdk is None:
            raise ValueError("Invalid file format or file path or failed to sanitize the molecule")

        return Molecule(mol_rdk)
    else:
        temp_mol_file_path = tempfile.mktemp()
        convert_file(file_type, file_path, FileFormat.MOL.value, temp_mol_file_path)
        return mol_from_file(FileFormat.MOL.value, temp_mol_file_path, sanitize=sanitize, remove_hs=remove_hs)
Reads a molecular structure file and converts it into a Molecule object.
This function supports various file formats and can either directly read RDKit-supported formats or convert other formats to MOL format before reading.
Args
file_type (str): The type/format of the input file (e.g., 'mol', 'mol2', 'pdb', 'xyz', 'sdf')
file_path (str): The path to the molecular structure file
sanitize (bool, optional): Whether to sanitize the molecule during reading. Defaults to True.
remove_hs (bool, optional): Whether to remove hydrogens from the molecule. Defaults to False.
Returns
Molecule: A Molecule object containing the molecular structure
Raises
ValueError: If the file format is invalid, file path is incorrect, or molecule sanitization fails
Notes
- For RDKit-supported formats (MOL, MOL2, PDB, XYZ, SDF), the file is read directly
- For other formats, the file is first converted to MOL format using a temporary file
- The XYZ format requires special handling for sanitization and hydrogen removal
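The notes above describe two paths: a direct read, or a conversion to MOL followed by a read. The sketch below only models that decision, without touching RDKit. `DIRECT_READ_TYPES` and `read_plan` are hypothetical names, and the set's contents are an assumption taken from the formats listed in the notes rather than from the module's RDKIT_SUPPORTED_INPUT_TYPES constant.

```python
# Assumption: the directly readable formats are exactly those named in the notes.
DIRECT_READ_TYPES = {"mol", "mol2", "pdb", "xyz", "sdf"}


def read_plan(file_type):
    """Return the ordered steps a reader would take for `file_type`."""
    if file_type in DIRECT_READ_TYPES:
        # One step: the format is parsed as-is.
        return ["read " + file_type]
    # Two steps: convert to MOL first, then read the converted file as MOL.
    return ["convert " + file_type + " -> mol", "read mol"]


for fmt in ("pdb", "pdbqt"):
    print(fmt, read_plan(fmt))
```

Under this assumption a format such as 'pdbqt' takes the two-step path, which is also why the documented function calls itself a second time with the MOL format.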
def mol_from_smiles(smiles, sanitize=True)
def mol_from_smiles(smiles, sanitize=True):
    """
    Convert SMILES string to a Molecule object.

    Args:
        smiles (str): SMILES string representation of a molecule
        sanitize (bool, optional): Whether to sanitize the molecule. Defaults to True.

    Returns:
        Molecule: A Molecule object created from the SMILES string

    Raises:
        ValueError: If the SMILES string is invalid

    Example:
        >>> mol = mol_from_smiles("CC(=O)O")  # creates an acetic acid molecule
    """
    mol_rdk = Chem.MolFromSmiles(smiles, sanitize=sanitize)
    if mol_rdk is None:
        raise ValueError("Invalid SMILES string")
    return Molecule(mol_rdk, add_coords=False)
Convert SMILES string to a Molecule object.
Args
smiles (str): SMILES string representation of a molecule
sanitize (bool, optional): Whether to sanitize the molecule. Defaults to True.
Returns
Molecule: A Molecule object created from the SMILES string
Raises
ValueError: If the SMILES string is invalid
Example
>>> mol = mol_from_smiles("CC(=O)O")  # creates an acetic acid molecule
Classes
class FileFormat (value, names=None, *, module=None, qualname=None, type=None, start=1)
class FileFormat(Enum):
    MOL = "mol"
    MOL2 = "mol2"
    PDB = "pdb"
    PDBQT = "pdbqt"
    XYZ = "xyz"
    SDF = "sdf"
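An enumeration of the molecular file formats recognized by this module. Each member's value is the lowercase format string (for example, FileFormat.PDB.value is "pdb").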
Ancestors
- enum.Enum
Class variables
var MOL
var MOL2
var PDB
var PDBQT
var SDF
var XYZ
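Because the reader functions compare a plain string against each member's `.value`, it can help to see how a string maps back to a member. The sketch below redeclares the enum as shown in the source and adds `format_from_path`, a hypothetical helper that is not part of the module, to infer a member from a file extension.

```python
from enum import Enum
from pathlib import Path


class FileFormat(Enum):
    # Redeclared here so the example is self-contained.
    MOL = "mol"
    MOL2 = "mol2"
    PDB = "pdb"
    PDBQT = "pdbqt"
    XYZ = "xyz"
    SDF = "sdf"


def format_from_path(path):
    """Hypothetical helper: infer a FileFormat from a file extension."""
    extension = Path(path).suffix.lstrip(".").lower()
    try:
        # Calling the enum with a value looks the member up by value.
        return FileFormat(extension)
    except ValueError:
        raise ValueError(f"Unsupported file extension: {extension!r}") from None


fmt = format_from_path("ligand.SDF")
print(fmt.name, fmt.value)
```

The `.value` of the result is the string that functions such as mol_from_file expect as `file_type`.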
class Molecule (mol_rdk, name=None, add_coords=False, seed=None)
class Molecule:
    def __init__(self, mol_rdk, name=None, add_coords=False, seed=None):
        self.m = self.process_mol(mol_rdk)
        self.n = mol_rdk.GetNumAtoms()
        self.formula = rdMolDescriptors.CalcMolFormula(self.m)
        self.smiles = Chem.MolToSmiles(Chem.RemoveHs(self.m), canonical=True)
        self.name = name
        self.contains_boron = any(atom.GetSymbol() == "B" for atom in self.m.GetAtoms())

        if self.name is not None:
            self.m.SetProp("_Name", name)

        if not self.m.GetConformers():
            AllChem.Compute2DCoords(self.m)

        self.set_conformer_id()

        if add_coords:
            seed = random.randint(0, 2**16 - 1) if seed is None else seed
            self.embed(add_hs=False, seed=seed)

        self.m.SetProp("initial_smiles", Chem.MolToSmiles(Chem.RemoveHs(mol_rdk)))

    def process_mol(self, mol):
        """
        Process a molecular structure by removing salts and kekulizing.

        This method takes an RDKit molecule, removes salt components, and performs
        kekulization to obtain a standardized molecular representation.

        Args:
            mol (rdkit.Chem.rdchem.Mol): Input RDKit molecule object.

        Returns:
            rdkit.Chem.rdchem.Mol: Processed molecule with salts removed and kekulized structure.

        Raises:
            ValueError: If salt removal fails or kekulization cannot be performed.
        """
        remover = SaltRemover.SaltRemover()
        stripped_mol = remover.StripMol(mol)
        if stripped_mol is None:
            raise ValueError("Salt removal failed.")

        try:
            Chem.Kekulize(stripped_mol, clearAromaticFlags=False)
        except Chem.KekulizeException:
            raise ValueError("Kekulization failed.")

        return stripped_mol

    def molblock(self):
        """
        Returns the MDL MOL block string representation of the molecule.

        This method converts the internal RDKit molecule object to a MOL block format,
        which is a widely used text-based chemical file format that describes the
        structure of a molecule.

        Returns:
            str: A string containing the MOL block representation of the molecule.
                The MOL block includes atomic coordinates, bonds, and other
                structural information in a standardized format.

        Note:
            This method uses RDKit's MolToMolBlock function for the conversion.
        """
        return Chem.MolToMolBlock(self.m)

    def species(self):
        """
        Returns a list of atomic symbols for all atoms in the molecule.

        Returns:
            list of str: A list of atomic symbols (e.g. ['C', 'H', 'O', ...])
        """
        return [a.GetSymbol() for a in self.m.GetAtoms()]

    def coords(self, i: int = 0):
        """
        Get the atomic coordinates for a specified conformer.

        Args:
            i (int, optional): The index of the conformer. Defaults to 0.

        Returns:
            numpy.ndarray: Array of atomic coordinates (N x 3) where N is number of atoms.
        """
        conf = self.conformer(i)
        return conf.GetPositions()

    @classmethod
    def from_smiles_or_name(cls, smiles=None, name=None, add_coords=False, seed=None):
        """
        Creates a molecular structure from either a SMILES string or compound name.

        Args:
            smiles (str, optional): SMILES representation of the molecule
            name (str, optional): Common name or identifier of the compound
            add_coords (bool, optional): Whether to add 3D coordinates. Defaults to False.
            seed (int, optional): Random seed for coordinate generation

        Returns:
            Molecule: Instance of Molecule class

        Raises:
            ValueError: If no compound is found for the given name
            AssertionError: If both SMILES and name are None

        Notes:
            If name is provided but SMILES is None, the function will attempt to fetch
            the compound information and extract its SMILES representation.
        """
        if smiles is None and name is not None:
            compounds = get_compounds(name, "name")
            if not compounds:
                raise ValueError(f"No compound found for identifier: {name}")
            compound = compounds[0]
            smiles = compound.isomeric_smiles

        assert smiles is not None
        mol_rdk = Chem.MolFromSmiles(smiles)
        return cls(mol_rdk, name=name, add_coords=add_coords, seed=seed)

    def copy(self):
        """
        Creates a deep copy of the molecule instance.

        Returns:
            Molecule: A new Molecule object with the same structure and name as the original.
        """
        return Molecule(Chem.Mol(self.m), name=self.name)

    def _draw(self):
        """
        Draw a 2D representation of the molecular structure.

        This method generates a 2D visualization of the molecule using RDKit's drawing
        capabilities. The molecule is first copied, hydrogens are removed, and 2D
        coordinates are computed. The drawing is rendered as a PNG image and encoded
        in base64 format for HTML display.

        Returns:
            str: HTML img tag containing the base64-encoded PNG image of the molecule.
                Returns empty string if drawing fails.

        Raises:
            Exception: Any error during the drawing process is caught and results in
                returning an empty string.

        Note:
            Requires RDKit libraries: Chem, AllChem, rdMolDraw2D
        """
        try:
            mol = self.copy()
            m = Chem.RemoveHs(mol.m)
            AllChem.Compute2DCoords(m)
            drawer = rdMolDraw2D.MolDraw2DCairo(300, 300)
            drawer.DrawMolecule(m)
            drawer.FinishDrawing()
            b64_encoded_png = base64.b64encode(drawer.GetDrawingText())
            html_img = '<img src="data:image/png;base64,' + b64_encoded_png.decode("utf-8") + '">'
            return html_img
        except Exception as e:
            return ""

    def draw(self):
        """
        Creates a 2D coordinate representation of the molecule for visualization.

        This method attempts to compute 2D coordinates for the molecule structure
        and removes hydrogen atoms for cleaner visualization. If computation fails,
        returns the original molecule object.

        Returns:
            rdkit.Chem.rdchem.Mol: A molecule object with 2D coordinates computed and
                hydrogens removed. If computation fails, returns original molecule object.

        Raises:
            Exception: Catches any exceptions during 2D coordinate computation and
                returns original molecule as fallback.
        """
        try:
            mol = self.copy()
            AllChem.Compute2DCoords(mol.m)
            return Chem.RemoveHs(mol.m)
        except Exception as e:
            return self.m

    def conformer(self, i=0):
        """
        Returns a specific conformer from the molecule.

        Args:
            i (int, optional): The index of the conformer to retrieve. Defaults to 0.

        Returns:
            rdkit.Chem.rdchem.Conformer: The conformer object at the specified index.

        Raises:
            ValueError: If the conformer index is out of range.
        """
        return self.m.GetConformer(i)

    def conformer_id(self):
        """
        Gets the ID of the current conformer of the molecule.

        Args:
            None

        Returns:
            int: The unique identifier of the current conformer.
        """
        return self.m.GetConformer().GetId()

    def set_conformer_id(self, i=0):
        """
        Sets the ID of the current conformer in the molecule.

        Args:
            i (int, optional): The ID to set for the conformer. Defaults to 0.

        Returns:
            None
        """
        self.m.GetConformer().SetId(i)

    def embed(self, add_hs=True, seed=-1):
        """
        Generate 3D coordinates for the molecule using distance geometry and optimize them.

        Args:
            add_hs (bool, optional): Whether to add hydrogens before embedding. Defaults to True.
            seed (int, optional): Random seed for reproducibility of the embedding. Defaults to -1.
                If -1, a random seed will be used.

        Returns:
            None: The molecule is modified in place with 3D coordinates added to a conformer.
                If embedding fails, the molecule will remain unchanged.
        """
        if add_hs:
            self.add_hydrogens()

        AllChem.EmbedMolecule(self.m, randomSeed=seed)
        self.set_conformer_id(0)

    def add_hydrogens(self, add_coords=True):
        """
        Add hydrogen atoms to the molecule.

        Args:
            add_coords (bool, optional): If True, generates 3D coordinates for added hydrogens.
                Defaults to True.

        Returns:
            None: The molecule is modified in-place with added hydrogens.

        Notes:
            - Hydrogens are added according to standard valence rules
            - Modifies the molecule structure directly
        """
        self.m = Chem.AddHs(self.m, addCoords=add_coords)

    def assign_bond_order_from_smiles(self, smiles):
        """
        Assigns bond orders to the molecule based on a SMILES template.

        Args:
            smiles (str): SMILES string representing the template molecule structure

        Raises:
            Exception: If bond order assignment fails even after removing hydrogens

        Notes:
            The method updates both the molecule (self.m) and stores the SMILES string
            (self.smiles) internally after successful assignment.
        """
        template = Chem.MolFromSmiles(smiles)
        try:
            self.m = AllChem.AssignBondOrdersFromTemplate(template, self.m)
        except Exception:
            self.m = Chem.RemoveHs(self.m)
            self.m = AllChem.AssignBondOrdersFromTemplate(template, self.m)
        self.smiles = smiles
Static methods
def from_smiles_or_name(smiles=None, name=None, add_coords=False, seed=None)
Creates a molecular structure from either a SMILES string or compound name.
Args
smiles (str, optional): SMILES representation of the molecule
name (str, optional): Common name or identifier of the compound
add_coords (bool, optional): Whether to add 3D coordinates. Defaults to False.
seed (int, optional): Random seed for coordinate generation
Returns
Molecule: Instance of Molecule class
Raises
ValueError: If no compound is found for the given name
AssertionError: If both SMILES and name are None
Notes
If name is provided but SMILES is None, the function will attempt to fetch the compound information and extract its SMILES representation.
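The precedence described above (an explicit SMILES wins, and a name is only looked up when no SMILES is given) can be sketched without any network access. Everything here is illustrative: `resolve_smiles`, `lookup`, and the `KNOWN` table are hypothetical stand-ins for the compound search, and the sketch raises ValueError where the documented method relies on an assertion.

```python
def resolve_smiles(smiles=None, name=None, lookup=None):
    """Pick the SMILES to use: an explicit SMILES wins, otherwise look up the name.

    `lookup` is a hypothetical callable mapping a name to a list of SMILES
    strings; it stands in for the remote compound search.
    """
    if smiles is not None:
        return smiles
    if name is None:
        # The documented method uses an assertion here; ValueError is this sketch's choice.
        raise ValueError("Provide either a SMILES string or a name")
    matches = lookup(name) if lookup is not None else []
    if not matches:
        raise ValueError(f"No compound found for identifier: {name}")
    # Take the first hit, as the documented source does.
    return matches[0]


# Toy lookup table used only for illustration.
KNOWN = {"ethanol": ["CCO"], "acetic acid": ["CC(=O)O"]}


def toy_lookup(name):
    return KNOWN.get(name, [])


print(resolve_smiles(name="ethanol", lookup=toy_lookup))
```

Note that when both arguments are supplied, the name is not looked up at all; in the documented method it is still passed to the constructor as the molecule's name.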
Methods
def add_hydrogens(self, add_coords=True)
def add_hydrogens(self, add_coords=True):
    """
    Add hydrogen atoms to the molecule.

    Args:
        add_coords (bool, optional): If True, generates 3D coordinates for added hydrogens.
            Defaults to True.

    Returns:
        None: The molecule is modified in-place with added hydrogens.

    Notes:
        - Hydrogens are added according to standard valence rules
        - Modifies the molecule structure directly
    """
    self.m = Chem.AddHs(self.m, addCoords=add_coords)
Add hydrogen atoms to the molecule.
Args
add_coords (bool, optional): If True, generates 3D coordinates for added hydrogens. Defaults to True.
Returns
None: The molecule is modified in-place with added hydrogens.
Notes
- Hydrogens are added according to standard valence rules
- Modifies the molecule structure directly
def assign_bond_order_from_smiles(self, smiles)
def assign_bond_order_from_smiles(self, smiles):
    """
    Assigns bond orders to the molecule based on a SMILES template.

    Args:
        smiles (str): SMILES string representing the template molecule structure

    Raises:
        Exception: If bond order assignment fails even after removing hydrogens

    Notes:
        The method updates both the molecule (self.m) and stores the SMILES string
        (self.smiles) internally after successful assignment.
    """
    template = Chem.MolFromSmiles(smiles)
    try:
        self.m = AllChem.AssignBondOrdersFromTemplate(template, self.m)
    except Exception:
        self.m = Chem.RemoveHs(self.m)
        self.m = AllChem.AssignBondOrdersFromTemplate(template, self.m)
    self.smiles = smiles
Assigns bond orders to the molecule based on a SMILES template.
Args
smiles (str): SMILES string representing the template molecule structure
Raises
Exception: If bond order assignment fails even after removing hydrogens
Notes
The method updates both the molecule (self.m) and stores the SMILES string (self.smiles) internally after successful assignment.
def conformer(self, i=0)
def conformer(self, i=0):
    """
    Returns a specific conformer from the molecule.

    Args:
        i (int, optional): The index of the conformer to retrieve. Defaults to 0.

    Returns:
        rdkit.Chem.rdchem.Conformer: The conformer object at the specified index.

    Raises:
        ValueError: If the conformer index is out of range.
    """
    return self.m.GetConformer(i)
Returns a specific conformer from the molecule.
Args
i (int, optional): The index of the conformer to retrieve. Defaults to 0.
Returns
rdkit.Chem.rdchem.Conformer: The conformer object at the specified index.
Raises
ValueError: If the conformer index is out of range.
def conformer_id(self)
def conformer_id(self):
    """
    Gets the ID of the current conformer of the molecule.

    Args:
        None

    Returns:
        int: The unique identifier of the current conformer.
    """
    return self.m.GetConformer().GetId()
Gets the ID of the current conformer of the molecule.
Args
None
Returns
int: The unique identifier of the current conformer.
def coords(self, i: int = 0)
def coords(self, i: int = 0):
    """
    Get the atomic coordinates for a specified conformer.

    Args:
        i (int, optional): The index of the conformer. Defaults to 0.

    Returns:
        numpy.ndarray: Array of atomic coordinates (N x 3) where N is number of atoms.
    """
    conf = self.conformer(i)
    return conf.GetPositions()
Get the atomic coordinates for a specified conformer.
Args
i (int, optional): The index of the conformer. Defaults to 0.
Returns
numpy.ndarray: Array of atomic coordinates (N x 3), where N is the number of atoms.
def copy(self)
def copy(self):
    """
    Creates a deep copy of the molecule instance.

    Returns:
        Molecule: A new Molecule object with the same structure and name as the original.
    """
    return Molecule(Chem.Mol(self.m), name=self.name)
Creates a deep copy of the molecule instance.
Returns
Molecule: A new Molecule object with the same structure and name as the original.
def draw(self)
def draw(self):
    """
    Creates a 2D coordinate representation of the molecule for visualization.

    This method attempts to compute 2D coordinates for the molecule structure
    and removes hydrogen atoms for cleaner visualization. If computation fails,
    returns the original molecule object.

    Returns:
        rdkit.Chem.rdchem.Mol: A molecule object with 2D coordinates computed and
            hydrogens removed. If computation fails, returns original molecule object.

    Raises:
        Exception: Catches any exceptions during 2D coordinate computation and
            returns original molecule as fallback.
    """
    try:
        mol = self.copy()
        AllChem.Compute2DCoords(mol.m)
        return Chem.RemoveHs(mol.m)
    except Exception as e:
        return self.m
Creates a 2D coordinate representation of the molecule for visualization.
This method works on a copy of the molecule: it computes 2D coordinates and removes hydrogen atoms for cleaner visualization. If the computation fails, it returns the original RDKit molecule object instead.
Returns
rdkit.Chem.rdchem.Mol: A molecule object with 2D coordinates computed and hydrogens removed. If computation fails, the original molecule object.
Raises
None. Any exception raised during 2D coordinate computation is caught, and the original molecule is returned as a fallback.
def embed(self, add_hs=True, seed=-1)
def embed(self, add_hs=True, seed=-1):
    """
    Generate 3D coordinates for the molecule using distance geometry and optimize them.

    Args:
        add_hs (bool, optional): Whether to add hydrogens before embedding. Defaults to True.
        seed (int, optional): Random seed for reproducibility of the embedding. Defaults to -1.
            If -1, a random seed will be used.

    Returns:
        None: The molecule is modified in place with 3D coordinates added to a conformer.
            If embedding fails, the molecule will remain unchanged.
    """
    if add_hs:
        self.add_hydrogens()

    AllChem.EmbedMolecule(self.m, randomSeed=seed)
    self.set_conformer_id(0)
Generate 3D coordinates for the molecule using distance geometry and optimize them.
Args
add_hs (bool, optional): Whether to add hydrogens before embedding. Defaults to True.
seed (int, optional): Random seed for reproducibility of the embedding. Defaults to -1. If -1, a random seed will be used.
Returns
None: The molecule is modified in place with 3D coordinates added to a conformer. If embedding fails, the molecule will remain unchanged.
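The Molecule constructor source resolves the seed before calling embed: when add_coords is set and no seed is given, it draws one from 0 to 2**16 - 1 rather than passing -1. A small sketch of that defaulting rule follows; `resolve_seed`, `MAX_SEED`, and the `rng` parameter are hypothetical names introduced for illustration.

```python
import random

MAX_SEED = 2**16 - 1  # upper bound used in the constructor source


def resolve_seed(seed=None, rng=random):
    """Return `seed` unchanged, or draw one from [0, MAX_SEED] when it is None.

    `rng` is any object with a `randint` method, such as the `random` module
    or a `random.Random` instance.
    """
    return rng.randint(0, MAX_SEED) if seed is None else seed


chosen = resolve_seed()      # a fresh seed in the documented range
fixed = resolve_seed(1234)   # an explicit seed passes through untouched
print(chosen, fixed)
```

Resolving the seed up front, and recording it, is one way to make an embedding reproducible later even when the caller did not choose a seed.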
def molblock(self)
def molblock(self):
    """
    Returns the MDL MOL block string representation of the molecule.

    This method converts the internal RDKit molecule object to a MOL block format,
    which is a widely used text-based chemical file format that describes the
    structure of a molecule.

    Returns:
        str: A string containing the MOL block representation of the molecule.
            The MOL block includes atomic coordinates, bonds, and other
            structural information in a standardized format.

    Note:
        This method uses RDKit's MolToMolBlock function for the conversion.
    """
    return Chem.MolToMolBlock(self.m)
Returns the MDL MOL block string representation of the molecule.
This method converts the internal RDKit molecule object to a MOL block format, which is a widely used text-based chemical file format that describes the structure of a molecule.
Returns
str: A string containing the MOL block representation of the molecule. The MOL block includes atomic coordinates, bonds, and other structural information in a standardized format.
Note
This method uses RDKit's MolToMolBlock function for the conversion.
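To make "atomic coordinates, bonds, and other structural information" concrete, here is a minimal sketch of the fixed-column V2000 layout that a MOL block typically uses. It builds a small block by hand and parses it back with plain string slicing. Both functions are hypothetical helpers written for this illustration, they assume a V2000 block, and they are no substitute for RDKit's own reader and writer.

```python
def build_v2000_block(atoms, bonds, title="example"):
    """Assemble a minimal V2000 MOL block.

    atoms: list of (symbol, x, y, z); bonds: list of (begin, end, order), 1-based.
    """
    lines = [title, "  sketch", ""]  # three header lines
    # Counts line: atom and bond counts fill the first two 3-character fields.
    lines.append(f"{len(atoms):3d}{len(bonds):3d}  0  0  0  0  0  0  0  0999 V2000")
    for symbol, x, y, z in atoms:
        # Three 10-character coordinates, a space, then a 3-character symbol.
        lines.append(f"{x:10.4f}{y:10.4f}{z:10.4f} {symbol:<3} 0  0  0  0  0  0  0  0  0  0  0  0")
    for begin, end, order in bonds:
        lines.append(f"{begin:3d}{end:3d}{order:3d}  0")
    lines.append("M  END")
    return "\n".join(lines) + "\n"


def parse_v2000_block(block):
    """Return (symbols, coords, bonds) read from a V2000 MOL block."""
    lines = block.splitlines()
    counts = lines[3]
    n_atoms, n_bonds = int(counts[0:3]), int(counts[3:6])
    symbols, coords = [], []
    for line in lines[4:4 + n_atoms]:
        coords.append((float(line[0:10]), float(line[10:20]), float(line[20:30])))
        symbols.append(line[31:34].strip())
    bond_lines = lines[4 + n_atoms:4 + n_atoms + n_bonds]
    bonds = [(int(b[0:3]), int(b[3:6]), int(b[6:9])) for b in bond_lines]
    return symbols, coords, bonds


water = build_v2000_block(
    atoms=[("O", 0.0, 0.0, 0.0), ("H", 0.9572, 0.0, 0.0), ("H", -0.24, 0.9266, 0.0)],
    bonds=[(1, 2, 1), (1, 3, 1)],
)
symbols, coords, bonds = parse_v2000_block(water)
```

The parsed `symbols` list plays the same role as the output of species(), and `coords` corresponds to the N x 3 array returned by coords().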
def process_mol(self, mol)
def process_mol(self, mol):
    """
    Process a molecular structure by removing salts and kekulizing.

    This method takes an RDKit molecule, removes salt components, and performs
    kekulization to obtain a standardized molecular representation.

    Args:
        mol (rdkit.Chem.rdchem.Mol): Input RDKit molecule object.

    Returns:
        rdkit.Chem.rdchem.Mol: Processed molecule with salts removed and kekulized structure.

    Raises:
        ValueError: If salt removal fails or kekulization cannot be performed.
    """
    remover = SaltRemover.SaltRemover()
    stripped_mol = remover.StripMol(mol)
    if stripped_mol is None:
        raise ValueError("Salt removal failed.")

    try:
        Chem.Kekulize(stripped_mol, clearAromaticFlags=False)
    except Chem.KekulizeException:
        raise ValueError("Kekulization failed.")

    return stripped_mol
Process a molecular structure by removing salts and kekulizing.
This method takes an RDKit molecule, removes salt components, and performs kekulization to obtain a standardized molecular representation.
Args
mol (rdkit.Chem.rdchem.Mol): Input RDKit molecule object.
Returns
rdkit.Chem.rdchem.Mol: Processed molecule with salts removed and kekulized structure.
Raises
ValueError: If salt removal fails or kekulization cannot be performed.
def set_conformer_id(self, i=0)
def set_conformer_id(self, i=0):
    """
    Sets the ID of the current conformer in the molecule.

    Args:
        i (int, optional): The ID to set for the conformer. Defaults to 0.

    Returns:
        None
    """
    self.m.GetConformer().SetId(i)
Sets the ID of the current conformer in the molecule.
Args
i (int, optional): The ID to set for the conformer. Defaults to 0.
Returns
None
def species(self)
def species(self):
    """
    Returns a list of atomic symbols for all atoms in the molecule.

    Returns:
        list of str: A list of atomic symbols (e.g. ['C', 'H', 'O', ...])
    """
    return [a.GetSymbol() for a in self.m.GetAtoms()]
Returns a list of atomic symbols for all atoms in the molecule.
Returns
list of str: A list of atomic symbols (e.g. ['C', 'H', 'O', ...])