Module deeporigin.src.utilities.readers
Functions
def from_pdbqt_atom_type(atom_type: str)
def from_pdbqt_atom_type(atom_type: str):
    """Convert PDBQT atom types to standard atomic symbols.

    This function converts AutoDock PDBQT atom types to their corresponding
    standard atomic symbols used in molecular structures.

    Args:
        atom_type (str): The PDBQT atom type to convert. Valid types include:
            - A, Z, G, GA, J, Q (converts to C)
            - HD, HS (converts to H)
            - NA, NS (converts to N)
            - OA, OS (converts to O)
            - SA (converts to S)

    Returns:
        str: The corresponding standard atomic symbol (C, H, N, O, S),
            or empty string if atom type is not recognized.

    Examples:
        >>> from_pdbqt_atom_type("HD")
        'H'
        >>> from_pdbqt_atom_type("OA")
        'O'
        >>> from_pdbqt_atom_type("XX")
        ''
    """
    if atom_type in ["A", "Z", "G", "GA", "J", "Q"]:
        return "C"
    elif atom_type in ["HD", "HS"]:
        return "H"
    elif atom_type in ["NA", "NS"]:
        return "N"
    elif atom_type in ["OA", "OS"]:
        return "O"
    elif atom_type == "SA":
        return "S"
    else:
        return ""
Convert PDBQT atom types to standard atomic symbols.
This function converts AutoDock PDBQT atom types to their corresponding standard atomic symbols used in molecular structures.
Args
atom_type
:str
- The PDBQT atom type to convert. Valid types include:
  - A, Z, G, GA, J, Q (converts to C)
  - HD, HS (converts to H)
  - NA, NS (converts to N)
  - OA, OS (converts to O)
  - SA (converts to S)
Returns
str
- The corresponding standard atomic symbol (C, H, N, O, S), or empty string if atom type is not recognized.
Examples
>>> from_pdbqt_atom_type("HD")
'H'
>>> from_pdbqt_atom_type("OA")
'O'
>>> from_pdbqt_atom_type("XX")
''
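The same mapping can also be written as a lookup table, which keeps the type-to-element pairs in one place. The following is a minimal sketch of that alternative, not the library's implementation; the names `_PDBQT_TO_ELEMENT` and `pdbqt_to_element` are hypothetical and chosen only for illustration.

```python
# Hypothetical table-driven variant of the conversion documented above.
# The pairs mirror the documented PDBQT types; the names are illustrative.
_PDBQT_TO_ELEMENT = {
    **dict.fromkeys(("A", "Z", "G", "GA", "J", "Q"), "C"),
    **dict.fromkeys(("HD", "HS"), "H"),
    **dict.fromkeys(("NA", "NS"), "N"),
    **dict.fromkeys(("OA", "OS"), "O"),
    "SA": "S",
}


def pdbqt_to_element(atom_type: str) -> str:
    """Return the element symbol for a PDBQT atom type, or '' if unknown."""
    return _PDBQT_TO_ELEMENT.get(atom_type, "")
```

Like the documented function, the sketch is case-sensitive and returns an empty string for unrecognized types, so callers can test the result for truthiness before using it.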
def init_periodic_table()
def init_periodic_table():
    """Initialize periodic table data.

    This function populates the PERIODIC_TABLE dictionary with atomic data for
    all elements up to N_PT_ELEMENTS. The data is retrieved from RDKit's
    Periodic Table and includes:
        - Atomic ID
        - Element symbol
        - Element name
        - Atomic weight
        - Van der Waals radius
        - Covalent radius

    The function performs type checking on retrieved values and logs any
    errors encountered during initialization.

    Returns:
        None

    Raises:
        Exception: If there is an error loading data for any element.
            These exceptions are caught and logged.

    Notes:
        - Requires RDKit's Chem module
        - Uses global PERIODIC_TABLE dictionary to store element data
        - N_PT_ELEMENTS constant defines the number of elements to process
    """
    logger = Logger("INFO", os.getenv("LOG_BIOSIM_CLIENT"))
    pt = Chem.GetPeriodicTable()
    for id in range(1, N_PT_ELEMENTS + 1):
        try:
            symbol = pt.GetElementSymbol(id)
            name = pt.GetElementName(id)
            weight = pt.GetAtomicWeight(id)
            rvdw = pt.GetRvdw(id)
            rcov = pt.GetRcovalent(id)

            assert isinstance(symbol, str), "Symbol must be a string"
            assert isinstance(name, str), "Name must be a string"
            assert isinstance(weight, float), "Weight must be a float"
            assert isinstance(rvdw, float), "rvdw must be a float"
            assert isinstance(rcov, float), "rcov must be a float"

            PERIODIC_TABLE['id'].append(id)
            PERIODIC_TABLE['symbol'].append(symbol)
            PERIODIC_TABLE['name'].append(name)
            PERIODIC_TABLE['weight'].append(weight)
            PERIODIC_TABLE['rvdw'].append(rvdw)
            PERIODIC_TABLE['rcov'].append(rcov)
        except Exception as e:
            logger.log_error(f"Failed to load Periodic Table on atom with id: {id}. Error: {str(e)}")
Initialize periodic table data.
This function populates the PERIODIC_TABLE dictionary with atomic data for all elements up to N_PT_ELEMENTS. The data is retrieved from RDKit's Periodic Table and includes:
- Atomic ID
- Element symbol
- Element name
- Atomic weight
- Van der Waals radius
- Covalent radius
The function performs type checking on retrieved values and logs any errors encountered during initialization.
Returns
None
Raises
Exception
- Any error raised while loading an element's data is caught inside the function and logged, so it does not propagate to the caller; loading continues with the next element.
Notes
- Requires RDKit's Chem module
- Uses global PERIODIC_TABLE dictionary to store element data
- N_PT_ELEMENTS constant defines the number of elements to process
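The source stores the table as a dictionary of parallel lists, one list per property, so the entries at a given index across all lists describe the same element. A minimal sketch of how such a layout can be queried is shown below. The table here is a hand-filled stand-in with three elements rather than data loaded from RDKit, the radius columns are omitted, and the helper `lookup` is hypothetical.

```python
# Hand-filled stand-in that mimics the column-oriented PERIODIC_TABLE layout.
# In the library the lists are filled by init_periodic_table(); the 'rvdw'
# and 'rcov' columns are left out of this illustration.
PERIODIC_TABLE = {
    "id": [1, 6, 8],
    "symbol": ["H", "C", "O"],
    "name": ["Hydrogen", "Carbon", "Oxygen"],
    "weight": [1.008, 12.011, 15.999],
}


def lookup(table, symbol, column):
    """Return the `column` value of the element with the given symbol.

    Raises ValueError if the symbol is not present in the table.
    """
    row = table["symbol"].index(symbol)  # position shared by all columns
    return table[column][row]
```

For example, `lookup(PERIODIC_TABLE, "C", "weight")` reads the weight stored at the same index as the symbol `"C"`. Because an element that fails to load is skipped for every column at once, the lists stay aligned even when some elements are missing.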
def read_block(block_type, block_content)
def read_block(block_type, block_content):
    """Read a molecular structure block and return its contents based on file format.

    This function acts as a dispatcher to specific readers based on the input
    file format.

    Args:
        block_type (FileFormat): The format of the molecular structure block
            (MOL2, PDB, PDBQT, or XYZ)
        block_content (str): The content of the molecular structure block to be read

    Returns:
        tuple: A tuple containing:
            - name (str): The molecule name/identifier
            - atom_types (list or numpy.ndarray): List/array of atomic symbols
            - coordinates (numpy.ndarray): 3xN array of atom coordinates where
              N is number of atoms

    Raises:
        Exception: If block_type is not one of the supported formats

    See Also:
        read_mol2_block
        read_pdb_pdbqt_block
        read_xyz_block

    Examples:
        >>> content = "@<TRIPOS>MOLECULE\\n..."
        >>> name, atoms, coords = read_block(FileFormat.MOL2, content)
    """
    if block_type == FileFormat.MOL2:
        return read_mol2_block(block_content)
    elif block_type in [FileFormat.PDB, FileFormat.PDBQT]:
        return read_pdb_pdbqt_block(block_type, block_content)
    elif block_type == FileFormat.XYZ:
        return read_xyz_block(block_content)
    else:
        raise Exception(f"Invalid file format {block_type}")
Read a molecular structure block and return its contents based on file format.
This function acts as a dispatcher to specific readers based on the input file format.
Args
block_type
:FileFormat
- The format of the molecular structure block (MOL2, PDB, PDBQT, or XYZ)
block_content
:str
- The content of the molecular structure block to be read
Returns
tuple
- A tuple containing:
  - name (str): The molecule name/identifier
  - atom_types (list or numpy.ndarray): List/array of atomic symbols
  - coordinates (numpy.ndarray): Array of atom coordinates, where N is the number of atoms. The PDB, PDBQT, and XYZ readers return a 3xN array; read_mol2_block returns an array of shape (N, 3).
Raises
Exception
- If block_type is not one of the supported formats
See Also
- read_mol2_block
- read_pdb_pdbqt_block
- read_xyz_block
Examples
>>> content = "@<TRIPOS>MOLECULE\n..."
>>> name, atoms, coords = read_block(FileFormat.MOL2, content)
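The dispatch described above can also be expressed as a registry that maps each format to its reader. The sketch below illustrates that pattern in isolation and is not the library's code: `FileFormat` is a stand-in enum (only its member names come from the source), the readers are placeholder stubs, and the sketch raises `ValueError` where the documented function raises a plain `Exception`.

```python
from enum import Enum, auto


class FileFormat(Enum):
    """Stand-in for the library's FileFormat; the values are assumed."""
    MOL2 = auto()
    PDB = auto()
    PDBQT = auto()
    XYZ = auto()


def _stub_reader(label):
    """Build a placeholder reader that just tags the content it was given."""
    def reader(block_type, content):
        return (label, content)
    return reader


# PDB and PDBQT share one reader, mirroring the documented dispatcher.
READERS = {
    FileFormat.MOL2: _stub_reader("mol2"),
    FileFormat.PDB: _stub_reader("pdb/pdbqt"),
    FileFormat.PDBQT: _stub_reader("pdb/pdbqt"),
    FileFormat.XYZ: _stub_reader("xyz"),
}


def dispatch(block_type, content):
    """Route content to the reader registered for block_type."""
    try:
        reader = READERS[block_type]
    except (KeyError, TypeError):
        raise ValueError(f"Invalid file format {block_type}") from None
    return reader(block_type, content)
```

With a registry, supporting another format means adding one dictionary entry instead of another `elif` branch.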
def read_mol2_block(block: str)
def read_mol2_block(block: str):
    """Read a MOL2 block and extract molecule information.

    This function takes a MOL2 format block string and parses it to extract
    the molecule name, atom types, and 3D coordinates using RDKit.

    Args:
        block (str): A string containing molecule data in MOL2 format.

    Returns:
        tuple: A tuple containing:
            - name (str): The name/identifier of the molecule
            - atom_types (numpy.ndarray): Array of atomic symbols for each atom
            - coordinates (numpy.ndarray): Array of shape (n_atoms, 3) containing
              3D coordinates of atoms in the molecule

    Note:
        The function preserves hydrogen atoms and skips sanitization when
        reading the MOL2 block.
    """
    mol = Chem.rdmolfiles.MolFromMol2Block(block, removeHs=False, sanitize=False)
    name = mol.GetProp('_Name')
    atoms = mol.GetAtoms()
    atom_types = np.empty(len(atoms), dtype=str)
    coordinates = np.empty((3, len(atoms)), dtype=np.float32)
    for i, atom in enumerate(atoms):
        atom_types[i] = atom.GetSymbol()
        positions = mol.GetConformer(0).GetAtomPosition(i)
        coordinates[:, i] = np.array([positions.x, positions.y, positions.z], dtype=np.float32)
    return name, atom_types, np.transpose(coordinates)
Read a MOL2 block and extract molecule information.
This function takes a MOL2 format block string and parses it to extract the molecule name, atom types, and 3D coordinates using RDKit.
Args
block
:str
- A string containing molecule data in MOL2 format.
Returns
tuple
- A tuple containing:
  - name (str): The name/identifier of the molecule
  - atom_types (numpy.ndarray): Array of atomic symbols for each atom
  - coordinates (numpy.ndarray): Array of shape (n_atoms, 3) containing 3D coordinates of atoms in the molecule
Note
The function preserves hydrogen atoms and skips sanitization when reading the MOL2 block.
def read_pdb_pdbqt_block(block_type, block)
def read_pdb_pdbqt_block(block_type, block):
    """
    Read and parse PDB/PDBQT block to extract molecule information.

    Args:
        block_type (FileFormat): Format of the input block (PDB or PDBQT)
        block (str): Text block containing molecule data in PDB or PDBQT format

    Returns:
        tuple: A tuple containing:
            - name (str or None): Molecule name from COMPND field (PDB) or
              REMARK Name field (PDBQT)
            - atom_types (list): List of atom types as chemical element symbols
            - coordinates (np.ndarray): 3xN array of atomic coordinates (x,y,z)
              where N is number of atoms

    Notes:
        For PDBQT files, atom types are converted from PDBQT format to standard
        chemical element symbols. If atom type cannot be found in columns 70-73,
        it attempts to read from columns 76-79.
    """
    lines = block.split("\n")
    uppercase_atom_symbols = [symbol.upper() for symbol in PERIODIC_TABLE['symbol']]

    if block_type == FileFormat.PDB:
        compnd_line_id = next((i for i, line in enumerate(lines) if line.startswith("COMPND")), None)
        name = lines[compnd_line_id][7:].strip() if compnd_line_id is not None else None
    else:
        name_line_id = next((i for i, line in enumerate(lines) if line.startswith("REMARK Name")), None)
        name = lines[name_line_id][15:].strip() if name_line_id is not None else None

    atom_line_ids = [i for i, line in enumerate(lines) if line.startswith("ATOM")]
    atom_count = len(atom_line_ids)
    atom_types = [None] * atom_count
    coordinates = np.empty((3, atom_count), dtype=np.float32)
    atom_lines = [lines[i] for i in atom_line_ids]

    for i, line in enumerate(atom_lines):
        coordinates[0, i] = float(line[30:38])
        coordinates[1, i] = float(line[38:46])
        coordinates[2, i] = float(line[46:54])

        atom_type = line[70:73].strip()
        if atom_type not in uppercase_atom_symbols:
            atom_type = from_pdbqt_atom_type(atom_type)
        if atom_type == '':
            atom_type = line[76:79].strip()
            if atom_type not in uppercase_atom_symbols:
                atom_type = from_pdbqt_atom_type(atom_type)
        atom_types[i] = atom_type

    return name, atom_types, coordinates
Read and parse PDB/PDBQT block to extract molecule information.
Args
block_type
:FileFormat
- Format of the input block (PDB or PDBQT)
block
:str
- Text block containing molecule data in PDB or PDBQT format
Returns
tuple
- A tuple containing:
  - name (str or None): Molecule name from COMPND field (PDB) or REMARK Name field (PDBQT)
  - atom_types (list): List of atom types as chemical element symbols
  - coordinates (np.ndarray): 3xN array of atomic coordinates (x,y,z) where N is number of atoms
Notes
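For PDBQT files, atom types are converted from PDBQT format to standard chemical element symbols. Only lines that start with ATOM are parsed. The atom type is read from the zero-based slice line[70:73] of each such line; if that yields no recognized type, the reader falls back to the slice line[76:79].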
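The fixed-column parsing described above can be illustrated on a single ATOM record. The following is a simplified sketch, not the library's function: `parse_atom_line` is a hypothetical helper that reads the coordinate slices and applies the two-location fallback, but it skips the element-symbol check and the PDBQT type conversion. The sample record is built with explicit field widths so the columns are guaranteed to line up.

```python
def parse_atom_line(line: str):
    """Extract (atom_type, (x, y, z)) from one fixed-width ATOM record."""
    x = float(line[30:38])
    y = float(line[38:46])
    z = float(line[46:54])
    atom_type = line[70:73].strip()      # first location tried
    if not atom_type:
        atom_type = line[76:79].strip()  # fallback location
    return atom_type, (x, y, z)


# An 80-character PDB-style record assembled field by field:
# record name, serial, atom name, altLoc, residue, chain, resSeq, iCode,
# x, y, z, occupancy, temperature factor, padding, element, charge.
sample = (
    f"{'ATOM':<6}{1:>5} {'CA':<4}{'':1}{'ALA':>3} {'A':1}{1:>4}{'':1}   "
    f"{1.0:8.3f}{-2.5:8.3f}{3.25:8.3f}{1.0:6.2f}{0.0:6.2f}"
    f"{'':10}{'C':>2}{'':2}"
)

atom_type, xyz = parse_atom_line(sample)
```

In this sample the slice `line[70:73]` is blank, so the element comes from the fallback slice. Slicing by position rather than splitting on whitespace matters here because adjacent fixed-width fields can run together without a separating space.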
def read_xyz_block(block: str)
def read_xyz_block(block: str):
    """Reads and parses an XYZ format block of molecular coordinates.

    Args:
        block (str): A string containing the XYZ format block to be parsed.
            The block should follow the standard XYZ file format:
            - First line: number of atoms
            - Second line: molecule name/comment
            - Subsequent lines: atom_type x y z coordinates

    Returns:
        tuple: A tuple containing:
            - name (str): The molecule name or comment from the second line
            - atom_types (numpy.ndarray): 1D array containing the atomic symbols
              for each atom
            - coordinates (numpy.ndarray): 3xN array of atomic coordinates where
              N is the number of atoms. First dimension corresponds to
              x, y, z coordinates.

    Examples:
        >>> block = "2\\nH2\\nH 0.0 0.0 0.0\\nH 0.0 0.0 0.74"
        >>> name, atoms, coords = read_xyz_block(block)
        >>> print(name)
        H2
    """
    lines = block.split("\n")
    atom_count = int(lines[0])
    name = lines[1].strip()
    atom_types = np.empty(atom_count, dtype=str)
    coordinates = np.empty((3, atom_count), dtype=np.float32)
    atom_lines = lines[2:2+atom_count]
    for i, line in enumerate(atom_lines):
        tokens = line.split()
        atom_types[i] = tokens[0]
        coordinates[0, i] = float(tokens[1])
        coordinates[1, i] = float(tokens[2])
        coordinates[2, i] = float(tokens[3])
    return name, atom_types, coordinates
Reads and parses an XYZ format block of molecular coordinates.
Args
block
:str
- A string containing the XYZ format block to be parsed. The block should follow the standard XYZ file format:
  - First line: number of atoms
  - Second line: molecule name/comment
  - Subsequent lines: atom_type x y z coordinates
Returns
tuple
- A tuple containing:
  - name (str): The molecule name or comment from the second line
  - atom_types (numpy.ndarray): 1D array containing the atomic symbols for each atom
  - coordinates (numpy.ndarray): 3xN array of atomic coordinates where N is the number of atoms. First dimension corresponds to x, y, z coordinates.
Examples
>>> block = "2\nH2\nH 0.0 0.0 0.0\nH 0.0 0.0 0.74"
>>> name, atoms, coords = read_xyz_block(block)
>>> print(name)
H2
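The three-part XYZ layout (atom count, comment line, one atom per line) can be parsed with the standard library alone. The sketch below is a hypothetical `parse_xyz` helper, not the library's reader. It returns plain lists, with one `(x, y, z)` tuple per atom rather than the 3xN array documented above. Lists are used deliberately: a NumPy array created with `dtype=str` and no explicit width holds one-character strings, so a two-letter symbol such as `Cl` would be stored as `C`.

```python
def parse_xyz(block: str):
    """Parse an XYZ block into (name, symbols, coords) using plain lists."""
    lines = block.split("\n")
    atom_count = int(lines[0])      # first line: number of atoms
    name = lines[1].strip()         # second line: name or comment
    symbols, coords = [], []
    for line in lines[2:2 + atom_count]:
        symbol, x, y, z = line.split()[:4]
        symbols.append(symbol)      # full symbol is kept, e.g. "Cl"
        coords.append((float(x), float(y), float(z)))
    return name, symbols, coords


name, symbols, coords = parse_xyz("2\nHCl\nH 0.0 0.0 0.0\nCl 0.0 0.0 1.27")
```

Only the first `atom_count` atom lines are read, so trailing blank lines or extra content after the declared atoms are ignored, as in the documented reader.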