Module deeporigin.src.utilities.readers

Functions

def from_pdbqt_atom_type(atom_type: str)
def from_pdbqt_atom_type(atom_type: str):
    """Convert PDBQT atom types to standard atomic symbols.

    This function converts AutoDock PDBQT atom types to their corresponding
    standard atomic symbols used in molecular structures.

    Args:
        atom_type (str): The PDBQT atom type to convert.
            Valid types include:
            - A, Z, G, GA, J, Q (converts to C)
            - HD, HS (converts to H)
            - NA, NS (converts to N)
            - OA, OS (converts to O)
            - SA (converts to S)

    Returns:
        str: The corresponding standard atomic symbol (C, H, N, O, S),
             or empty string if atom type is not recognized.

    Examples:
        >>> from_pdbqt_atom_type("HD")
        'H'
        >>> from_pdbqt_atom_type("OA") 
        'O'
        >>> from_pdbqt_atom_type("XX")
        ''
    """
    if atom_type in ["A", "Z", "G", "GA", "J", "Q"]:
        return "C"
    elif atom_type in ["HD", "HS"]:
        return "H"
    elif atom_type in ["NA", "NS"]:
        return "N"
    elif atom_type in ["OA", "OS"]:
        return "O"
    elif atom_type == "SA":
        return "S"
    else:
        return ""

Convert PDBQT atom types to standard atomic symbols.

This function converts AutoDock PDBQT atom types to their corresponding standard atomic symbols used in molecular structures.

Args

atom_type : str
The PDBQT atom type to convert. Valid types include:
  • A, Z, G, GA, J, Q (converts to C)
  • HD, HS (converts to H)
  • NA, NS (converts to N)
  • OA, OS (converts to O)
  • SA (converts to S)

Returns

str
The corresponding standard atomic symbol (C, H, N, O, S), or empty string if atom type is not recognized.

Examples

>>> from_pdbqt_atom_type("HD")
'H'
>>> from_pdbqt_atom_type("OA") 
'O'
>>> from_pdbqt_atom_type("XX")
''
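
The same mapping can be expressed as a table lookup. The sketch below is an illustrative equivalent, not the module's implementation; the names PDBQT_TO_ELEMENT and pdbqt_to_element are hypothetical.

```python
# Hypothetical lookup table mirroring the documented PDBQT-to-element mapping.
PDBQT_TO_ELEMENT = {
    **dict.fromkeys(["A", "Z", "G", "GA", "J", "Q"], "C"),
    **dict.fromkeys(["HD", "HS"], "H"),
    **dict.fromkeys(["NA", "NS"], "N"),
    **dict.fromkeys(["OA", "OS"], "O"),
    "SA": "S",
}


def pdbqt_to_element(atom_type: str) -> str:
    """Return the element symbol for a PDBQT atom type, or '' if unknown."""
    return PDBQT_TO_ELEMENT.get(atom_type, "")
```

As with the documented function, the lookup is case-sensitive and returns an empty string for anything outside the table, so a plain element symbol such as "C" also maps to ''.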
def init_periodic_table()
def init_periodic_table():
    """Initialize periodic table data.

    This function populates the PERIODIC_TABLE dictionary with atomic data for all elements up to N_PT_ELEMENTS. 
    The data is retrieved from RDKit's Periodic Table and includes:
    - Atomic ID
    - Element symbol
    - Element name  
    - Atomic weight
    - Van der Waals radius
    - Covalent radius

    The function performs type checking on retrieved values and logs any errors encountered during initialization.

    Returns:
        None

    Raises:
        Exception: Never propagated to the caller. Any error raised while loading an element,
            including a failed type check, is caught and logged, and that element is skipped.

    Notes:
        - Requires RDKit's Chem module
        - Uses global PERIODIC_TABLE dictionary to store element data
        - N_PT_ELEMENTS constant defines the number of elements to process
    """
    logger = Logger("INFO", os.getenv("LOG_BIOSIM_CLIENT"))
    pt = Chem.GetPeriodicTable()

    # Use atomic_number rather than id so the built-in id() is not shadowed.
    for atomic_number in range(1, N_PT_ELEMENTS + 1):
        try:
            symbol = pt.GetElementSymbol(atomic_number)
            name = pt.GetElementName(atomic_number)
            weight = pt.GetAtomicWeight(atomic_number)
            rvdw = pt.GetRvdw(atomic_number)
            rcov = pt.GetRcovalent(atomic_number)

            # Validate every value before appending, so a failing element
            # never leaves the columns of PERIODIC_TABLE with unequal lengths.
            assert isinstance(symbol, str), "Symbol must be a string"
            assert isinstance(name, str), "Name must be a string"
            assert isinstance(weight, float), "Weight must be a float"
            assert isinstance(rvdw, float), "rvdw must be a float"
            assert isinstance(rcov, float), "rcov must be a float"

            PERIODIC_TABLE['id'].append(atomic_number)
            PERIODIC_TABLE['symbol'].append(symbol)
            PERIODIC_TABLE['name'].append(name)
            PERIODIC_TABLE['weight'].append(weight)
            PERIODIC_TABLE['rvdw'].append(rvdw)
            PERIODIC_TABLE['rcov'].append(rcov)
        except Exception as e:
            # Log and continue: one bad element does not abort the whole load.
            logger.log_error(f"Failed to load Periodic Table on atom with id: {atomic_number}. Error: {str(e)}")

Initialize periodic table data.

This function populates the PERIODIC_TABLE dictionary with atomic data for all elements up to N_PT_ELEMENTS. The data is retrieved from RDKit's Periodic Table and includes:

  • Atomic ID
  • Element symbol
  • Element name
  • Atomic weight
  • Van der Waals radius
  • Covalent radius

The function performs type checking on retrieved values and logs any errors encountered during initialization.

Returns

None

Raises

Exception
Never propagated to the caller. Any error raised while loading an element, including a failed type check, is caught and logged, and that element is skipped.

Notes

  • Requires RDKit's Chem module
  • Uses global PERIODIC_TABLE dictionary to store element data
  • N_PT_ELEMENTS constant defines the number of elements to process
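
PERIODIC_TABLE is filled one column at a time, so entry i of every list describes the same element. The sketch below illustrates that layout with a hand-filled stand-in. The names table and lookup_by_symbol are hypothetical, and only four elements and three of the six columns are shown; the real dictionary is populated from RDKit.

```python
# Hand-filled stand-in for a column-oriented table (hypothetical subset).
table = {
    "id": [1, 6, 7, 8],
    "symbol": ["H", "C", "N", "O"],
    "name": ["Hydrogen", "Carbon", "Nitrogen", "Oxygen"],
}


def lookup_by_symbol(table: dict, symbol: str) -> dict:
    """Return one row of a column-oriented table as a dict."""
    row = table["symbol"].index(symbol)  # raises ValueError if absent
    return {column: values[row] for column, values in table.items()}


oxygen = lookup_by_symbol(table, "O")
```

Because an element that fails to load is skipped, looking a row up by symbol or id is safer than assuming that index i always holds atomic number i + 1.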
def read_block(block_type, block_content)
def read_block(block_type, block_content):
    """Read a molecular structure block and return its contents based on file format.

    This function acts as a dispatcher to specific readers based on the input file format.

    Args:
        block_type (FileFormat): The format of the molecular structure block (MOL2, PDB, PDBQT, or XYZ)
        block_content (str): The content of the molecular structure block to be read

    Returns:
        tuple: A tuple containing:
            - name (str): The molecule name/identifier 
            - atom_types (list or numpy.ndarray): List/array of atomic symbols
            - coordinates (numpy.ndarray): Atom coordinates for N atoms. The PDB, PDBQT, and XYZ
              readers return a 3xN array; read_mol2_block returns an (N, 3) array.

    Raises:
        Exception: If block_type is not one of the supported formats

    See Also:
        read_mol2_block
        read_pdb_pdbqt_block  
        read_xyz_block

    Examples:
        >>> content = "@<TRIPOS>MOLECULE\\n..."
        >>> name, atoms, coords = read_block(FileFormat.MOL2, content)
    """
    if block_type == FileFormat.MOL2:
        return read_mol2_block(block_content)
    elif block_type in [FileFormat.PDB, FileFormat.PDBQT]:
        return read_pdb_pdbqt_block(block_type, block_content)
    elif block_type == FileFormat.XYZ:
        return read_xyz_block(block_content)
    else:
        raise Exception(f"Invalid file format {block_type}")

Read a molecular structure block and return its contents based on file format.

This function acts as a dispatcher to specific readers based on the input file format.

Args

block_type : FileFormat
The format of the molecular structure block (MOL2, PDB, PDBQT, or XYZ)
block_content : str
The content of the molecular structure block to be read

Returns

tuple
A tuple containing:
  • name (str): The molecule name/identifier
  • atom_types (list or numpy.ndarray): List/array of atomic symbols
  • coordinates (numpy.ndarray): Atom coordinates for N atoms. The PDB, PDBQT, and XYZ readers return a 3xN array; read_mol2_block returns an (N, 3) array.

Raises

Exception
If block_type is not one of the supported formats

See Also

read_mol2_block
read_pdb_pdbqt_block
read_xyz_block

Examples

>>> content = "@<TRIPOS>MOLECULE\n..."
>>> name, atoms, coords = read_block(FileFormat.MOL2, content)
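
The dispatch described here can also be written as a lookup table. The sketch below is an alternative illustration, not the module's code: FileFormat is a stand-in enum, the readers are placeholders that only report which branch ran, and READERS and dispatch are hypothetical names. It raises ValueError for an unknown format, whereas the documented function raises a plain Exception.

```python
from enum import Enum, auto


class FileFormat(Enum):  # stand-in for the module's FileFormat
    MOL2 = auto()
    PDB = auto()
    PDBQT = auto()
    XYZ = auto()


def _stub_reader(label):
    """Build a placeholder reader that only reports which branch ran."""
    def reader(block_content):
        return label, [], []
    return reader


# Table-driven dispatch: one entry per supported format.
READERS = {
    FileFormat.MOL2: _stub_reader("mol2"),
    FileFormat.PDB: _stub_reader("pdb/pdbqt"),
    FileFormat.PDBQT: _stub_reader("pdb/pdbqt"),
    FileFormat.XYZ: _stub_reader("xyz"),
}


def dispatch(block_type, block_content):
    """Route a block to the reader registered for its format."""
    try:
        reader = READERS[block_type]
    except KeyError:
        raise ValueError(f"Invalid file format {block_type}") from None
    return reader(block_content)
```

A real table would need a small wrapper for the PDB/PDBQT reader, since that reader also takes the block type as its first argument.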
def read_mol2_block(block: str)
def read_mol2_block(block: str):
    """Read a MOL2 block and extract molecule information.
    This function takes a MOL2 format block string and parses it to extract the molecule name,
    atom types, and 3D coordinates using RDKit.
    
    Args:
        block (str): A string containing molecule data in MOL2 format.
    
    Returns:
        tuple: A tuple containing:
            - name (str): The name/identifier of the molecule
            - atom_types (numpy.ndarray): Array of atomic symbols for each atom
            - coordinates (numpy.ndarray): Array of shape (n_atoms, 3) containing 3D coordinates
              of atoms in the molecule
    
    Note:
        The function preserves hydrogen atoms and skips sanitization when reading the MOL2 block.
    """
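    mol = Chem.rdmolfiles.MolFromMol2Block(block, removeHs=False, sanitize=False)
    name = mol.GetProp('_Name')

    atoms = mol.GetAtoms()
    # dtype=object keeps multi-character symbols such as "Cl" intact;
    # dtype=str would allocate one-character slots and truncate them.
    atom_types = np.empty(len(atoms), dtype=object)
    coordinates = np.empty((3, len(atoms)), dtype=np.float32)

    # Fetch the conformer once instead of on every loop iteration.
    conformer = mol.GetConformer(0)
    for i, atom in enumerate(atoms):
        atom_types[i] = atom.GetSymbol()
        position = conformer.GetAtomPosition(i)
        coordinates[:, i] = (position.x, position.y, position.z)

    # Transpose so the result has shape (n_atoms, 3), as documented above.
    return name, atom_types, np.transpose(coordinates)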

Read a MOL2 block and extract molecule information. This function takes a MOL2 format block string and parses it to extract the molecule name, atom types, and 3D coordinates using RDKit.

Args

block : str
A string containing molecule data in MOL2 format.

Returns

tuple
A tuple containing:
  • name (str): The name/identifier of the molecule
  • atom_types (numpy.ndarray): Array of atomic symbols for each atom
  • coordinates (numpy.ndarray): Array of shape (n_atoms, 3) containing 3D coordinates of atoms in the molecule

Note

The function preserves hydrogen atoms and skips sanitization when reading the MOL2 block.
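
When atomic symbols are stored in a NumPy array, the dtype decides whether two-letter symbols survive. The sketch below is a general NumPy illustration rather than part of this module; the variable names are hypothetical.

```python
import numpy as np

symbols = ["C", "Cl", "Br"]

# dtype=str allocates one-character slots ('<U1'), so longer symbols are cut.
fixed_width = np.empty(len(symbols), dtype=str)
# dtype=object stores the Python strings themselves, at any length.
as_objects = np.empty(len(symbols), dtype=object)

for i, symbol in enumerate(symbols):
    fixed_width[i] = symbol
    as_objects[i] = symbol

print(fixed_width.tolist())  # ['C', 'C', 'B']
print(as_objects.tolist())   # ['C', 'Cl', 'Br']
```

The truncation is silent, so chlorine and bromine would be indistinguishable from carbon and boron in the fixed-width array.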

def read_pdb_pdbqt_block(block_type, block)
def read_pdb_pdbqt_block(block_type, block):
    """
    Read and parse PDB/PDBQT block to extract molecule information.

    Args:
        block_type (FileFormat): Format of the input block (PDB or PDBQT)
        block (str): Text block containing molecule data in PDB or PDBQT format 

    Returns:
        tuple: A tuple containing:
            - name (str or None): Molecule name from COMPND field (PDB) or REMARK Name field (PDBQT)
            - atom_types (list): List of atom types as chemical element symbols
            - coordinates (np.ndarray): 3xN array of atomic coordinates (x,y,z) where N is number of atoms

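    Notes:
        Only records that start with "ATOM" are read; other records, including HETATM, are skipped.
        The atom type is first read from the zero-based slice line[70:73]. If that does not yield
        a recognized type, the slice line[76:79] is used instead. A value that is not an element
        symbol is converted from its PDBQT atom type with from_pdbqt_atom_type.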
    """
    lines = block.split("\n")
    uppercase_atom_symbols = [symbol.upper() for symbol in PERIODIC_TABLE['symbol']]

    if block_type == FileFormat.PDB:
        compnd_line_id = next((i for i, line in enumerate(lines) if line.startswith("COMPND")), None)
        name = lines[compnd_line_id][7:].strip() if compnd_line_id is not None else None
    else:
        name_line_id = next((i for i, line in enumerate(lines) if line.startswith("REMARK  Name")), None)
        name = lines[name_line_id][15:].strip() if name_line_id is not None else None

    atom_line_ids = [i for i, line in enumerate(lines) if line.startswith("ATOM")]

    atom_count = len(atom_line_ids)

    atom_types = [None] * atom_count
    coordinates = np.empty((3, atom_count), dtype=np.float32)

    atom_lines = [lines[i] for i in atom_line_ids]
    for i, line in enumerate(atom_lines):
        coordinates[0, i] = float(line[30:38])
        coordinates[1, i] = float(line[38:46])
        coordinates[2, i] = float(line[46:54])

        atom_type = line[70:73].strip()
        if atom_type not in uppercase_atom_symbols:
            atom_type = from_pdbqt_atom_type(atom_type)

        if atom_type == '':
            atom_type = line[76:79].strip()
            if atom_type not in uppercase_atom_symbols:
                atom_type = from_pdbqt_atom_type(atom_type)

        atom_types[i] = atom_type

    return name, atom_types, coordinates

Read and parse PDB/PDBQT block to extract molecule information.

Args

block_type : FileFormat
Format of the input block (PDB or PDBQT)
block : str
Text block containing molecule data in PDB or PDBQT format

Returns

tuple
A tuple containing:
  • name (str or None): Molecule name from COMPND field (PDB) or REMARK Name field (PDBQT)
  • atom_types (list): List of atom types as chemical element symbols
  • coordinates (np.ndarray): 3xN array of atomic coordinates (x,y,z) where N is number of atoms

Notes

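Only records that start with "ATOM" are read; other records, including HETATM, are skipped. The atom type is first read from the zero-based slice line[70:73]. If that does not yield a recognized type, the slice line[76:79] is used instead. A value that is not an element symbol is converted from its PDBQT atom type with from_pdbqt_atom_type.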
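
The fixed-width slicing can be tried on a single record. The sketch below is a simplified stand-in for plain PDB lines only: parse_atom_line is a hypothetical helper, it performs no PDBQT conversion, and it falls back to the second slice only when the first is blank. The sample record is assembled field by field and its values are made up for illustration.

```python
def parse_atom_line(line: str):
    """Extract (element, (x, y, z)) from one fixed-width ATOM record."""
    x = float(line[30:38])
    y = float(line[38:46])
    z = float(line[46:54])
    # Try the first slice, then fall back to the second if it is blank.
    element = line[70:73].strip() or line[76:79].strip()
    return element, (x, y, z)


# Build a record field by field so every value lands in its PDB column.
line = (
    f"{'ATOM':<6}{1:>5} {'CA':^4} {'ALA':>3} {'A'}{1:>4}    "
    f"{11.104:>8.3f}{6.134:>8.3f}{-6.504:>8.3f}{1.00:>6.2f}{0.00:>6.2f}"
    f"{'':10}{'C':>2}"
)

element, xyz = parse_atom_line(line)
```

Here the element symbol sits in the last two characters of the 78-character record, so the first slice is blank and the second one supplies "C".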

def read_xyz_block(block: str)
def read_xyz_block(block: str):
    """Reads and parses an XYZ format block of molecular coordinates.

    Args:
        block (str): A string containing the XYZ format block to be parsed. The block should follow
            the standard XYZ file format:
            - First line: number of atoms
            - Second line: molecule name/comment
            - Subsequent lines: atom_type x y z coordinates

    Returns:
        tuple: A tuple containing:
            - name (str): The molecule name or comment from the second line
            - atom_types (numpy.ndarray): 1D array containing the atomic symbols for each atom
            - coordinates (numpy.ndarray): 3xN array of atomic coordinates where N is the number of atoms.
              First dimension corresponds to x, y, z coordinates.

    Examples:
        >>> block = "2\\nH2\\nH 0.0 0.0 0.0\\nH 0.0 0.0 0.74"
        >>> name, atoms, coords = read_xyz_block(block)
        >>> print(name)
        H2
    """
    lines = block.split("\n")

    atom_count = int(lines[0])
    name = lines[1].strip()

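    # dtype=object keeps multi-character symbols such as "Cl" intact;
    # dtype=str would allocate one-character slots and truncate them.
    atom_types = np.empty(atom_count, dtype=object)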
    coordinates = np.empty((3, atom_count), dtype=np.float32)

    atom_lines = lines[2:2+atom_count]
    for i, line in enumerate(atom_lines):
        tokens = line.split()
        atom_types[i] = tokens[0]
        coordinates[0, i] = float(tokens[1])
        coordinates[1, i] = float(tokens[2])
        coordinates[2, i] = float(tokens[3])

    return name, atom_types, coordinates

Reads and parses an XYZ format block of molecular coordinates.

Args

block : str
A string containing the XYZ format block to be parsed. The block should follow the standard XYZ file format:
  • First line: number of atoms
  • Second line: molecule name/comment
  • Subsequent lines: atom_type x y z coordinates

Returns

tuple
A tuple containing:
  • name (str): The molecule name or comment from the second line
  • atom_types (numpy.ndarray): 1D array containing the atomic symbols for each atom
  • coordinates (numpy.ndarray): 3xN array of atomic coordinates where N is the number of atoms. First dimension corresponds to x, y, z coordinates.

Examples

>>> block = "2\nH2\nH 0.0 0.0 0.0\nH 0.0 0.0 0.74"
>>> name, atoms, coords = read_xyz_block(block)
>>> print(name)
H2
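
To make the 3xN layout concrete, the sketch below parses the same H2 block with plain Python lists in place of NumPy arrays. parse_xyz is a hypothetical stand-in, not the module's function.

```python
def parse_xyz(block: str):
    """Parse an XYZ block into (name, symbols, 3xN coordinates)."""
    lines = block.splitlines()
    atom_count = int(lines[0])
    name = lines[1].strip()

    symbols = []
    coords = [[], [], []]  # rows: x, y, z; columns: atoms
    for line in lines[2:2 + atom_count]:
        symbol, x, y, z = line.split()[:4]
        symbols.append(symbol)
        for axis, value in enumerate((x, y, z)):
            coords[axis].append(float(value))
    return name, symbols, coords


name, symbols, coords = parse_xyz("2\nH2\nH 0.0 0.0 0.0\nH 0.0 0.0 0.74")
```

Each row holds one axis for all atoms, so coords[2] contains every z value and coords[2][1] is the z coordinate of the second atom (0.74 here).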