Module `deeporigin.src.utilities.readers`

Functions

def from_pdbqt_atom_type(atom_type: str)


def from_pdbqt_atom_type(atom_type: str):
    """Convert PDBQT atom types to standard atomic symbols.

    This function converts AutoDock PDBQT atom types to their corresponding
    standard atomic symbols used in molecular structures.

    Args:
        atom_type (str): The PDBQT atom type to convert.
            Valid types include:
            - A, Z, G, GA, J, Q (converts to C)
            - HD, HS (converts to H)
            - NA, NS (converts to N)
            - OA, OS (converts to O)
            - SA (converts to S)

    Returns:
        str: The corresponding standard atomic symbol (C, H, N, O, S),
             or empty string if atom type is not recognized.

    Examples:
        >>> from_pdbqt_atom_type("HD")
        'H'
        >>> from_pdbqt_atom_type("OA") 
        'O'
        >>> from_pdbqt_atom_type("XX")
        ''
    """
    if atom_type in ["A", "Z", "G", "GA", "J", "Q"]:
        return "C"
    elif atom_type in ["HD", "HS"]:
        return "H"
    elif atom_type in ["NA", "NS"]:
        return "N"
    elif atom_type in ["OA", "OS"]:
        return "O"
    elif atom_type == "SA":
        return "S"
    else:
        return ""

Convert PDBQT atom types to standard atomic symbols.

This function converts AutoDock PDBQT atom types to their corresponding standard atomic symbols used in molecular structures.

Args

atom_type : str: The PDBQT atom type to convert. Valid types include:
- A, Z, G, GA, J, Q (converts to C)
- HD, HS (converts to H)
- NA, NS (converts to N)
- OA, OS (converts to O)
- SA (converts to S)

Returns

str: The corresponding standard atomic symbol (C, H, N, O, S), or empty string if atom type is not recognized.

Examples

>>> from_pdbqt_atom_type("HD")
'H'
>>> from_pdbqt_atom_type("OA") 
'O'
>>> from_pdbqt_atom_type("XX")
''
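The branch chain above amounts to a fixed lookup table. The sketch below restates it as a dictionary. `PDBQT_TO_ELEMENT` and `pdbqt_to_element` are illustrative names, not part of this module; the entries are copied from the branches of `from_pdbqt_atom_type`.

```python
# Hypothetical table-driven equivalent of from_pdbqt_atom_type;
# the entries mirror the if/elif branches documented above.
PDBQT_TO_ELEMENT = {
    **dict.fromkeys(["A", "Z", "G", "GA", "J", "Q"], "C"),
    **dict.fromkeys(["HD", "HS"], "H"),
    **dict.fromkeys(["NA", "NS"], "N"),
    **dict.fromkeys(["OA", "OS"], "O"),
    "SA": "S",
}


def pdbqt_to_element(atom_type: str) -> str:
    """Return the element symbol for a PDBQT atom type, or '' if unrecognized."""
    return PDBQT_TO_ELEMENT.get(atom_type, "")


if __name__ == "__main__":
    for atom_type in ["HD", "OA", "A", "XX"]:
        print(atom_type, "->", repr(pdbqt_to_element(atom_type)))
```

A dictionary keeps the whole mapping in one place and gives constant-time lookups; like the original, it returns `''` for unknown types.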

def init_periodic_table()


def init_periodic_table():
    """Initialize periodic table data.

    This function populates the PERIODIC_TABLE dictionary with atomic data for atomic numbers
    1 through N_PT_ELEMENTS. The data is retrieved from RDKit's periodic table and includes:
    - Atomic number (stored under the 'id' key)
    - Element symbol
    - Element name
    - Atomic weight
    - Van der Waals radius
    - Covalent radius

    The function type-checks each retrieved value; an element whose data cannot be loaded
    is logged and skipped, so the table's columns stay aligned.

    Returns:
        None

    Raises:
        Nothing. Any exception raised while loading an element (including a failed
        type check) is caught and logged.

    Notes:
        - Requires RDKit's Chem module
        - Uses global PERIODIC_TABLE dictionary to store element data
        - N_PT_ELEMENTS constant defines the number of elements to process
    """
    logger = Logger("INFO", os.getenv("LOG_BIOSIM_CLIENT"))
    pt = Chem.GetPeriodicTable()

    for atomic_number in range(1, N_PT_ELEMENTS + 1):
        try:
            symbol = pt.GetElementSymbol(atomic_number)
            name = pt.GetElementName(atomic_number)
            weight = pt.GetAtomicWeight(atomic_number)
            rvdw = pt.GetRvdw(atomic_number)
            rcov = pt.GetRcovalent(atomic_number)

            # Validate every value before appending, so a failure never leaves
            # the PERIODIC_TABLE columns with different lengths.
            assert isinstance(symbol, str), "Symbol must be a string"
            assert isinstance(name, str), "Name must be a string"
            assert isinstance(weight, float), "Weight must be a float"
            assert isinstance(rvdw, float), "rvdw must be a float"
            assert isinstance(rcov, float), "rcov must be a float"

            PERIODIC_TABLE['id'].append(atomic_number)
            PERIODIC_TABLE['symbol'].append(symbol)
            PERIODIC_TABLE['name'].append(name)
            PERIODIC_TABLE['weight'].append(weight)
            PERIODIC_TABLE['rvdw'].append(rvdw)
            PERIODIC_TABLE['rcov'].append(rcov)
        except Exception as e:
            logger.log_error(f"Failed to load Periodic Table on atom with id: {atomic_number}. Error: {str(e)}")

Initialize periodic table data.

This function populates the PERIODIC_TABLE dictionary with atomic data for atomic numbers 1 through N_PT_ELEMENTS. The data is retrieved from RDKit's periodic table and includes:

- Atomic number (stored under the 'id' key)
- Element symbol
- Element name
- Atomic weight
- Van der Waals radius
- Covalent radius

The function type-checks each retrieved value; an element whose data cannot be loaded is logged and skipped, so the table's columns stay aligned.

Returns

None

Raises

Nothing. Any exception raised while loading an element (including a failed type check) is caught and logged.

Notes

Requires RDKit's Chem module
Uses global PERIODIC_TABLE dictionary to store element data
N_PT_ELEMENTS constant defines the number of elements to process
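`PERIODIC_TABLE` is column-oriented: each key maps to a list, and index i of every list describes the same element. The sketch below shows how such a table can be filled and queried by symbol. It is self-contained, so it hard-codes two elements instead of calling RDKit, keeps only four of the six columns, and uses illustrative helpers (`make_table`, `add_element`, `element_row`) that are not part of this module.

```python
# Column-oriented layout: one list per property, aligned by position.
COLUMNS = ("id", "symbol", "name", "weight")


def make_table():
    return {column: [] for column in COLUMNS}


def add_element(table, **values):
    # Collect every value first (KeyError if one is missing), then append,
    # so a bad row never leaves the columns with different lengths.
    row = [values[column] for column in COLUMNS]
    for column, value in zip(COLUMNS, row):
        table[column].append(value)


def element_row(table, symbol):
    """Return one element's properties, matching the symbol case-insensitively."""
    upper_symbols = [s.upper() for s in table["symbol"]]
    index = upper_symbols.index(symbol.upper())  # ValueError if absent
    return {column: table[column][index] for column in COLUMNS}


table = make_table()
# Hard-coded stand-ins for values init_periodic_table reads from RDKit
# (standard atomic weights; the radius columns are left out of this sketch).
add_element(table, id=1, symbol="H", name="Hydrogen", weight=1.008)
add_element(table, id=6, symbol="C", name="Carbon", weight=12.011)

print(element_row(table, "c"))
# {'id': 6, 'symbol': 'C', 'name': 'Carbon', 'weight': 12.011}
```

Keeping the lists aligned is what makes the index lookup valid; the uppercase comparison mirrors how `read_pdb_pdbqt_block` builds `uppercase_atom_symbols` from the 'symbol' column.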

def read_block(block_type, block_content)


def read_block(block_type, block_content):
    """Read a molecular structure block and return its contents based on file format.

    This function acts as a dispatcher to specific readers based on the input file format.

    Args:
        block_type (FileFormat): The format of the molecular structure block (MOL2, PDB, PDBQT, or XYZ)
        block_content (str): The content of the molecular structure block to be read

    Returns:
        tuple: A tuple containing:
            - name (str): The molecule name/identifier 
            - atom_types (list or numpy.ndarray): List/array of atomic symbols
            - coordinates (numpy.ndarray): 3xN array of atom coordinates where N is number of atoms

    Raises:
        Exception: If block_type is not one of the supported formats

    See Also:
        read_mol2_block
        read_pdb_pdbqt_block  
        read_xyz_block

    Examples:
        >>> content = "@<TRIPOS>MOLECULE\\n..."
        >>> name, atoms, coords = read_block(FileFormat.MOL2, content)
    """
    if block_type == FileFormat.MOL2:
        return read_mol2_block(block_content)
    elif block_type in [FileFormat.PDB, FileFormat.PDBQT]:
        return read_pdb_pdbqt_block(block_type, block_content)
    elif block_type == FileFormat.XYZ:
        return read_xyz_block(block_content)
    else:
        raise Exception(f"Invalid file format {block_type}")

Read a molecular structure block and return its contents based on file format.

This function acts as a dispatcher to specific readers based on the input file format.

Args

block_type : FileFormat: The format of the molecular structure block (MOL2, PDB, PDBQT, or XYZ)
block_content : str: The content of the molecular structure block to be read

Returns

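tuple: A tuple containing:
- name (str or None): The molecule name/identifier (None if a PDB/PDBQT block has no name record)
- atom_types (list or numpy.ndarray): List/array of atomic symbols
- coordinates (numpy.ndarray): Atom coordinates; the shape depends on the reader: (N, 3) for MOL2 and (3, N) for PDB, PDBQT and XYZ, where N is the number of atoms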

Raises

Exception: If block_type is not one of the supported formats

See Also

read_mol2_block
read_pdb_pdbqt_block
read_xyz_block

Examples

>>> content = "@<TRIPOS>MOLECULE\n..."
>>> name, atoms, coords = read_block(FileFormat.MOL2, content)
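The dispatch in `read_block` can also be written as a lookup from format to reader. The sketch below is self-contained, so it declares its own stand-in `FileFormat` enum and hypothetical placeholder readers that just return fixed tuples; in the module itself the real `FileFormat` and reader functions would take their place. It raises `ValueError` where `read_block` raises a plain `Exception`.

```python
from enum import Enum, auto
from functools import partial


class FileFormat(Enum):
    # Stand-in for the module's FileFormat; only the members named above.
    MOL2 = auto()
    PDB = auto()
    PDBQT = auto()
    XYZ = auto()


# Placeholder readers: each returns (name, atom_types, coordinates) with the
# name set to the format it handled, so the dispatch is easy to observe.
def read_mol2_stub(content):
    return "mol2", [], []


def read_pdb_pdbqt_stub(block_type, content):
    return block_type.name.lower(), [], []


def read_xyz_stub(content):
    return "xyz", [], []


# One entry per supported format; PDB and PDBQT share a reader that also
# needs the format, so it is bound with functools.partial.
READERS = {
    FileFormat.MOL2: read_mol2_stub,
    FileFormat.PDB: partial(read_pdb_pdbqt_stub, FileFormat.PDB),
    FileFormat.PDBQT: partial(read_pdb_pdbqt_stub, FileFormat.PDBQT),
    FileFormat.XYZ: read_xyz_stub,
}


def read_block_table(block_type, block_content):
    """Dispatch to the reader registered for block_type."""
    reader = READERS.get(block_type)
    if reader is None:
        raise ValueError(f"Invalid file format {block_type}")
    return reader(block_content)


print(read_block_table(FileFormat.PDBQT, "")[0])  # pdbqt
```

With a table, supporting another format means adding one dictionary entry rather than another `elif` branch.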

def read_mol2_block(block: str)


def read_mol2_block(block: str):
    """Read a MOL2 block and extract molecule information.
    This function takes a MOL2 format block string and parses it to extract the molecule name,
    atom types, and 3D coordinates using RDKit.
    
    Args:
        block (str): A string containing molecule data in MOL2 format.
    
    Returns:
        tuple: A tuple containing:
            - name (str): The name/identifier of the molecule
            - atom_types (numpy.ndarray): Array of atomic symbols for each atom
            - coordinates (numpy.ndarray): Array of shape (n_atoms, 3) containing 3D coordinates
              of atoms in the molecule
    
    Note:
        The function preserves hydrogen atoms and skips sanitization when reading the MOL2 block.
    """
    mol = Chem.rdmolfiles.MolFromMol2Block(block, removeHs=False, sanitize=False)
    name = mol.GetProp('_Name')

    atoms = mol.GetAtoms()
    # Size the string dtype from the data: np.empty(n, dtype=str) yields '<U1',
    # which silently truncates two-letter symbols such as "Cl" to "C".
    atom_types = np.array([atom.GetSymbol() for atom in atoms], dtype=str)
    coordinates = np.empty((3, len(atoms)), dtype=np.float32)

    conformer = mol.GetConformer(0)  # fetch once instead of once per atom
    for i in range(len(atoms)):
        position = conformer.GetAtomPosition(i)
        coordinates[:, i] = (position.x, position.y, position.z)

    # Transposed to shape (n_atoms, 3), unlike the PDB/PDBQT and XYZ readers.
    return name, atom_types, np.transpose(coordinates)

Read a MOL2 block and extract molecule information.

This function takes a MOL2 format block string and parses it to extract the molecule name, atom types, and 3D coordinates using RDKit.

Args

block : str: A string containing molecule data in MOL2 format.

Returns

tuple: A tuple containing:
- name (str): The name/identifier of the molecule
- atom_types (numpy.ndarray): Array of atomic symbols for each atom
- coordinates (numpy.ndarray): Array of shape (n_atoms, 3) containing 3D coordinates of atoms in the molecule

Note

The function preserves hydrogen atoms and skips sanitization when reading the MOL2 block.
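To show what the RDKit call pulls out of a MOL2 block, the sketch below extracts the same three pieces with plain string handling: the molecule name is the first line after `@<TRIPOS>MOLECULE`, and each line of the `@<TRIPOS>ATOM` section starts with `atom_id atom_name x y z atom_type`. It is a simplified stand-in, not what `read_mol2_block` does internally (RDKit's parser also reads bonds and other records). Taking the element as the part of the SYBYL atom type before the dot (`C.ar` gives `C`) is an assumption of this sketch and does not cover every SYBYL type, for example `LP` or `Du`.

```python
def parse_mol2_atoms(block: str):
    """Return (name, symbols, coords) from a MOL2 block; coords is a list of (x, y, z)."""
    lines = block.splitlines()
    # The molecule name is the first line after the MOLECULE record header.
    header = next(i for i, line in enumerate(lines) if line.strip() == "@<TRIPOS>MOLECULE")
    name = lines[header + 1].strip()

    symbols, coords = [], []
    in_atoms = False
    for line in lines:
        if line.startswith("@<TRIPOS>"):
            # Track whether we are inside the ATOM section.
            in_atoms = line.strip() == "@<TRIPOS>ATOM"
            continue
        if in_atoms and line.strip():
            # atom_id atom_name x y z atom_type [subst_id subst_name charge ...]
            fields = line.split()
            coords.append(tuple(float(v) for v in fields[2:5]))
            # Assumed element rule: SYBYL type prefix before the dot ("C.ar" -> "C").
            symbols.append(fields[5].split(".")[0])
    return name, symbols, coords


WATER = """@<TRIPOS>MOLECULE
water
 3 2 0 0 0
SMALL
NO_CHARGES

@<TRIPOS>ATOM
      1 O1     0.0000    0.0000    0.0000 O.3     1  HOH1   0.0000
      2 H1     0.9572    0.0000    0.0000 H       1  HOH1   0.0000
      3 H2    -0.2400    0.9266    0.0000 H       1  HOH1   0.0000
@<TRIPOS>BOND
     1     1     2    1
     2     1     3    1
"""

print(parse_mol2_atoms(WATER))
```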

def read_pdb_pdbqt_block(block_type, block)


def read_pdb_pdbqt_block(block_type, block):
    """
    Read and parse PDB/PDBQT block to extract molecule information.

    Args:
        block_type (FileFormat): Format of the input block (PDB or PDBQT)
        block (str): Text block containing molecule data in PDB or PDBQT format 

    Returns:
        tuple: A tuple containing:
            - name (str or None): Molecule name from COMPND field (PDB) or REMARK Name field (PDBQT)
            - atom_types (list): List of atom types as chemical element symbols ('' if unrecognized)
            - coordinates (np.ndarray): 3xN array of atomic coordinates (x,y,z) where N is number of atoms

    Notes:
        Only ATOM records are read; HETATM records are ignored.
        The atom type is read from line[70:73] (columns 71-73) and, if that yields no
        recognized type, from line[76:79] (columns 77-79, the PDB element / PDBQT
        atom-type field). For PDBQT blocks, AutoDock atom types are converted to
        standard chemical element symbols.
    """
    lines = block.split("\n")
    uppercase_atom_symbols = [symbol.upper() for symbol in PERIODIC_TABLE['symbol']]

    if block_type == FileFormat.PDB:
        compnd_line_id = next((i for i, line in enumerate(lines) if line.startswith("COMPND")), None)
        name = lines[compnd_line_id][7:].strip() if compnd_line_id is not None else None
    else:
        name_line_id = next((i for i, line in enumerate(lines) if line.startswith("REMARK  Name")), None)
        name = lines[name_line_id][15:].strip() if name_line_id is not None else None

    atom_line_ids = [i for i, line in enumerate(lines) if line.startswith("ATOM")]

    atom_count = len(atom_line_ids)

    atom_types = [None] * atom_count
    coordinates = np.empty((3, atom_count), dtype=np.float32)

    atom_lines = [lines[i] for i in atom_line_ids]
    for i, line in enumerate(atom_lines):
        coordinates[0, i] = float(line[30:38])
        coordinates[1, i] = float(line[38:46])
        coordinates[2, i] = float(line[46:54])

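        # Try line[70:73] first, then fall back to line[76:79]
        # (PDB element symbol / PDBQT atom-type field).
        atom_type = ''
        for field in (line[70:73].strip(), line[76:79].strip()):
            pdbqt_symbol = from_pdbqt_atom_type(field)
            if block_type == FileFormat.PDBQT and pdbqt_symbol:
                # AutoDock types such as "NA", "OS", "GA" and "HS" are also element
                # symbols (Na, Os, Ga, Hs), so map them before the element check.
                atom_type = pdbqt_symbol
            elif field.upper() in uppercase_atom_symbols:
                # Case-insensitive so mixed-case PDBQT types such as "Cl" are kept.
                atom_type = field
            else:
                atom_type = pdbqt_symbol
            if atom_type:
                break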

        atom_types[i] = atom_type

    return name, atom_types, coordinates

Read and parse PDB/PDBQT block to extract molecule information.

Args

block_type : FileFormat: Format of the input block (PDB or PDBQT)
block : str: Text block containing molecule data in PDB or PDBQT format

Returns

tuple: A tuple containing:
- name (str or None): Molecule name from the COMPND record (PDB) or the REMARK Name line (PDBQT)
- atom_types (list): Atom types as chemical element symbols ('' if unrecognized)
- coordinates (np.ndarray): 3xN array of atomic coordinates (x, y, z), where N is the number of atoms

Notes

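Only ATOM records are read; HETATM records are ignored. The atom type is read from line[70:73] (columns 71-73) and, if that yields no recognized type, from line[76:79] (columns 77-79, the PDB element / PDBQT atom-type field). For PDBQT blocks, AutoDock atom types are converted to standard chemical element symbols.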
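Because the reader slices fixed columns, one way to check the offsets is to build a record with a format string that follows the PDB ATOM layout (x in columns 31-38, y in 39-46, z in 47-54) plus the PDBQT extension (partial charge in columns 67-76, AutoDock atom type in 78-79). The sketch below does that and applies the same slices as `read_pdb_pdbqt_block`. `make_pdbqt_atom_line` and `parse_atom_line` are illustrative helpers with trimmed copies of the type and element tables, and the lookup order (AutoDock type first for PDBQT, so `NA` reads as nitrogen rather than sodium) is this sketch's choice.

```python
# Trimmed copies of the lookups used by the reader, for this demo only.
PDBQT_TO_ELEMENT = {"A": "C", "HD": "H", "NA": "N", "OA": "O", "SA": "S"}
ELEMENTS = {"C", "H", "N", "O", "S", "NA", "CL"}  # uppercase symbols (Na, Cl included)


def make_pdbqt_atom_line(serial, name, x, y, z, charge, atom_type):
    # 1-6 record, 7-11 serial, 13-16 name, 18-20 residue, 22 chain, 23-26 residue
    # number, 31-54 x/y/z, 55-66 occupancy/B-factor, 67-76 charge, 78-79 atom type.
    return (
        f"{'ATOM':<6}{serial:>5} {name:<4} {'LIG':>3} A{1:>4}    "
        f"{x:8.3f}{y:8.3f}{z:8.3f}{1.0:6.2f}{0.0:6.2f}{charge:10.3f} {atom_type:<2}"
    )


def parse_atom_line(line, is_pdbqt):
    # Same 0-based slices as read_pdb_pdbqt_block.
    xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
    atom_type = ""
    for field in (line[70:73].strip(), line[76:79].strip()):
        mapped = PDBQT_TO_ELEMENT.get(field, "")
        if is_pdbqt and mapped:
            atom_type = mapped  # AutoDock type wins in PDBQT
        elif field.upper() in ELEMENTS:
            atom_type = field
        else:
            atom_type = mapped
        if atom_type:
            break
    return xyz, atom_type


line = make_pdbqt_atom_line(1, "N1", 1.5, -2.25, 10.0, -0.351, "NA")
print(repr(line))
print(parse_atom_line(line, is_pdbqt=True))   # ((1.5, -2.25, 10.0), 'N')
print(parse_atom_line(line, is_pdbqt=False))  # ((1.5, -2.25, 10.0), 'NA')
```

In this record `line[70:73]` falls inside the charge field (`-0.`), matches nothing, and the type is taken from the second slice.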

def read_xyz_block(block: str)


def read_xyz_block(block: str):
    """Reads and parses an XYZ format block of molecular coordinates.

    Args:
        block (str): A string containing the XYZ format block to be parsed. The block should follow
            the standard XYZ file format:
            - First line: number of atoms
            - Second line: molecule name/comment
            - Subsequent lines: atom_type x y z coordinates

    Returns:
        tuple: A tuple containing:
            - name (str): The molecule name or comment from the second line
            - atom_types (numpy.ndarray): 1D array containing the atomic symbols for each atom
            - coordinates (numpy.ndarray): 3xN array of atomic coordinates where N is the number of atoms.
              First dimension corresponds to x, y, z coordinates.

    Examples:
        >>> block = "2\\nH2\\nH 0.0 0.0 0.0\\nH 0.0 0.0 0.74"
        >>> name, atoms, coords = read_xyz_block(block)
        >>> name
        'H2'
    """
    lines = block.split("\n")

    atom_count = int(lines[0])
    name = lines[1].strip()

    symbols = []
    coordinates = np.empty((3, atom_count), dtype=np.float32)

    atom_lines = lines[2:2 + atom_count]
    for i, line in enumerate(atom_lines):
        tokens = line.split()
        symbols.append(tokens[0])
        coordinates[0, i] = float(tokens[1])
        coordinates[1, i] = float(tokens[2])
        coordinates[2, i] = float(tokens[3])

    # Build the array afterwards so NumPy sizes the string dtype to the longest
    # symbol; np.empty(n, dtype=str) is '<U1' and would truncate "Cl" to "C".
    atom_types = np.array(symbols, dtype=str)

    return name, atom_types, coordinates

Reads and parses an XYZ format block of molecular coordinates.

Args

block : str: A string containing the XYZ format block to be parsed. The block should follow the standard XYZ file format:
- First line: number of atoms
- Second line: molecule name/comment
- Subsequent lines: atom_type x y z coordinates

Returns

tuple: A tuple containing:
- name (str): The molecule name or comment from the second line
- atom_types (numpy.ndarray): 1D array containing the atomic symbols for each atom
- coordinates (numpy.ndarray): 3xN array of atomic coordinates, where N is the number of atoms. The first dimension corresponds to the x, y, z coordinates.

Examples

>>> block = "2\nH2\nH 0.0 0.0 0.0\nH 0.0 0.0 0.74"
>>> name, atoms, coords = read_xyz_block(block)
>>> name
'H2'
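The XYZ layout is simple enough that producing a block `read_xyz_block` can consume takes only a few lines. The sketch below uses a hypothetical `write_xyz_block` helper to write the atom count, the name line and one `symbol x y z` line per atom. It then parses the result back with a minimal reader that follows the same layout rules, to check the round trip.

```python
def write_xyz_block(name, symbols, coords):
    """Format atoms as an XYZ block: count, name/comment, then 'symbol x y z' lines."""
    if len(symbols) != len(coords):
        raise ValueError("symbols and coords must have the same length")
    lines = [str(len(symbols)), name]
    lines += [f"{s} {x:.6f} {y:.6f} {z:.6f}" for s, (x, y, z) in zip(symbols, coords)]
    return "\n".join(lines)


def parse_xyz_block(block):
    """Minimal reader following the same layout rules as read_xyz_block."""
    lines = block.split("\n")
    count = int(lines[0])
    name = lines[1].strip()
    symbols, coords = [], []
    for line in lines[2:2 + count]:
        tokens = line.split()
        symbols.append(tokens[0])
        coords.append(tuple(float(t) for t in tokens[1:4]))
    return name, symbols, coords


block = write_xyz_block("HCl", ["H", "Cl"], [(0.0, 0.0, 0.0), (0.0, 0.0, 1.27)])
print(block)
print(parse_xyz_block(block))
```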