Skip to content

proteindatabank

Convert an OPTIMADE structure, in the format of StructureResource to a PDB file or PDBx/mmCIF file (Protein Data Bank).

For more information on the file formats, see this FAQ page from the wwPDB website.

Note

These conversion functions are inspired heavily by the similar conversion functions in the ASE library.

See here (PDB) and here (PDBx/mmCIF) for the original ASE code.

For more information on the ASE library, see their documentation.

These conversion functions both rely on the NumPy library.

Warning

Currently, the PDBx/mmCIF conversion function is not parsing as a complete PDBx/mmCIF file.

NUMPY_NOT_FOUND = 'NumPy not found, cannot convert structure to your desired format' module-attribute

__all__ = ('get_pdb', 'get_pdbx_mmcif') module-attribute

AdapterPackageNotFound

Bases: OptimadeWarning

The package for an adapter cannot be found.

Source code in optimade/adapters/warnings.py
6
7
class AdapterPackageNotFound(OptimadeWarning):
    """The package for an adapter cannot be found."""

OptimadeStructure

Bases: EntryResource

Representing a structure.

Source code in optimade/models/structures.py
1168
1169
1170
1171
1172
1173
1174
1175
1176
1177
1178
1179
1180
1181
1182
1183
1184
1185
1186
1187
1188
1189
1190
1191
1192
1193
class StructureResource(EntryResource):
    """Representing a structure."""

    type: Annotated[
        Literal["structures"],
        StrictField(
            description="""The name of the type of an entry.

- **Type**: string.

- **Requirements/Conventions**:
    - **Support**: MUST be supported by all implementations, MUST NOT be `null`.
    - **Query**: MUST be a queryable property with support for all mandatory filter features.
    - **Response**: REQUIRED in the response.
    - MUST be an existing entry type.
    - The entry of type `<type>` and ID `<id>` MUST be returned in response to a request for `/<type>/<id>` under the versioned base URL.

- **Examples**:
    - `"structures"`""",
            pattern="^structures$",
            support=SupportLevel.MUST,
            queryable=SupportLevel.MUST,
        ),
    ] = "structures"

    attributes: StructureResourceAttributes

OptimadeStructureSpecies

Bases: BaseModel

A list describing the species of the sites of this structure.

Species can represent pure chemical elements, virtual-crystal atoms representing a statistical occupation of a given site by multiple chemical elements, and/or a location to which there are attached atoms, i.e., atoms whose precise location are unknown beyond that they are attached to that position (frequently used to indicate hydrogen atoms attached to another element, e.g., a carbon with three attached hydrogens might represent a methyl group, -CH3).

  • Examples:
    • [ {"name": "Ti", "chemical_symbols": ["Ti"], "concentration": [1.0]} ]: any site with this species is occupied by a Ti atom.
    • [ {"name": "Ti", "chemical_symbols": ["Ti", "vacancy"], "concentration": [0.9, 0.1]} ]: any site with this species is occupied by a Ti atom with 90 % probability, and has a vacancy with 10 % probability.
    • [ {"name": "BaCa", "chemical_symbols": ["vacancy", "Ba", "Ca"], "concentration": [0.05, 0.45, 0.5], "mass": [0.0, 137.327, 40.078]} ]: any site with this species is occupied by a Ba atom with 45 % probability, a Ca atom with 50 % probability, and by a vacancy with 5 % probability. The mass of this site is (on average) 88.5 a.m.u.
    • [ {"name": "C12", "chemical_symbols": ["C"], "concentration": [1.0], "mass": [12.0]} ]: any site with this species is occupied by a carbon isotope with mass 12.
    • [ {"name": "C13", "chemical_symbols": ["C"], "concentration": [1.0], "mass": [13.0]} ]: any site with this species is occupied by a carbon isotope with mass 13.
    • [ {"name": "CH3", "chemical_symbols": ["C"], "concentration": [1.0], "attached": ["H"], "nattached": [3]} ]: any site with this species is occupied by a methyl group, -CH3, which is represented without specifying precise positions of the hydrogen atoms.
Source code in optimade/models/structures.py
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
class Species(BaseModel):
    """A list describing the species of the sites of this structure.

    Species can represent pure chemical elements, virtual-crystal atoms representing a
    statistical occupation of a given site by multiple chemical elements, and/or a
    location to which there are attached atoms, i.e., atoms whose precise location are
    unknown beyond that they are attached to that position (frequently used to indicate
    hydrogen atoms attached to another element, e.g., a carbon with three attached
    hydrogens might represent a methyl group, -CH3).

    - **Examples**:
        - `[ {"name": "Ti", "chemical_symbols": ["Ti"], "concentration": [1.0]} ]`: any site with this species is occupied by a Ti atom.
        - `[ {"name": "Ti", "chemical_symbols": ["Ti", "vacancy"], "concentration": [0.9, 0.1]} ]`: any site with this species is occupied by a Ti atom with 90 % probability, and has a vacancy with 10 % probability.
        - `[ {"name": "BaCa", "chemical_symbols": ["vacancy", "Ba", "Ca"], "concentration": [0.05, 0.45, 0.5], "mass": [0.0, 137.327, 40.078]} ]`: any site with this species is occupied by a Ba atom with 45 % probability, a Ca atom with 50 % probability, and by a vacancy with 5 % probability. The mass of this site is (on average) 88.5 a.m.u.
        - `[ {"name": "C12", "chemical_symbols": ["C"], "concentration": [1.0], "mass": [12.0]} ]`: any site with this species is occupied by a carbon isotope with mass 12.
        - `[ {"name": "C13", "chemical_symbols": ["C"], "concentration": [1.0], "mass": [13.0]} ]`: any site with this species is occupied by a carbon isotope with mass 13.
        - `[ {"name": "CH3", "chemical_symbols": ["C"], "concentration": [1.0], "attached": ["H"], "nattached": [3]} ]`: any site with this species is occupied by a methyl group, -CH3, which is represented without specifying precise positions of the hydrogen atoms.

    """

    name: Annotated[
        str,
        OptimadeField(
            description="""Gives the name of the species; the **name** value MUST be unique in the `species` list.""",
            support=SupportLevel.MUST,
            queryable=SupportLevel.OPTIONAL,
        ),
    ]

    chemical_symbols: Annotated[
        list[ChemicalSymbol],
        OptimadeField(
            description="""MUST be a list of strings of all chemical elements composing this species. Each item of the list MUST be one of the following:

- a valid chemical-element symbol, or
- the special value `"X"` to represent a non-chemical element, or
- the special value `"vacancy"` to represent that this site has a non-zero probability of having a vacancy (the respective probability is indicated in the `concentration` list, see below).

If any one entry in the `species` list has a `chemical_symbols` list that is longer than 1 element, the correct flag MUST be set in the list `structure_features`.""",
            support=SupportLevel.MUST,
            queryable=SupportLevel.OPTIONAL,
        ),
    ]

    concentration: Annotated[
        list[float],
        OptimadeField(
            description="""MUST be a list of floats, with same length as `chemical_symbols`. The numbers represent the relative concentration of the corresponding chemical symbol in this species. The numbers SHOULD sum to one. Cases in which the numbers do not sum to one typically fall only in the following two categories:

- Numerical errors when representing float numbers in fixed precision, e.g. for two chemical symbols with concentrations `1/3` and `2/3`, the concentration might look something like `[0.33333333333, 0.66666666666]`. If the client is aware that the sum is not one because of numerical precision, it can renormalize the values so that the sum is exactly one.
- Experimental errors in the data present in the database. In this case, it is the responsibility of the client to decide how to process the data.

Note that concentrations are uncorrelated between different site (even of the same species).""",
            support=SupportLevel.MUST,
            queryable=SupportLevel.OPTIONAL,
        ),
    ]

    mass: Annotated[
        Optional[list[float]],
        OptimadeField(
            description="""If present MUST be a list of floats expressed in a.m.u.
Elements denoting vacancies MUST have masses equal to 0.""",
            unit="a.m.u.",
            support=SupportLevel.OPTIONAL,
            queryable=SupportLevel.OPTIONAL,
        ),
    ] = None

    original_name: Annotated[
        Optional[str],
        OptimadeField(
            description="""Can be any valid Unicode string, and SHOULD contain (if specified) the name of the species that is used internally in the source database.

Note: With regards to "source database", we refer to the immediate source being queried via the OPTIMADE API implementation.""",
            support=SupportLevel.OPTIONAL,
            queryable=SupportLevel.OPTIONAL,
        ),
    ] = None

    attached: Annotated[
        Optional[list[str]],
        OptimadeField(
            description="""If provided MUST be a list of length 1 or more of strings of chemical symbols for the elements attached to this site, or "X" for a non-chemical element.""",
            support=SupportLevel.OPTIONAL,
            queryable=SupportLevel.OPTIONAL,
        ),
    ] = None

    nattached: Annotated[
        Optional[list[int]],
        OptimadeField(
            description="""If provided MUST be a list of length 1 or more of integers indicating the number of attached atoms of the kind specified in the value of the :field:`attached` key.""",
            support=SupportLevel.OPTIONAL,
            queryable=SupportLevel.OPTIONAL,
        ),
    ] = None

    @field_validator("concentration", "mass", mode="after")
    def validate_concentration_and_mass(
        cls, value: Optional[list[float]], info: "ValidationInfo"
    ) -> Optional[list[float]]:
        if not value:
            return value

        if info.data.get("chemical_symbols"):
            if len(value) != len(info.data["chemical_symbols"]):
                raise ValueError(
                    f"Length of concentration ({len(value)}) MUST equal length of "
                    f"chemical_symbols ({len(info.data['chemical_symbols'])})"
                )
            return value

        raise ValueError(
            f"Could not validate {info.field_name!r} as 'chemical_symbols' is missing/invalid."
        )

    @field_validator("attached", "nattached", mode="after")
    @classmethod
    def validate_minimum_list_length(
        cls, value: Optional[Union[list[str], list[int]]]
    ) -> Optional[Union[list[str], list[int]]]:
        if value is not None and len(value) < 1:
            raise ValueError(
                "The list's length MUST be 1 or more, instead it was found to be "
                f"{len(value)}"
            )
        return value

    @model_validator(mode="after")
    def attached_nattached_mutually_exclusive(self) -> "Species":
        if (self.attached is None and self.nattached is not None) or (
            self.attached is not None and self.nattached is None
        ):
            raise ValueError(
                f"Either both or none of attached ({self.attached}) and nattached "
                f"({self.nattached}) MUST be set."
            )

        if (
            self.attached is not None
            and self.nattached is not None
            and len(self.attached) != len(self.nattached)
        ):
            raise ValueError(
                f"attached ({self.attached}) and nattached ({self.nattached}) MUST be "
                "lists of equal length."
            )

        return self

cell_to_cellpar(cell, radians=False)

Returns the cell parameters [a, b, c, alpha, beta, gamma].

Angles are in degrees unless radian=True is used.

Note

Based on ASE code.

Parameters:

Name Type Description Default
cell tuple[Vector3D, Vector3D, Vector3D]

A Cartesian 3x3 cell. This equates to the lattice_vectors attribute.

required
radians bool

Use radians instead of degrees (default) for angles.

False

Returns:

Type Description
list[float]

The unit cell parameters as a list of float values.

Source code in optimade/adapters/structures/utils.py
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
def cell_to_cellpar(
    cell: tuple[Vector3D, Vector3D, Vector3D], radians: bool = False
) -> list[float]:
    """Returns the cell parameters `[a, b, c, alpha, beta, gamma]`.

    Angles are in degrees unless `radian=True` is used.

    Note:
        Based on [ASE code](https://wiki.fysik.dtu.dk/ase/_modules/ase/geometry/cell.html#cell_to_cellpar).

    Parameters:
        cell: A Cartesian 3x3 cell. This equates to the
            [`lattice_vectors`][optimade.models.structures.StructureResourceAttributes.lattice_vectors] attribute.
        radians: Use radians instead of degrees (default) for angles.

    Returns:
        The unit cell parameters as a `list` of `float` values.

    """
    if globals().get("np", None) is None:
        warn(NUMPY_NOT_FOUND, AdapterPackageNotFound)
        return None  # type: ignore[return-value]

    cell = np.asarray(cell)

    lengths = [np.linalg.norm(vector) for vector in cell]
    angles = []
    for i in range(3):
        j = i - 1
        k = i - 2
        outer_product = lengths[j] * lengths[k]
        if outer_product > 1e-16:
            x_vector = np.dot(cell[j], cell[k]) / outer_product
            angle = 180.0 / np.pi * np.arccos(x_vector)
        else:
            angle = 90.0
        angles.append(angle)
    if radians:
        angles = [angle * np.pi / 180 for angle in angles]
    return np.array(lengths + angles)

cellpar_to_cell(cellpar, ab_normal=(0, 0, 1), a_direction=None)

Return a 3x3 cell matrix from cellpar=[a,b,c,alpha,beta,gamma].

Angles must be in degrees.

The returned cell is orientated such that a and b are normal to ab_normal and a is parallel to the projection of a_direction in the a-b plane.

Default a_direction is (1,0,0), unless this is parallel to ab_normal, in which case default a_direction is (0,0,1).

The returned cell has the vectors va, vb and vc along the rows. The cell will be oriented such that va and vb are normal to ab_normal and va will be along the projection of a_direction onto the a-b plane.

Example

cell = cellpar_to_cell([1, 2, 4, 10, 20, 30], (0, 1, 1), (1, 2, 3)) np.round(cell, 3) array([[ 0.816, -0.408, 0.408], [ 1.992, -0.13 , 0.13 ], [ 3.859, -0.745, 0.745]])

Note

Direct copy of ASE code.

Parameters:

Name Type Description Default
cellpar list[float]

The unit cell parameters as a list of float values.

Note: The angles must be given in degrees.

required
ab_normal tuple[int, int, int]

Unit vector normal to the ab-plane.

(0, 0, 1)
a_direction Optional[tuple[int, int, int]]

Unit vector defining the a-direction (default: (1, 0, 0)).

None

Returns:

Type Description
list[Vector3D]

A Cartesian 3x3 cell.

list[Vector3D]

This should equate to the

list[Vector3D]

lattice_vectors attribute.

Source code in optimade/adapters/structures/utils.py
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
def cellpar_to_cell(
    cellpar: list[float],
    ab_normal: tuple[int, int, int] = (0, 0, 1),
    a_direction: Optional[tuple[int, int, int]] = None,
) -> list[Vector3D]:
    """Return a 3x3 cell matrix from `cellpar=[a,b,c,alpha,beta,gamma]`.

    Angles must be in degrees.

    The returned cell is orientated such that a and b
    are normal to `ab_normal` and a is parallel to the projection of
    `a_direction` in the a-b plane.

    Default `a_direction` is (1,0,0), unless this is parallel to
    `ab_normal`, in which case default `a_direction` is (0,0,1).

    The returned cell has the vectors va, vb and vc along the rows. The
    cell will be oriented such that va and vb are normal to `ab_normal`
    and va will be along the projection of `a_direction` onto the a-b
    plane.

    Example:
        >>> cell = cellpar_to_cell([1, 2, 4, 10, 20, 30], (0, 1, 1), (1, 2, 3))
        >>> np.round(cell, 3)
        array([[ 0.816, -0.408,  0.408],
            [ 1.992, -0.13 ,  0.13 ],
            [ 3.859, -0.745,  0.745]])

    Note:
        Direct copy of [ASE code](https://wiki.fysik.dtu.dk/ase/_modules/ase/geometry/cell.html#cellpar_to_cell).

    Parameters:
        cellpar: The unit cell parameters as a `list` of `float` values.

            **Note**: The angles must be given in degrees.
        ab_normal: Unit vector normal to the ab-plane.
        a_direction: Unit vector defining the a-direction (default: `(1, 0, 0)`).

    Returns:
        A Cartesian 3x3 cell.

        This should equate to the
        [`lattice_vectors`][optimade.models.structures.StructureResourceAttributes.lattice_vectors] attribute.

    """
    if globals().get("np", None) is None:
        warn(NUMPY_NOT_FOUND, AdapterPackageNotFound)
        return None  # type: ignore[return-value]

    if a_direction is None:
        if np.linalg.norm(np.cross(ab_normal, (1, 0, 0))) < 1e-5:
            a_direction = (0, 0, 1)
        else:
            a_direction = (1, 0, 0)

    # Define rotated X,Y,Z-system, with Z along ab_normal and X along
    # the projection of a_direction onto the normal plane of Z.
    a_direction_array = np.array(a_direction)
    Z = unit_vector(ab_normal)  # type: ignore
    X = unit_vector(a_direction_array - np.dot(a_direction_array, Z) * Z)
    Y = np.cross(Z, X)

    # Express va, vb and vc in the X,Y,Z-system
    alpha, beta, gamma = 90.0, 90.0, 90.0
    if isinstance(cellpar, (int, float)):
        a = b = c = cellpar
    elif len(cellpar) == 1:
        a = b = c = cellpar[0]
    elif len(cellpar) == 3:
        a, b, c = cellpar
    else:
        a, b, c, alpha, beta, gamma = cellpar

    # Handle orthorhombic cells separately to avoid rounding errors
    eps = 2 * np.spacing(90.0, dtype=np.float64)  # around 1.4e-14
    # alpha
    if abs(abs(alpha) - 90) < eps:
        cos_alpha = 0.0
    else:
        cos_alpha = np.cos(alpha * np.pi / 180.0)
    # beta
    if abs(abs(beta) - 90) < eps:
        cos_beta = 0.0
    else:
        cos_beta = np.cos(beta * np.pi / 180.0)
    # gamma
    if abs(gamma - 90) < eps:
        cos_gamma = 0.0
        sin_gamma = 1.0
    elif abs(gamma + 90) < eps:
        cos_gamma = 0.0
        sin_gamma = -1.0
    else:
        cos_gamma = np.cos(gamma * np.pi / 180.0)
        sin_gamma = np.sin(gamma * np.pi / 180.0)

    # Build the cell vectors
    va = a * np.array([1, 0, 0])
    vb = b * np.array([cos_gamma, sin_gamma, 0])
    cx = cos_beta
    cy = (cos_alpha - cos_beta * cos_gamma) / sin_gamma
    cz_sqr = 1.0 - cx * cx - cy * cy
    assert cz_sqr >= 0
    cz = np.sqrt(cz_sqr)
    vc = c * np.array([cx, cy, cz])

    # Convert to the Cartesian x,y,z-system
    abc = np.vstack((va, vb, vc))
    T = np.vstack((X, Y, Z))
    cell = np.dot(abc, T)

    return cell

fractional_coordinates(cell, cartesian_positions)

Returns fractional coordinates and wraps coordinates to [0,1[.

Note

Based on ASE code.

Parameters:

Name Type Description Default
cell tuple[Vector3D, Vector3D, Vector3D]

A Cartesian 3x3 cell. This equates to the lattice_vectors attribute.

required
cartesian_positions list[Vector3D]

A list of cartesian atomic positions. This equates to the cartesian_site_positions attribute.

required

Returns:

Type Description
list[Vector3D]

A list of fractional coordinates for the atomic positions.

Source code in optimade/adapters/structures/utils.py
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
def fractional_coordinates(
    cell: tuple[Vector3D, Vector3D, Vector3D], cartesian_positions: list[Vector3D]
) -> list[Vector3D]:
    """Returns fractional coordinates and wraps coordinates to `[0,1[`.

    Note:
        Based on [ASE code](https://wiki.fysik.dtu.dk/ase/_modules/ase/atoms.html#Atoms.get_scaled_positions).

    Parameters:
        cell: A Cartesian 3x3 cell. This equates to the
            [`lattice_vectors`][optimade.models.structures.StructureResourceAttributes.lattice_vectors] attribute.
        cartesian_positions: A list of cartesian atomic positions. This equates to the
            [`cartesian_site_positions`][optimade.models.structures.StructureResourceAttributes.cartesian_site_positions]
            attribute.

    Returns:
        A list of fractional coordinates for the atomic positions.

    """
    if globals().get("np", None) is None:
        warn(NUMPY_NOT_FOUND, AdapterPackageNotFound)
        return None  # type: ignore[return-value]

    cell_array = np.asarray(cell)
    cartesian_positions_array = np.asarray(cartesian_positions)

    fractional = np.linalg.solve(cell_array.T, cartesian_positions_array.T).T

    # Expecting a bulk 3D structure here, note, this may change in the future.
    # See `ase.atoms:Atoms.get_scaled_positions()` for ideas on how to handle lower dimensional structures.
    # Furthermore, according to ASE we need to modulo 1.0 twice.
    # This seems to be due to small floats % 1.0 becomes 1.0, hence twice makes it 0.0.
    for i in range(3):
        fractional[:, i] %= 1.0
        fractional[:, i] %= 1.0

    return [tuple(position) for position in fractional]  # type: ignore

get_pdb(optimade_structure)

Write Protein Data Bank (PDB) structure in the old PDB format from OPTIMADE structure.

Parameters:

Name Type Description Default
optimade_structure StructureResource

OPTIMADE structure.

required

Returns:

Type Description
str

A PDB file as a single Python str object.

Source code in optimade/adapters/structures/proteindatabank.py
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
def get_pdb(  # pylint: disable=too-many-locals
    optimade_structure: OptimadeStructure,
) -> str:
    """Write Protein Data Bank (PDB) structure in the old PDB format from OPTIMADE structure.

    Parameters:
        optimade_structure: OPTIMADE structure.

    Returns:
        A PDB file as a single Python `str` object.

    """
    if globals().get("np", None) is None:
        warn(NUMPY_NOT_FOUND, AdapterPackageNotFound)
        return None  # type: ignore[return-value]

    pdb = ""

    attributes = optimade_structure.attributes

    rotation = None
    if valid_lattice_vector(attributes.lattice_vectors):  # type: ignore[arg-type]
        currentcell = np.asarray(attributes.lattice_vectors)
        cellpar = cell_to_cellpar(currentcell)
        exportedcell = cellpar_to_cell(cellpar)
        rotation = np.linalg.solve(currentcell, exportedcell)
        # Setting Z-value = 1 and using P1 since we have all atoms defined explicitly
        Z = 1
        spacegroup = "P 1"
        pdb += (
            f"CRYST1{cellpar[0]:9.3f}{cellpar[1]:9.3f}{cellpar[2]:8.3f}"
            f"{cellpar[3]:7.2f}{cellpar[4]:7.2f}{cellpar[5]:7.2f} {spacegroup:11s}{Z:4d}\n"
        )

        for i, vector in enumerate(scaled_cell(currentcell)):
            pdb += f"SCALE{i + 1}    {vector[0]:10.6f}{vector[1]:10.6f}{vector[2]:10.6f}     {0:10.5f}\n"

    # There is a limit of 5 digit numbers in this field.
    pdb_maxnum = 100000
    bfactor = 1.0

    pdb += "MODEL     1\n"

    species: dict[str, OptimadeStructureSpecies] = {
        species.name: species
        for species in attributes.species  # type:ignore[union-attr]
    }

    sites = np.asarray(attributes.cartesian_site_positions)
    if rotation is not None:
        sites = sites.dot(rotation)

    for site_number in range(attributes.nsites):  # type: ignore[arg-type]
        species_name = attributes.species_at_sites[site_number]  # type: ignore[index]
        site = sites[site_number]

        current_species = species[species_name]

        for index, symbol in enumerate(current_species.chemical_symbols):
            if symbol == "vacancy":
                continue

            label = species_name
            if len(current_species.chemical_symbols) > 1:
                if (
                    "vacancy" in current_species.chemical_symbols
                    and len(current_species.chemical_symbols) == 2
                ):
                    pass
                else:
                    label = f"{symbol}{index + 1}"

            pdb += (
                f"ATOM  {site_number % pdb_maxnum:5d} {label:4} MOL     1    "
                f"{site[0]:8.3f}{site[1]:8.3f}{site[2]:8.3f}"
                f"{current_species.concentration[index]:6.2f}"
                f"{bfactor:6.2f}          {symbol.upper():2}  \n"
            )
    pdb += "ENDMDL\n"

    return pdb

get_pdbx_mmcif(optimade_structure)

Write Protein Data Bank (PDB) structure in the PDBx/mmCIF format from OPTIMADE structure.

Warning

The result of this function can currently not be parsed as a complete PDBx/mmCIF file.

Parameters:

Name Type Description Default
optimade_structure StructureResource

OPTIMADE structure.

required
Return

A modern PDBx/mmCIF file as a single Python str object.

Source code in optimade/adapters/structures/proteindatabank.py
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
def get_pdbx_mmcif(  # pylint: disable=too-many-locals
    optimade_structure: OptimadeStructure,
) -> str:
    """Write Protein Data Bank (PDB) structure in the PDBx/mmCIF format from OPTIMADE structure.

    Warning:
        The result of this function can currently not be parsed as a complete PDBx/mmCIF file.

    Parameters:
        optimade_structure: OPTIMADE structure.

    Return:
        A modern PDBx/mmCIF file as a single Python `str` object.

    """
    if globals().get("np", None) is None:
        warn(NUMPY_NOT_FOUND, AdapterPackageNotFound)
        return None  # type: ignore[return-value]

    cif = """#
# Created from an OPTIMADE structure.
#
# See https://www.optimade.org and/or
# https://github.com/Materials-Consortia/OPTIMADE for more information.
#
# CIF 2.0 format, specifically mmCIF (PDBx).
# See http://mmcif.wwpdb.org for more information.
#
"""

    entry_id = f"{optimade_structure.type}{optimade_structure.id}"
    cif += f"data_{entry_id}\n_entry.id                         {entry_id}\n#\n"

    attributes = optimade_structure.attributes

    # Do this only if there's three non-zero lattice vectors
    if valid_lattice_vector(attributes.lattice_vectors):  # type: ignore[arg-type]
        a_vector, b_vector, c_vector, alpha, beta, gamma = cell_to_cellpar(
            attributes.lattice_vectors  # type: ignore[arg-type]
        )

        cif += (
            f"_cell.entry_id                    {entry_id}\n"
            f"_cell.length_a                    {a_vector:g}\n"
            f"_cell.length_b                    {b_vector:g}\n"
            f"_cell.length_c                    {c_vector:g}\n"
            f"_cell.angle_alpha                 {alpha:g}\n"
            f"_cell.angle_beta                  {beta:g}\n"
            f"_cell.angle_gamma                 {gamma:g}\n"
            "_cell.Z_PDB                       1\n#\n"
        )
        cif += (
            f"_symmetry.entry_id                {entry_id}\n"
            "_symmetry.space_group_name_H-M    'P 1'\n"
            "_symmetry.Int_Tables_number       1\n#\n"
        )

        # Since some structure viewers are having issues with cartesian coordinates,
        # we calculate the fractional coordinates if this is a 3D structure and we have all the necessary information.
        if not hasattr(attributes, "fractional_site_positions"):
            attributes.fractional_site_positions = fractional_coordinates(
                cell=attributes.lattice_vectors,  # type: ignore[arg-type]
                cartesian_positions=attributes.cartesian_site_positions,  # type: ignore[arg-type]
            )

    # NOTE: The following lines are perhaps needed to create a "valid" PDBx/mmCIF file.
    # However, at the same time, the information here is "default" and will for all structures "at this moment in time"
    # be the same. I.e., no information is gained by adding this now.
    # If it is found that they indeed are needed to create a "valid" PDBx/mmCIF file, they should be included in the output.
    # cif += (
    #     "loop_\n"
    #     "_struct_asym.id\n"
    #     "_struct_asym.entity_id\n"
    #     "A  1\n#\n"  # At this point, not using this feature.
    # )

    # cif += (
    #     "loop_\n"
    #     "_chem_comp.id\n"
    #     "X\n#\n"  # At this point, not using this feature.
    # )

    # cif += (
    #     "loop_\n"
    #     "_entity.id\n"
    #     "1\n#\n"  # At this point, not using this feature.
    # )

    # NOTE: This is otherwise a bit ahead of its time, since this OPTIMADE property is part of an open PR.
    # See https://github.com/Materials-Consortia/OPTIMADE/pull/206
    coord_type = (
        "fract" if hasattr(attributes, "fractional_site_positions") else "Cartn"
    )

    cif += (
        "loop_\n"
        "_atom_site.group_PDB\n"  # Always "ATOM"
        "_atom_site.id\n"  # number (1-counting)
        "_atom_site.type_symbol\n"  # species.chemical_symbols
        "_atom_site.label_atom_id\n"  # species.checmical_symbols symbol + number
        # For these next keys, see the comment above.
        # "_atom_site.label_asym_id\n"  # Will be set to "A" _struct_asym.id above
        # "_atom_site.label_comp_id\n"  # Will be set to "X" _chem_comp.id above
        # "_atom_site.label_entity_id\n"  # Will be set to "1" _entity.id above
        # "_atom_site.label_seq_id\n"
        "_atom_site.occupancy\n"  # species.concentration
        f"_atom_site.{coord_type}_x\n"  # cartesian_site_positions
        f"_atom_site.{coord_type}_y\n"  # cartesian_site_positions
        f"_atom_site.{coord_type}_z\n"  # cartesian_site_positions
        "_atom_site.thermal_displace_type\n"  # Set to 'Biso'
        "_atom_site.B_iso_or_equiv\n"  # Set to 1.0:f
    )

    if coord_type == "fract":
        sites = attributes.fractional_site_positions
    else:
        sites = attributes.cartesian_site_positions

    species: dict[str, OptimadeStructureSpecies] = {
        species.name: species for species in attributes.species  # type: ignore[union-attr]
    }

    for site_number in range(attributes.nsites):  # type: ignore[arg-type]
        species_name = attributes.species_at_sites[site_number]  # type: ignore[index]
        site = sites[site_number]

        current_species = species[species_name]

        for index, symbol in enumerate(current_species.chemical_symbols):
            if symbol == "vacancy":
                continue

            label = f"{species_name.upper()}{site_number + 1}"
            if len(current_species.chemical_symbols) > 1:
                if (
                    "vacancy" in current_species.chemical_symbols
                    and len(current_species.chemical_symbols) == 2
                ):
                    pass
                else:
                    label = f"{symbol.upper()}{index + 1}"

            cif += (
                f"ATOM  {site_number + 1:5d}  {symbol}  {label:8}  "
                f"{current_species.concentration[index]:6.4f}  {site[0]:8.5f}  "
                f"{site[1]:8.5f}  {site[2]:8.5f}  {'Biso':4}  {'1.000':6}\n"
            )

    return cif

scaled_cell(cell)

Return a scaled 3x3 cell from cartesian 3x3 cell (lattice_vectors). This 3x3 matrix can be used to calculate the fractional coordinates from the cartesian_site_positions.

This is based on PDB's method of calculating SCALE from CRYST data. For more info, see this site.

Parameters:

Name Type Description Default
cell tuple[Vector3D, Vector3D, Vector3D]

A Cartesian 3x3 cell. This equates to the lattice_vectors attribute.

required

Returns:

Type Description
tuple[Vector3D, Vector3D, Vector3D]

A scaled 3x3 cell.

Source code in optimade/adapters/structures/utils.py
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
def scaled_cell(
    cell: tuple[Vector3D, Vector3D, Vector3D]
) -> tuple[Vector3D, Vector3D, Vector3D]:
    """Return a scaled 3x3 cell from cartesian 3x3 cell (`lattice_vectors`).
    This 3x3 matrix can be used to calculate the fractional coordinates from the cartesian_site_positions.

    This is based on PDB's method of calculating SCALE from CRYST data.
    For more info, see [this site](https://www.wwpdb.org/documentation/file-format-content/format33/sect8.html#SCALEn).

    Parameters:
        cell: A Cartesian 3x3 cell. This equates to the
            [`lattice_vectors`][optimade.models.structures.StructureResourceAttributes.lattice_vectors] attribute.

    Returns:
        A scaled 3x3 cell.

    """
    if globals().get("np", None) is None:
        warn(NUMPY_NOT_FOUND, AdapterPackageNotFound)
        return None  # type: ignore[return-value]

    cell = np.asarray(cell)

    volume = np.dot(cell[0], np.cross(cell[1], cell[2]))
    scale = []
    for i in range(3):
        vector = np.cross(cell[(i + 1) % 3], cell[(i + 2) % 3]) / volume
        scale.append(tuple(vector))
    return tuple(scale)  # type: ignore[return-value]

valid_lattice_vector(lattice_vec)

Source code in optimade/adapters/structures/utils.py
23
24
25
26
27
28
29
30
31
def valid_lattice_vector(lattice_vec: tuple[Vector3D, Vector3D, Vector3D]):
    if len(lattice_vec) != 3:
        return False
    for vector in lattice_vec:
        if (
            (len(vector) != 3) or (None in vector) or (np.linalg.norm(vector) < 1e-15)
        ):  # Due to rounding errors very small values instead of 0.0 may appear for the lattice vectors. therefore I check here whether the value is not too small. I am however not sure what the smallest value is that I can put here.
            return False
    return True