Using other input file formats

gorder is primarily designed for analyzing Gromacs simulations. The most efficient and straightforward way to use it is therefore by providing it with a TPR file and an XTC file. However, to make it more flexible and less dependent on a specific MD engine, gorder also supports various other file formats for both structure/topology and trajectory files.

Structure and topology file formats

In rare cases where gorder cannot read your input TPR file, or if no TPR file is available, you can provide the system structure and topology using alternative file formats. gorder supports GRO, PDB, and PQR files as structure files.

For example, the following YAML input file is valid if the provided PDB file includes a connectivity section (and has fewer than 100,000 atoms due to PDB format limitations):

structure: system.pdb
trajectory: md.xtc
analysis_type: !CGOrder
  beads: "@membrane"
output: order.yaml

If the PDB file lacks a connectivity section or if you are using a GRO or PQR file as the input structure, you must also supply a "bonds" file:

structure: system.gro
bonds: system.bnd   # the file extension does not matter here
trajectory: md.xtc
analysis_type: !CGOrder
  beads: "@membrane"
output: order.yaml

Connectivity information will be read from system.bnd. Note that the connectivity data in the bonds file takes precedence over any connectivity information in the provided PDB or TPR file.

gorder identifies the file format based on the file extension: .tpr for TPR files, .gro for GRO files, .pdb for PDB files, and .pqr for PQR files. Ensure that the file is named correspondingly.

Specification of the bonds file

The bonds file format is similar to the PDB connectivity block but simpler and much more flexible. It also supports systems of any (reasonable) size.

Each line in the file contains numbers separated by whitespace. The first number specifies the "target atom" (serial number), and the following numbers specify the atoms bonded to it. Serial numbers follow Gromacs numbering, where the first atom is 1, the second is 2, and so on, regardless of the "atom number" written in the input GRO or PDB file.

1 2 4

In this example, atom 1 is bonded to atoms 2 and 4. Atoms 2 and 4 are not bonded to each other.

Each bond needs to be specified only once (with any order of atoms), but duplicating entries is allowed. For instance, the following bonds files are equivalent:

1 2 4
2 3

and

1 2 4
2 1 3
3 2
4 1

(The system contains three bonds: atoms 1 and 2, atoms 1 and 4, and atoms 2 and 3.)

Bonds can be listed in any order, and multiple lines can start with the same target atom. Any number of bonds can be specified on one line. Empty lines are ignored, and any amount of whitespace is allowed between the numbers. Comments can be added using #:

1 2 4   # atom 1 is bonded to atoms 2 and 4
2 3
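
The rules above can be sketched as a small parser. This is only an illustration of the file format as described here, not gorder's own reader; the function name parse_bonds is hypothetical, and the sketch assumes every field is a valid integer.

```python
def parse_bonds(text):
    """Return the set of bonds described by the contents of a bonds file.

    Each bond is stored as a frozenset of two 1-based atom numbers, so the
    order of the atoms and duplicated entries do not matter.
    """
    bonds = set()
    for line in text.splitlines():
        # Drop everything after '#', then split on any amount of whitespace.
        fields = line.split("#", 1)[0].split()
        if not fields:
            continue  # empty or comment-only line
        # The first number is the target atom, the rest are its partners.
        target, *partners = (int(field) for field in fields)
        for partner in partners:
            bonds.add(frozenset((target, partner)))
    return bonds
```

Because bonds are stored as unordered pairs in a set, two bonds files that list the same bonds in a different order, or with duplicates, produce equal results when compared with ==.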

Here’s an excerpt from an example bonds file:

# Example bonds file for an atomistic system
1 2 3 4 5
2 1
3 1
4 1
5 1 6 7 8
6 5
7 5
8 5 9 10 15
9 8
10 8
11 12 13 14 15
12 11
13 11
14 11 16
15 8 11
16 14 17 18 19
# (...)
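
If your connectivity comes from another tool as a list of atom pairs, a bonds file can be generated with a few lines of Python. The sketch below is a hypothetical helper (format_bonds is not part of gorder) and assumes the pairs already use 1-based positions in the structure file. It writes each bond only once, which the format allows.

```python
from collections import defaultdict


def format_bonds(pairs):
    """Return bonds-file text for an iterable of (atom_a, atom_b) pairs.

    Each bond is written once, on the line of its lower-numbered atom.
    """
    partners = defaultdict(set)
    for a, b in pairs:
        low, high = sorted((a, b))
        partners[low].add(high)  # the set removes duplicated bonds

    lines = []
    for target in sorted(partners):
        bonded = " ".join(str(atom) for atom in sorted(partners[target]))
        lines.append(f"{target} {bonded}")
    return "\n".join(lines) + "\n"
```

The returned text can be written to a file with any name and passed to gorder through the bonds field of the input YAML file.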

Note about selecting elements

When using the element keyword in GSL atom selection queries, atoms are selected based on their associated element. Element information is natively available in TPR files but is missing in other supported formats. If a non-TPR file is used, gorder will attempt to guess the elements of atoms based on the atom and residue names.

If gorder detects potential issues with the guess, it will display a warning in the terminal. You can then evaluate whether the concerns are harmless, avoid using the element keyword, or provide a TPR file instead.

Trajectory file formats

gorder is highly optimized to read XTC files as quickly as possible. If you have an XTC file or can generate one, it is recommended to use it. However, gorder also supports various other trajectory file formats, namely TRR, GRO, PDB, Amber NetCDF, DCD, and LAMMPSTRJ. You can use any of these files instead of an XTC trajectory.

gorder identifies the file format based on the file extension: .xtc for XTC files, .trr for TRR files, .gro for GRO files, .pdb for PDB files, .nc for Amber NetCDF files, .dcd for DCD files, and .lammpstrj for LAMMPSTRJ files. Ensure that your trajectory file is named accordingly.

Note that some features of gorder are not available for certain trajectory file formats. Specifically, the analysis time range (begin and end) cannot be used if your trajectory file is a PDB file or an Amber NetCDF file; specifying it for these formats results in an error.
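
If you generate input files from a script, the extension table and the time-range restriction for PDB and Amber NetCDF trajectories can be checked before gorder is launched. The following is a minimal sketch; check_trajectory is a hypothetical helper, not part of gorder, and it assumes that extensions are matched exactly as written above (lowercase).

```python
from pathlib import Path

# Extension-to-format table taken from the list above.
TRAJECTORY_FORMATS = {
    ".xtc": "XTC",
    ".trr": "TRR",
    ".gro": "GRO",
    ".pdb": "PDB",
    ".nc": "Amber NetCDF",
    ".dcd": "DCD",
    ".lammpstrj": "LAMMPSTRJ",
}

# Formats for which begin/end must not be specified.
NO_TIME_RANGE = {"PDB", "Amber NetCDF"}


def check_trajectory(filename, uses_time_range=False):
    """Return the trajectory format name, or raise ValueError on a problem."""
    suffix = Path(filename).suffix
    if suffix not in TRAJECTORY_FORMATS:
        raise ValueError(f"unrecognized trajectory extension: {suffix!r}")
    file_format = TRAJECTORY_FORMATS[suffix]
    if uses_time_range and file_format in NO_TIME_RANGE:
        raise ValueError(f"begin/end cannot be used with {file_format} files")
    return file_format
```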

With a GRO trajectory, begin and end can be used only if the file contains simulation time and step information in the 'title' line, formatted as follows:

Some Arbitrarily Long Title (...) t= SIMULATION_TIME step= SIMULATION_STEP

For example:

System t= 100.00000 step= 5000
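
To check whether the frames of a GRO trajectory carry this information, the title line can be tested with a regular expression. The pattern below is an assumption based on the format shown above and may be stricter or looser than what gorder actually accepts; parse_gro_title is a hypothetical helper.

```python
import re

# "t= <number> step= <integer>" at the end of the title line.
_TITLE_RE = re.compile(
    r"(?:^|\s)t=\s*([-+]?\d+(?:\.\d*)?(?:[eE][-+]?\d+)?)\s+step=\s*(\d+)\s*$"
)


def parse_gro_title(title):
    """Return (time, step) found in a GRO title line, or None if missing."""
    match = _TITLE_RE.search(title)
    if match is None:
        return None
    return float(match.group(1)), int(match.group(2))
```

A title line for which this function returns None suggests that the time range options should not be relied on for that file.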

Additionally, note that gorder assumes that time information in DCD files is specified in picoseconds (ps).