Introduction

Groan Selection Language (GSL) is a query language for selecting atoms in the groan_rs crate and in programs built on top of it, such as gorder or gcenter.

The selection language is similar to those used by VMD and MDAnalysis, so if you are familiar with either, GSL should feel intuitive. This guide explains how to use GSL effectively, both when working directly with the groan_rs library and when using applications built on top of it.


This document describes the syntax of GSL v0.9 (corresponding to groan_rs v0.9 and v0.10).

Basic queries

GSL allows you to select atoms based on their attributes. Here's how you can perform basic selections:

1. Residue names

Select atoms belonging to residues with specific names.

  • Syntax: resname XYZ
  • Example: resname POPE

Selects all atoms in residues named POPE.

2. Residue numbers

Select atoms based on their residue numbers.

  • Syntax: resid XYZ or resnum XYZ
  • Example: resid 17

Selects all atoms in residue number 17.

3. Atom names

Select atoms with specific atom names.

  • Syntax: name XYZ or atomname XYZ
  • Example: name P

Selects all atoms named P.

4. Serial atom numbers

Select atoms based on their serial atom numbers as understood by GROMACS.

  • Syntax: serial XYZ
  • Example: serial 256

Selects the atom with serial atom number 256.

Note: This selection is determined using GROMACS's internal atom numbering, not the numbering found in GRO or PDB files. Here, numbering starts at 1 for the first atom and increases sequentially. Each atom is guaranteed to have a unique serial number.

5. GRO/PDB atom numbers

Select atoms based on their atom numbers in the originating GRO or PDB file.

  • Syntax: atomnum XYZ
  • Example: atomnum 124

Selects all atoms with atom number 124 in the input file.

Note: There is no guarantee that one specific atom number will be assigned to exactly one atom.
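
The difference between serial and atomnum can be illustrated with a toy model. The sketch below is not groan_rs code. It assumes a GRO-style file whose five-digit atom number field wraps around after 99999, which is one common reason why file atom numbers repeat in large systems; the function names and the 250,000-atom system are hypothetical.

```python
def gro_atom_number(serial):
    """File atom number for a given serial, assuming a five-digit field that wraps."""
    return serial % 100_000

# Hypothetical system with 250,000 atoms; serial numbers run from 1 to 250000.
serials = range(1, 250_001)

def select_serial(n):
    """Mirrors `serial n`: serial numbers are unique, so at most one atom matches."""
    return [s for s in serials if s == n]

def select_atomnum(n):
    """Mirrors `atomnum n`: every atom whose file number equals n matches."""
    return [s for s in serials if gro_atom_number(s) == n]

print(select_serial(124))   # [124]
print(select_atomnum(124))  # [124, 100124, 200124]
```

In this model, serial 124 picks exactly one atom, while atomnum 124 picks three.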

6. Chain identifiers

Select atoms belonging to specific chains.

  • Syntax: chain X
  • Example: chain A

Selects all atoms in chain 'A'.

Note: Chain information is typically available in PDB files. If absent, this selection will yield no results.

7. Element names

Select atoms based on their element names.

  • Syntax: element name XYZ or elname XYZ
  • Example: element name carbon

Selects all carbon atoms.

Note: Element information must be available. If absent, this selection will yield no results. See the notes on selecting elements under "Important notes".

8. Element symbols

Select atoms based on their element symbols.

  • Syntax: element symbol X or elsymbol X
  • Example: element symbol C

Selects all carbon atoms.

Note: Element information must be available. If absent, this selection will yield no results. See the notes on selecting elements under "Important notes".

9. Labels

Select atoms using predefined labels.

  • Syntax: label XYZ

Labels are discussed in detail later in this tutorial.

Multiple identifiers

GSL allows you to specify multiple identifiers within a single query, enabling the selection of atoms matching any of the provided criteria.

Examples:

  • Residue names:

    resname POPE POPG
    

    Selects all atoms in residues named POPE or POPG.

  • Residue numbers:

    resid 13 15 16 17
    

    Selects atoms in residues numbered 13, 15, 16, or 17.

  • Atom names:

    name P CA HA
    

    Selects atoms named P, CA, or HA.

  • Serial numbers:

    serial 245 267 269 271
    

    Selects atoms with serial numbers 245, 267, 269, or 271.

  • Chains:

    chain A B C
    

    Selects atoms in chains 'A', 'B', or 'C'.

  • Element names:

    elname carbon hydrogen
    

    Selects all carbon and hydrogen atoms.

  • Element symbols:

    elsymbol C H
    

    Selects all carbon and hydrogen atoms.
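
The "match any of the identifiers" rule can be sketched with a small toy model. This is only an illustration in Python, not how groan_rs is implemented; the Atom tuple, the atom table, and the select helper are all hypothetical.

```python
from collections import namedtuple

# Hypothetical minimal atom record; real atoms carry many more attributes.
Atom = namedtuple("Atom", "serial name resname resid")

atoms = [
    Atom(1, "P", "POPE", 13),
    Atom(2, "CA", "ALA", 15),
    Atom(3, "P", "POPG", 16),
    Atom(4, "HA", "GLY", 17),
    Atom(5, "O", "SOL", 18),
]

def select(attribute, *identifiers):
    """Return serial numbers of atoms whose attribute equals ANY of the identifiers."""
    wanted = set(identifiers)
    return [a.serial for a in atoms if getattr(a, attribute) in wanted]

print(select("resname", "POPE", "POPG"))  # mirrors `resname POPE POPG` -> [1, 3]
print(select("name", "P", "CA", "HA"))    # mirrors `name P CA HA` -> [1, 2, 3, 4]
print(select("resid", 13, 15, 16, 17))    # mirrors `resid 13 15 16 17` -> [1, 2, 3, 4]
```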

Selecting atoms using groups

GSL allows you to select atoms using previously constructed groups of atoms.

Creating groups

Before selecting using groups, you must construct them. This is typically done within your workflow or by loading predefined groups, e.g. from NDX files.

Selecting groups

  • Syntax:

    group GroupName1 GroupName2
    

    Or simply:

    GroupName1 GroupName2
    
  • Example:

    Protein Membrane
    

    Selects all atoms in the groups named "Protein" and "Membrane".

Important: If a specified group does not exist, an error will be raised.
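
As a rough mental model, a group query behaves like a lookup of named atom sets followed by a union. The sketch below is a toy Python illustration under that assumption, not the groan_rs API; the groups dictionary and select_groups helper are hypothetical, and the KeyError merely stands in for whatever error the library actually raises.

```python
# Hypothetical groups: group name -> set of atom serial numbers.
groups = {
    "Protein": {1, 2, 3},
    "Membrane": {4, 5},
}

def select_groups(*names):
    """Union of all named groups; an unknown name is an error, not an empty result."""
    selected = set()
    for name in names:
        if name not in groups:
            raise KeyError(f"group '{name}' does not exist")
        selected |= groups[name]
    return selected

print(sorted(select_groups("Protein", "Membrane")))  # [1, 2, 3, 4, 5]
```

Calling select_groups("Lipids") in this model raises an error instead of silently selecting nothing, which is the behaviour the note above describes.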

Loading groups from NDX files

If you've loaded an NDX file using System::read_ndx, you can utilize groups defined within that file.

Note: If a program using groan_rs loads an NDX file, you will typically be able to select groups defined in the NDX file.

Groups with multiple words

For groups with names comprising multiple words, enclose the group name in quotes.

  • Example:

    Protein 'My Lipids'
    

    Selects atoms in the "Protein" group and the "My Lipids" group.

Selecting atoms by autodetection

GSL offers macros that automatically detect and select common molecular components. These macros simplify selections without needing to specify detailed criteria.

Available macros:

  • @protein: Selects all amino acid atoms (supports ~140 different amino acids).
  • @water: Selects all water atoms.
  • @ion: Selects all ion atoms.
  • @membrane: Selects membrane lipid atoms (supports over 200 lipid types).
  • @dna: Selects all DNA molecule atoms.
  • @rna: Selects all RNA molecule atoms.

Example usage:

@protein or @membrane

Selects all protein and membrane lipid atoms.

Caution: While these macros are generally reliable, they may not always perfectly identify all relevant atoms. Use them judiciously, especially when working with custom molecules.

Selecting all atoms

Sometimes, you may want to select the entire set of atoms in your system.

Syntax:

all

Selects every atom in the system.

Ranges

GSL allows you to specify ranges of residue or atom numbers.

Specifying ranges

  • Using to:

    resid 14 to 20
    

    Selects atoms in residues numbered 14 through 20 (including 20).

  • Using -:

    resid 14-20
    

    Equivalent to resid 14 to 20.

Combining with multiple ranges and numbers

You can mix explicit numbers with ranges for more complex selections.

  • Example:

    serial 1 3 to 6 10 12-14 17
    

    Expands to selecting serial numbers 1, 3, 4, 5, 6, 10, 12, 13, 14, and 17.
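
The expansion above can be reproduced with a few lines of Python. This is a simplified sketch, not the actual GSL parser: the expand function is hypothetical, it handles only non-negative numbers with closed ranges written as "to" or "-", and it does no error reporting.

```python
def expand(spec):
    """Expand a string such as '1 3 to 6 10 12-14 17' into a list of numbers."""
    # Treat 'a-b' the same as 'a to b' so that only one range form remains.
    tokens = spec.replace("-", " to ").split()
    numbers = []
    i = 0
    while i < len(tokens):
        start = int(tokens[i])
        if i + 2 < len(tokens) and tokens[i + 1] == "to":
            end = int(tokens[i + 2])
            numbers.extend(range(start, end + 1))  # both ends are included
            i += 3
        else:
            numbers.append(start)
            i += 1
    return numbers

print(expand("1 3 to 6 10 12-14 17"))
# [1, 3, 4, 5, 6, 10, 12, 13, 14, 17]
```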

Open-ended ranges

GSL supports open-ended ranges using comparison operators.

  • Operators:

    • < : Less than
    • > : Greater than
    • <=: Less than or equal to
    • >=: Greater than or equal to
  • Examples:

    serial <= 180
    

    Selects all atoms with serial numbers ≤ 180.

    resid > 33
    

    Selects atoms in residues numbered 34 and above.

Combining open-ended and explicit selections

  • Example:

    serial 1 3-6 >=20
    

    Selects atoms with serial numbers 1, 3, 4, 5, 6, and 20 or higher.
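
One way to picture a query that mixes explicit numbers, closed ranges, and open-ended ranges is as a single predicate that accepts a number if any part accepts it. The Python sketch below illustrates this under that assumption; the matches helper and the 25-atom system are hypothetical and unrelated to the groan_rs internals.

```python
import operator

# The four comparison operators supported in open-ended ranges.
COMPARISONS = {
    "<": operator.lt,
    ">": operator.gt,
    "<=": operator.le,
    ">=": operator.ge,
}

def matches(number, explicit=(), ranges=(), open_ended=()):
    """True if the number is listed explicitly, lies in a closed range, or passes a comparison."""
    if number in explicit:
        return True
    if any(low <= number <= high for low, high in ranges):
        return True
    return any(COMPARISONS[op](number, bound) for op, bound in open_ended)

# Mirrors `serial 1 3-6 >=20` in a hypothetical system of 25 atoms.
selected = [
    n for n in range(1, 26)
    if matches(n, explicit={1}, ranges=[(3, 6)], open_ended=[(">=", 20)])
]
print(selected)  # [1, 3, 4, 5, 6, 20, 21, 22, 23, 24, 25]
```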

Negations

GSL allows you to exclude certain atoms from your selection using negation operators.

  • Syntax:

    not <query>
    

    Or:

    ! <query>
    
  • Examples:

    not name CA
    

    Selects all atoms not named CA.

    ! name CA
    

    Equivalent to the above.

    not resname POPE POPG
    

    Selects atoms not in residues named POPE or POPG.

    !Protein
    

    Selects atoms not in the group named "Protein".
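
In set terms, a negation is the complement of the inner selection with respect to all atoms in the system. The toy Python sketch below shows this for a query with multiple identifiers, where the negation applies to the whole list rather than only to the first name. The four-atom system and the helper functions are hypothetical.

```python
# Hypothetical system: atom serial number -> residue name.
atoms = {1: "POPE", 2: "POPG", 3: "POPC", 4: "SOL"}
everything = set(atoms)

def resname(*names):
    """Atoms whose residue name is any of the given names."""
    return {serial for serial, residue in atoms.items() if residue in names}

def negate(selection):
    """Complement of a selection within the whole system."""
    return everything - selection

# Mirrors `not resname POPE POPG`: neither POPE nor POPG atoms remain.
print(sorted(negate(resname("POPE", "POPG"))))  # [3, 4]
```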

Binary operations

GSL supports combining multiple queries using logical operators to refine selections further.

Logical AND

  • Operators: and or &&

  • Function: Selects atoms that satisfy both queries.

  • Examples:

    name P and resname POPE
    

    Selects atoms named P in residues named POPE.

    serial 256 to 271 && resid 17 18
    

    Selects atoms with serial numbers between 256 and 271 in residues 17 or 18.

Logical OR

  • Operators: or or ||

  • Function: Selects atoms that satisfy at least one of the queries, i.e. the union of the two selections (written as + in the examples below).

  • Examples:

    name P or resname POPE
    

    Selects atoms named P + atoms in residues POPE.

    resid 17 18 || serial 256 to 271
    

    Selects atoms in residues 17 or 18 + atoms with serial numbers between 256 and 271.

Operator precedence

When combining multiple and and or operators, GSL evaluates them strictly from left to right; and does not take precedence over or.

  • Example:

    resname POPE or name CA and not Protein
    

    Interpreted as:
    (resname POPE or name CA) and not Protein

    Selects atoms in residues named POPE plus atoms named CA, excluding any atoms that are part of the "Protein" group.
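
Because many programming languages give "and" higher precedence than "or", it can help to see the two readings side by side. The Python sketch below models selections as sets and evaluates the example query left to right, as GSL does, and then with "and" binding first. The evaluate_left_to_right helper and the six-atom system are hypothetical.

```python
def evaluate_left_to_right(first, *rest):
    """Fold a flat sequence (operator, operand, operator, operand, ...) from the left."""
    result = set(first)
    for op, operand in zip(rest[0::2], rest[1::2]):
        if op == "and":
            result &= operand   # intersection
        elif op == "or":
            result |= operand   # union
        else:
            raise ValueError(f"unknown operator: {op}")
    return result

# Hypothetical six-atom system, selections given as sets of serial numbers.
everything = {1, 2, 3, 4, 5, 6}
resname_pope = {1, 2}
name_ca = {3, 4}
protein = {2, 3}
not_protein = everything - protein

# GSL reading: (resname POPE or name CA) and not Protein
gsl = evaluate_left_to_right(resname_pope, "or", name_ca, "and", not_protein)
print(sorted(gsl))        # [1, 4]

# "and binds first" reading: resname POPE or (name CA and not Protein)
and_first = resname_pope | (name_ca & not_protein)
print(sorted(and_first))  # [1, 2, 4]
```

The two readings differ for atom 2, which is why explicit parentheses are worth using whenever a query mixes both operators.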

Combining with autodetection macros

You can mix autodetection macros with other queries using logical operators.

  • Example:

    @membrane or group 'My Lipids'
    

    Selects all autodetected membrane lipids + atoms in the "My Lipids" group.

Parentheses

Parentheses allow you to control the evaluation order of your queries.

Changing evaluation order

Use parentheses to group queries and dictate the sequence of operations.

  • Example:

    resname POPE or (name CA and not resid 18 to 21)
    

    Selects atoms in residues named POPE + atoms named CA which are not in residues 18 to 21.

  • Changing the position of parentheses:

    (name CA or resname POPE) and not resid 18 to 21
    

    Selects atoms which are either named CA or in residues named POPE, but are not in residues 18 to 21.

Nested parentheses

You can nest parentheses to create complex selection logic.

  • Example:

    serial 1 to 6 or (name CA and resname POPE || (resid 1 to 7 or serial 123 to 128)) and Protein
    

    A valid, albeit complex, query that combines multiple conditions.

Negating parenthetical expressions

You can apply negation to entire parenthetical groups.

  • Example:

    !(serial 1 to 6 && name P)
    

    Selects every atom except those that both have a serial number between 1 and 6 and are named P.

Selecting molecules

GSL provides operators to select entire molecules (i.e., groups of bonded atoms) based on specific criteria.

molecule with (or mol with) operator

  • Function: Selects all atoms in the same molecule(s) as the atoms matching the inner query.

  • Syntax:

    molecule with <query>
    

    Or:

    mol with <query>
    
  • Examples:

    molecule with serial 15
    

    Selects all atoms in the molecule containing the atom with serial number 15.

    molecule with resid 4 17 29
    

    Selects all atoms in molecules containing any atom from residues 4, 17, or 29.

    molecule with name P
    

    Selects all atoms in molecules that include an atom named P.

Operator precedence with molecule with

  • Example 1:

    molecule with serial 15 or name BB
    

    Selects all atoms in the molecule containing the atom with serial number 15, plus all atoms named BB. Here, molecule with applies only to the query immediately following it.

  • Example 2:

    molecule with (serial 15 or name BB)
    

    Selects all atoms in molecules that contain either the atom with serial number 15 or any atom named BB.

Note:

  • Topology requirement: The system must contain topology information to use molecule selections. Without it, no atoms will be selected. Topology information is available, for instance, in TPR files and some PDB files.
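
The two precedence examples above can be reproduced with a toy model. The Python sketch below assumes that a molecule is a connected component of the bond graph, in line with the "groups of bonded atoms" description; the molecules_with helper, the bond list, and the BB atoms are hypothetical and do not reflect how groan_rs stores topology.

```python
from collections import defaultdict

def molecules_with(seed_atoms, bonds):
    """All atoms reachable through bonds from any seed atom (seeds included)."""
    neighbors = defaultdict(set)
    for a, b in bonds:
        neighbors[a].add(b)
        neighbors[b].add(a)
    selected = set()
    stack = list(seed_atoms)
    while stack:
        atom = stack.pop()
        if atom in selected:
            continue
        selected.add(atom)
        stack.extend(neighbors[atom] - selected)
    return selected

# Hypothetical topology: three molecules plus one unbonded atom (20).
bonds = [(1, 2), (2, 3), (4, 5), (14, 15), (15, 16)]
name_bb = {4, 20}  # atoms named BB in this toy system

# Mirrors `molecule with serial 15 or name BB`
print(sorted(molecules_with({15}, bonds) | name_bb))   # [4, 14, 15, 16, 20]

# Mirrors `molecule with (serial 15 or name BB)`
print(sorted(molecules_with({15} | name_bb, bonds)))   # [4, 5, 14, 15, 16, 20]
```

Atom 5 appears only in the second result, because only there is the whole molecule around atom 4 pulled in.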

Labeling atoms

If atoms are labeled (using System::label_atom), you can use these labels in your GSL queries.

  • Syntax:

    label XYZ
    
  • Examples:

    label MyAtom
    

    Selects the atom labeled "MyAtom".

    label MyAtom AnotherAtom OneMoreAtom
    

    Selects atoms labeled "MyAtom", "AnotherAtom", and "OneMoreAtom".

Labels with multiple words

For labels consisting of multiple words, enclose them in quotes.

  • Example:

    label 'Very interesting atom'
    

    Selects the atom labeled "Very interesting atom".

Comparison with groups:
Labels are similar to groups but are guaranteed to contain only one atom each.
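
The contrast with groups can be pictured as two kinds of lookup tables: a group name maps to a set of atoms, whereas a label maps to exactly one atom. The Python sketch below is only a toy illustration of that idea; the dictionary contents and the select_labels helper are hypothetical, and how groan_rs reacts to an unknown label is not modelled here.

```python
# Hypothetical lookup tables keyed by name.
groups = {"Protein": {1, 2, 3}}   # group name -> set of atom serial numbers
labels = {                        # label -> exactly one atom serial number
    "MyAtom": 17,
    "AnotherAtom": 42,
    "Very interesting atom": 256,
}

def select_labels(*names):
    """Mirrors `label A B ...`: one atom per label, combined into one selection."""
    return {labels[name] for name in names}

print(sorted(select_labels("MyAtom", "AnotherAtom")))  # [17, 42]
print(sorted(select_labels("Very interesting atom")))  # [256]
```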

Regular expressions

GSL supports the use of regular expressions (regex).

Using regular expressions

You can apply regex patterns to various identifiers: atom names, residue names, group names, element names, element symbols, and labels.

  • Syntax:

    identifier r'<pattern>'
    
  • Examples:

    name r'^[1-9]?H.*'
    

    Selects all atoms with names matching the regex pattern (typically hydrogen atoms).

    resname r'^.*PC'
    

    Selects all atoms in residues with names ending in "PC".

    group r'^P'
    

    Selects atoms from all groups with names starting with 'P'.

Combining regular expressions with other identifiers

You can mix regex-based identifiers with standard identifiers in a single query.

  • Examples:

    name r'^C.*' and resname ALA GLY
    

    Selects atoms with names starting with 'C' in residues named ALA or GLY.

    name P r'^[1-9]?H.*'
    

    Selects atoms named 'P' as well as atoms with names matching the regex pattern (typically hydrogen atoms).

Syntax rules

  • Enclosure:
    The regex pattern must be enclosed within a "regular expression block" starting with r' and ending with '.

  • Case sensitivity:
    Regex patterns are case-sensitive unless specified otherwise within the pattern.

Supported regex features

GSL uses the regex crate for evaluating regular expressions. For detailed information on supported regex features, refer to the official documentation of the regex crate.
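
The patterns used in this section can be tried out quickly before putting them into a query. The sketch below uses Python's re module rather than the Rust regex crate, and it assumes unanchored search semantics; the two engines differ in some features, so treat this only as a way to check what these particular simple patterns match. The matching helper and the name lists are hypothetical.

```python
import re

def matching(pattern, names):
    """Names in which the pattern is found (search, so anchors must be explicit)."""
    compiled = re.compile(pattern)
    return [name for name in names if compiled.search(name)]

atom_names = ["H", "HA", "1HB", "CA", "OH", "P"]
residue_names = ["POPC", "DOPC", "POPE", "ALA"]

print(matching(r"^[1-9]?H.*", atom_names))   # ['H', 'HA', '1HB']
print(matching(r"^.*PC", residue_names))     # ['POPC', 'DOPC']
print(matching(r"^C.*", atom_names))         # ['CA']

# Matching is case-sensitive unless the pattern says otherwise, e.g. with (?i).
print(matching(r"^ca$", ["CA", "Ca", "ca"]))      # ['ca']
print(matching(r"(?i)^ca$", ["CA", "Ca", "ca"]))  # ['CA', 'Ca', 'ca']
```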

Important notes

Selecting elements

  • Element assignment:
    Atoms in the system are not automatically assigned elements by groan_rs unless the elements are explicitly specified in the input structure file. This is because elements may not always be meaningful, particularly in coarse-grained systems.

  • Using element keywords:
    To use element name or element symbol in your selections, ensure that atoms have been assigned elements.

  • Assigning elements:

    • From topology files: Creating the System structure from a TPR file automatically assigns elements.
    • Guessing elements: Use the System::guess_elements function to assign elements based on atom names.
  • Recommendation:
    If you're using a program that utilizes GSL, ensure that it assigns elements to atoms before using element keywords in selections.

Whitespace considerations

  • Operator and parenthesis separation:
    Operators (and, or, not, etc.) and parentheses do not need to be separated by whitespace, unless omitting the whitespace makes the query ambiguous.

  • Valid example:

    not(name CA)or(serial 1to45||Protein)
    

    A valid query without (unnecessary) whitespace.

  • Invalid example:

    not(name CA)or(serial 1to45orProtein)
    

    Invalid because orProtein is unclear.

  • Resolving ambiguity:

    Enclose ambiguous parts in parentheses to clarify intent, or simply use whitespace; it is worth it.

    not(name CA)or(serial 1to45or(Protein))
    

    Now, the query is valid and interpretable, but not very human-readable.

Online GSL validator

You can easily verify the validity of your GSL query using the online validation tool.

Note: This tool checks only the syntax of the query and knows nothing about your molecular system. A query that references a group missing from your system is still reported as valid but will fail during execution. The tool also cannot interpret your intent. For example, the query resname POPC name P is syntactically valid but will probably not achieve the desired outcome, most likely because name and P are read as additional residue names (see "Multiple identifiers") rather than as a separate atom-name query.

Feedback and disclaimer

Have questions or encountered an issue? Open a GitHub issue for the groan_rs crate or send an email to ladmeb@gmail.com.

The groan_rs crate and GSL are currently unstable, and the language may undergo changes in future versions.

This guide was largely generated by a large language model (ChatGPT) based on the provided GSL specification.