Introduction

Groan Selection Language (GSL) is a query language for selecting atoms in the groan_rs crate and in programs built on top of it, such as gorder or gcenter.

The selection language closely resembles those used by VMD and MDAnalysis, so GSL should feel intuitive if you are familiar with either. This guide explains how to use GSL efficiently, both when working directly with the groan_rs library and when using applications built on top of it.


This document describes the syntax of GSL v0.9 (corresponding to groan_rs v0.9 and v0.10).

Basic queries

GSL allows you to select atoms based on their attributes. Here's how you can perform basic selections:

1. Residue names

Select atoms belonging to residues with specific names.

  • Syntax: resname XYZ
  • Example: resname POPE

Selects all atoms in residues named POPE.

2. Residue numbers

Select atoms based on their residue numbers.

  • Syntax: resid XYZ or resnum XYZ
  • Example: resid 17

Selects all atoms in residue number 17.

3. Atom names

Select atoms with specific atom names.

  • Syntax: name XYZ or atomname XYZ
  • Example: name P

Selects all atoms named P.

4. Serial atom numbers

Select atoms based on their serial atom numbers as understood by GROMACS.

  • Syntax: serial XYZ
  • Example: serial 256

Selects the atom with serial atom number 256.

Note: This selection uses GROMACS's internal atom numbering, not the numbering found in GRO or PDB files. Numbering starts at 1 for the first atom and increases sequentially, so each atom is guaranteed to have a unique serial number.

5. GRO/PDB atom numbers

Select atoms based on their atom numbers in the originating GRO or PDB file.

  • Syntax: atomnum XYZ
  • Example: atomnum 124

Selects all atoms with atom number 124 in the input file.

Note: There is no guarantee that one specific atom number will be assigned to exactly one atom.
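The difference between serial and atomnum can be illustrated with a minimal Rust sketch. This is not groan_rs code: the Atom struct and all three helper functions are hypothetical, and the sketch assumes that the input file stores atom numbers in a fixed-width field that wraps around after 99999, as commonly happens in large GRO files.

```rust
/// Hypothetical, simplified atom record used only for this illustration.
struct Atom {
    /// Atom number as written in the input file (may repeat).
    atom_number: usize,
}

/// Serial numbers are 1-based positions in the atom list, so at most one atom matches.
/// Returns 0-based indices of the selected atoms.
fn select_serial(atoms: &[Atom], serial: usize) -> Vec<usize> {
    if (1..=atoms.len()).contains(&serial) {
        vec![serial - 1]
    } else {
        Vec::new()
    }
}

/// Atom numbers come from the file, so several atoms may share one number.
/// Returns 0-based indices of the selected atoms.
fn select_atomnum(atoms: &[Atom], number: usize) -> Vec<usize> {
    atoms
        .iter()
        .enumerate()
        .filter(|(_, atom)| atom.atom_number == number)
        .map(|(index, _)| index)
        .collect()
}

/// Builds `n` atoms whose file atom numbers wrap at 100000 (an assumption of this sketch).
fn build_wrapped_system(n: usize) -> Vec<Atom> {
    (1..=n).map(|i| Atom { atom_number: i % 100_000 }).collect()
}

fn main() {
    let atoms = build_wrapped_system(100_124);
    // `serial 124` matches exactly one atom.
    println!("{:?}", select_serial(&atoms, 124)); // [123]
    // `atomnum 124` matches two atoms because the file numbering wrapped.
    println!("{:?}", select_atomnum(&atoms, 124)); // [123, 100123]
}
```

In a system with fewer than 100000 atoms and sequential numbering in the file, both selections would typically return the same atom.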

6. Chain identifiers

Select atoms belonging to specific chains.

  • Syntax: chain X
  • Example: chain A

Selects all atoms in chain 'A'.

Note: Chain information is typically available in PDB files. If absent, this selection will yield no results.

7. Element names

Select atoms based on their element names.

  • Syntax: element name XYZ or elname XYZ
  • Example: element name carbon

Selects all carbon atoms.

Note: Element information must be available. If absent, this selection will yield no results. See "Selecting Elements" under "Important notes" for details.

8. Element symbols

Select atoms based on their element symbols.

  • Syntax: element symbol X or elsymbol X
  • Example: element symbol C

Selects all carbon atoms.

Note: Element information must be available. If absent, this selection will yield no results. See "Selecting Elements" under "Important notes" for details.

9. Labels

Select atoms using predefined labels.

  • Syntax: label XYZ

Labels are discussed in detail later in this tutorial.

Multiple identifiers

GSL allows you to specify multiple identifiers within a single query, enabling the selection of atoms matching any of the provided criteria.

Examples:

  • Residue names:

    resname POPE POPG
    

    Selects all atoms in residues named POPE or POPG.

  • Residue numbers:

    resid 13 15 16 17
    

    Selects atoms in residues numbered 13, 15, 16, or 17.

  • Atom names:

    name P CA HA
    

    Selects atoms named P, CA, or HA.

  • Serial numbers:

    serial 245 267 269 271
    

    Selects atoms with serial numbers 245, 267, 269, or 271.

  • Chains:

    chain A B C
    

    Selects atoms in chains 'A', 'B', or 'C'.

  • Element names:

    elname carbon hydrogen
    

    Selects all carbon and hydrogen atoms.

  • Element symbols:

    elsymbol C H
    

    Selects all carbon and hydrogen atoms.
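The "match any of the listed identifiers" behavior can be sketched in a few lines of Rust. This is only an illustration of the semantics described above, not the groan_rs implementation; the function name select_by_name and the toy residue list are hypothetical.

```rust
/// Returns 0-based indices of all entries in `names` that equal
/// at least one of the identifiers listed in the query.
fn select_by_name(names: &[&str], identifiers: &[&str]) -> Vec<usize> {
    names
        .iter()
        .enumerate()
        .filter(|(_, name)| identifiers.iter().any(|id| id == *name))
        .map(|(index, _)| index)
        .collect()
}

fn main() {
    // Residue name of each atom in a toy system.
    let residue_names = ["POPE", "POPG", "SOL", "POPE"];
    // Corresponds to the query `resname POPE POPG`.
    let selected = select_by_name(&residue_names, &["POPE", "POPG"]);
    println!("{:?}", selected); // [0, 1, 3]
}
```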

Selecting atoms using groups

GSL allows you to select atoms using previously constructed groups of atoms.

Creating groups

Before selecting using groups, you must construct them. This is typically done within your workflow or by loading predefined groups, e.g. from NDX files.

Selecting groups

  • Syntax:

    group GroupName1 GroupName2
    

    Or simply:

    GroupName1 GroupName2
    
  • Example:

    Protein Membrane
    

    Selects all atoms that belong to the group "Protein" or to the group "Membrane".

Important: If a specified group does not exist, an error will be raised.

Loading groups from NDX files

If you've loaded an NDX file using System::read_ndx, you can utilize groups defined within that file.

Note: If a program using groan_rs loads an NDX file, you will typically be able to select groups defined in the NDX file.

Groups with multiple words

For groups with names comprising multiple words, enclose the group name in quotes.

  • Example:

    Protein 'My Lipids'
    

    Selects atoms in the "Protein" group and the "My Lipids" group.

Selecting atoms by autodetection

GSL offers macros that automatically detect and select common molecular components. These macros simplify selections without needing to specify detailed criteria. GSL macros support atomistic, united-atom, and coarse-grained systems.

Available macros:

  • @protein: Selects all amino acid atoms (supports ~140 different amino acids).
  • @water: Selects all water atoms.
  • @ion: Selects all ion atoms.
  • @membrane: Selects membrane lipid atoms (supports over 200 lipid types).
  • @dna: Selects all DNA molecule atoms.
  • @rna: Selects all RNA molecule atoms.

Example usage:

@protein or @membrane

Selects all protein and membrane lipid atoms.

Caution: While these macros are generally reliable, they may not always perfectly identify all relevant atoms. Use them judiciously, especially when working with custom molecules.

Macro definitions:

The following list shows how GSL macros currently expand. Note that these definitions may change in future versions to support additional atom matching:

  • @protein = resname ABU ACE AIB ALA ARG ARGN ASN ASN1 ASP ASP1 ASPH ASPP ASH CT3 CYS CYS1 CYS2 CYSH DALA GLN GLU GLUH GLUP GLH GLY HIS HIS1 HISA HISB HISH HISD HISE HISP HSD HSE HSP HYP ILE LEU LSN LYS LYSN LYSH MELEU MET MEVAL NAC NME NHE NH2 PHE PHEH PHEU PHL PRO SER THR TRP TRPH TRPU TYR TYRH TYRU VAL PGLU HID HIE HIP LYP LYN CYN CYM CYX DAB ORN HYP NALA NGLY NSER NTHR NLEU NILE NVAL NASN NGLN NARG NHID NHIE NHIP NHISD NHISE NHISH NTRP NPHE NTYR NGLU NASP NLYS NORN NDAB NLYSN NPRO NHYP NCYS NCYS2 NMET NASPH NGLUH CALA CGLY CSER CTHR CLEU CILE CVAL CASN CGLN CARG CHID CHIE CHIP CHISD CHISE CHISH CTRP CPHE CTYR CGLU CASP CLYS CORN CDAB CLYSN CPRO CHYP CCYS CCYS2 CMET CASPH CGLUH
  • @water = name W OW HW1 HW2 OH2 H1 H2 and resname SOL WAT HOH OHH TIP T3P T4P T5P T3H W TIP3 TIP4 SPC SPCE
  • @ion = name NA NA+ CL CL- K K+ SOD CLA CA CA2+ MG ZN CU1 CU LI RB CS F BR I OH Cal CAL IB+ and resname ION NA NA+ CL CL- K K+ SOD CLA CA CA2+ MG ZN CU1 CU LI RB CS F BR I OH Cal CAL IB+
  • @membrane = resname DAPC DBPC DFPC DGPC DIPC DLPC DNPC DOPC DPPC DUPC DRPC DTPC DVPC DXPC DYPC LPPC PAPC PEPC PGPC PIPC POPC PRPC PUPC DAPE DBPE DFPE DGPE DIPE DLPE DNPE DOPE DPPE DRPE DTPE DUPE DVPE DXPE DYPE LPPE PAPE PGPE PIPE POPE PQPE PRPE PUPE DAPS DBPS DFPS DGPS DIPS DLPS DNPS DOPS DPPS DRPS DTPS DUPS DVPS DXPS DYPS LPPS PAPS PGPS PIPS POPS PQPS PRPS PUPS DAPG DBPG DFPG DGPG DIPG DLPG DNPG DOPG DPPG DRPG DTPG DVPG DXPG DYPG JFPG JPPG LPPG OPPG PAPG PGPG PIPG POPG PRPG DAPA DBPA DFPA DGPA DIPA DLPA DNPA DOPA DPPA DRPA DTPA DVPA DXPA DYPA LPPA PAPA PGPA PIPA POPA PRPA PUPA DPP1 DPP2 DPPI PAPI PIPI POP1 POP2 POP3 POPI PUPI PVP1 PVP2 PVP3 PVPI PADG PIDG PODG PUDG PVDG TOG APC CPC IPC LPC OPC PPC TPC UPC VPC BNSM DBSM DPSM DXSM PGSM PNSM POSM PVSM XNSM DPCE DXCE PNCE XNCE DBG1 DPG1 DPG3 DPGS DXG1 DXG3 PNG1 PNG3 XNG1 XNG3 DFGG DFMG DPGG DPMG DPSG FPGG FPMG FPSG OPGG OPMG OPSG CHOA CHOL CHYO BOG DDM DPC EO5 SDS BOLA BOLB CDL0 CDL1 CDL2 CDL DBG3 ERGO HBHT HDPT HHOP HOPR ACA ACN BCA BCN LCA LCN PCA PCN UCA UCN XCA XCN RAMP REMP OANT
  • @dna = resname DA DG DC DT DA5 DG5 DC5 DT5 DA3 DG3 DC3 DT3 DAN DGN DCN DTN
  • @rna = resname A U C G RA RU RC RG RA5 RT5 RU5 RC5 RG5 RA3 RT3 RU3 RC3 RG3 RAN RTN RUN RCN RGN

Selecting all atoms

Sometimes, you may want to select the entire set of atoms in your system.

Syntax:

all

Selects every atom in the system.

Ranges

GSL allows you to specify ranges of residue or atom numbers.

Specifying ranges

  • Using to:

    resid 14 to 20
    

    Selects atoms in residues numbered 14 through 20 (including 20).

  • Using -:

    resid 14-20
    

    Equivalent to resid 14 to 20.

Combining with multiple ranges and numbers

You can mix explicit numbers with ranges for more complex selections.

  • Example:

    serial 1 3 to 6 10 12-14 17
    

    Expands to selecting serial numbers 1, 3, 4, 5, 6, 10, 12, 13, 14, and 17.
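The expansion shown above can be reproduced with a small Rust sketch. It is a simplified illustration, not the parser used by groan_rs: the function expand_numbers is hypothetical, it only understands plain numbers, "a to b", and "a-b" (without spaces around the dash), and it treats a range whose start exceeds its end as empty.

```rust
/// Expands a number list such as "1 3 to 6 10 12-14 17" into the
/// explicit numbers it covers. Returns None on malformed input.
fn expand_numbers(list: &str) -> Option<Vec<u32>> {
    let tokens: Vec<&str> = list.split_whitespace().collect();
    let mut numbers: Vec<u32> = Vec::new();
    let mut i = 0;
    while i < tokens.len() {
        if let Some((start, end)) = tokens[i].split_once('-') {
            // Compact form: "12-14".
            let start = start.parse::<u32>().ok()?;
            let end = end.parse::<u32>().ok()?;
            numbers.extend(start..=end);
            i += 1;
        } else if tokens.get(i + 1) == Some(&"to") {
            // Spelled-out form: "3 to 6" (three tokens).
            let start = tokens[i].parse::<u32>().ok()?;
            let end = tokens.get(i + 2)?.parse::<u32>().ok()?;
            numbers.extend(start..=end);
            i += 3;
        } else {
            // A single explicit number.
            numbers.push(tokens[i].parse::<u32>().ok()?);
            i += 1;
        }
    }
    Some(numbers)
}

fn main() {
    let expanded = expand_numbers("1 3 to 6 10 12-14 17");
    println!("{:?}", expanded); // Some([1, 3, 4, 5, 6, 10, 12, 13, 14, 17])
}
```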

Open-ended ranges

GSL supports open-ended ranges using comparison operators.

  • Operators:

    • < : Less than
    • > : Greater than
    • <=: Less than or equal to
    • >=: Greater than or equal to
  • Examples:

    serial <= 180
    

    Selects all atoms with serial numbers ≤ 180.

    resid > 33
    

    Selects atoms in residues numbered 34 and above.

Combining open-ended and explicit selections

  • Example:

    serial 1 3-6 >=20
    

    Selects atoms with serial numbers 1, 3, 4, 5, 6, and 20 or higher.
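One way to picture a query that mixes explicit numbers, closed ranges, and open-ended ranges is as a list of terms, where a number is selected if any term matches it. The Rust sketch below illustrates that idea; the NumTerm type and is_selected function are hypothetical and do not reflect how groan_rs represents queries internally.

```rust
/// One term of a numeric query. Hypothetical type, for illustration only.
enum NumTerm {
    Single(u32),
    Range(u32, u32), // inclusive on both ends
    Less(u32),
    Greater(u32),
    LessEq(u32),
    GreaterEq(u32),
}

impl NumTerm {
    /// Does the number `n` satisfy this term?
    fn matches(&self, n: u32) -> bool {
        match *self {
            NumTerm::Single(x) => n == x,
            NumTerm::Range(lo, hi) => (lo..=hi).contains(&n),
            NumTerm::Less(x) => n < x,
            NumTerm::Greater(x) => n > x,
            NumTerm::LessEq(x) => n <= x,
            NumTerm::GreaterEq(x) => n >= x,
        }
    }
}

/// A number is selected if at least one term of the query matches it.
fn is_selected(terms: &[NumTerm], n: u32) -> bool {
    terms.iter().any(|term| term.matches(n))
}

fn main() {
    // Corresponds to `serial 1 3-6 >=20`, applied to a system of 22 atoms.
    let query = [NumTerm::Single(1), NumTerm::Range(3, 6), NumTerm::GreaterEq(20)];
    let selected: Vec<u32> = (1..=22).filter(|&n| is_selected(&query, n)).collect();
    println!("{:?}", selected); // [1, 3, 4, 5, 6, 20, 21, 22]

    // `resid > 33` excludes 33 itself, while `serial <= 180` includes 180.
    assert!(!NumTerm::Greater(33).matches(33));
    assert!(NumTerm::Greater(33).matches(34));
    assert!(NumTerm::LessEq(180).matches(180));
    assert!(!NumTerm::Less(180).matches(180));
}
```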

Negations

GSL allows you to exclude certain atoms from your selection using negation operators.

  • Syntax:

    not <query>
    

    Or:

    ! <query>
    
  • Examples:

    not name CA
    

    Selects all atoms not named CA.

    ! name CA
    

    Equivalent to the above.

    not resname POPE POPG
    

    Selects atoms not in residues named POPE or POPG.

    !Protein
    

    Selects atoms not in the group named "Protein".

Binary operations

GSL supports combining multiple queries using logical operators to refine selections further.

Logical AND

  • Operators: and or &&

  • Function: Selects atoms that satisfy both queries.

  • Examples:

    name P and resname POPE
    

    Selects atoms named P in residues named POPE.

    serial 256 to 271 && resid 17 18
    

    Selects atoms with serial numbers between 256 and 271 in residues 17 or 18.

Logical OR

  • Operators: or or ||

  • Function: Selects atoms that satisfy at least one of the queries. Can be understood as an addition (+) operation.

  • Examples:

    name P or resname POPE
    

    Selects atoms named P + atoms in residues POPE.

    resid 17 18 || serial 256 to 271
    

    Selects atoms in residues 17 or 18 + atoms with serial numbers between 256 and 271.

Operator precedence

When and and or operators are combined without parentheses, GSL gives them equal precedence and evaluates them from left to right.

  • Example:

    resname POPE or name CA and not Protein
    

    Interpreted as:
    (resname POPE or name CA) and not Protein

    Selects atoms that are in residues named POPE or are named CA, excluding any atoms that are part of the "Protein" group.
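Left-to-right evaluation differs from the convention of many programming languages, where and binds more tightly than or. The following Rust sketch makes the difference concrete using sets of atom indices. Everything in it is hypothetical (the Selection alias, the Op enum, the toy atom sets); it only models the evaluation order described above and is not groan_rs code.

```rust
use std::collections::BTreeSet;

type Selection = BTreeSet<usize>;

enum Op {
    And,
    Or,
}

/// Convenience constructor for a set of atom indices.
fn set(items: &[usize]) -> Selection {
    items.iter().copied().collect()
}

/// Folds a flat chain `first op1 s1 op2 s2 ...` strictly from left to right,
/// giving `and` and `or` the same precedence.
fn eval_left_to_right(first: Selection, rest: Vec<(Op, Selection)>) -> Selection {
    rest.into_iter().fold(first, |acc, (op, sel)| match op {
        Op::And => acc.intersection(&sel).copied().collect(),
        Op::Or => acc.union(&sel).copied().collect(),
    })
}

fn main() {
    let all = set(&[0, 1, 2, 3, 4, 5]);
    let pope = set(&[0, 1]); // resname POPE
    let ca = set(&[1, 2, 3]); // name CA
    let protein = set(&[1, 2]); // group Protein
    let not_protein: Selection = all.difference(&protein).copied().collect();

    // Left-to-right reading: (resname POPE or name CA) and not Protein
    let left_to_right = eval_left_to_right(
        pope.clone(),
        vec![(Op::Or, ca.clone()), (Op::And, not_protein.clone())],
    );
    println!("{:?}", left_to_right); // {0, 3}

    // For contrast: resname POPE or (name CA and not Protein)
    let ca_not_protein: Selection = ca.intersection(&not_protein).copied().collect();
    let and_first: Selection = pope.union(&ca_not_protein).copied().collect();
    println!("{:?}", and_first); // {0, 1, 3}
}
```

Atom 1 is the one that tells the two readings apart: it is in a POPE residue and in the Protein group, so the left-to-right reading drops it while the other reading keeps it.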

Combining with autodetection macros

You can mix autodetection macros with other queries using logical operators.

  • Example:

    @membrane or group 'My Lipids'
    

    Selects all autodetected membrane lipids + atoms in the "My Lipids" group.

Parentheses

Parentheses allow you to control the evaluation order of your queries.

Changing evaluation order

Use parentheses to group queries and dictate the sequence of operations.

  • Example:

    resname POPE or (name CA and not resid 18 to 21)
    

    Selects atoms in residues named POPE + atoms named CA which are not in residues 18 to 21.

  • Changing the position of parentheses:

    (name CA or resname POPE) and not resid 18 to 21
    

    Selects atoms which are either named CA or in residues named POPE, but are not in residues 18 to 21.

Nested Parentheses

You can nest parentheses to create complex selection logic.

  • Example:

    serial 1 to 6 or (name CA and resname POPE || (resid 1 to 7 or serial 123 to 128)) and Protein
    

    A valid, albeit complex, query that combines multiple conditions.

Negating Parenthetical Expressions

You can apply negation to entire parenthetical groups.

  • Example:

    !(serial 1 to 6 && name P)
    

    Selects every atom except those that both have a serial number between 1 and 6 and are named P.
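Negating a parenthesized expression follows ordinary Boolean logic, so De Morgan's law applies: an atom passes the negated query if it fails at least one of the inner conditions. The Rust sketch below checks this for a single atom; both functions are hypothetical stand-ins for the query logic, not part of groan_rs.

```rust
/// Models `!(serial 1 to 6 && name P)` for a single atom.
fn negated_group(serial: u32, name: &str) -> bool {
    !((1..=6).contains(&serial) && name == "P")
}

/// The same condition rewritten with De Morgan's law:
/// "serial is outside 1 to 6, or the name is not P".
fn de_morgan(serial: u32, name: &str) -> bool {
    !(1..=6).contains(&serial) || name != "P"
}

fn main() {
    let atoms = [(3, "P"), (3, "CA"), (10, "P"), (10, "CA")];
    for (serial, name) in atoms {
        // Both formulations must agree for every atom.
        assert_eq!(negated_group(serial, name), de_morgan(serial, name));
        println!("serial {serial}, name {name}: selected = {}", negated_group(serial, name));
    }
}
```

Only the atom with serial number 3 and name P is excluded; the other three atoms are selected.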

Selecting molecules

GSL provides operators to select entire molecules (i.e., groups of bonded atoms) based on specific criteria.

molecule with (or mol with) operator

  • Function: Selects all atoms in the same molecule(s) as the atoms matching the inner query.

  • Syntax:

    molecule with <query>
    

    Or:

    mol with <query>
    
  • Examples:

    molecule with serial 15
    

    Selects all atoms in the molecule containing the atom with serial number 15.

    molecule with resid 4 17 29
    

    Selects all atoms in molecules containing any atom from residues 4, 17, or 29.

    molecule with name P
    

    Selects all atoms in molecules that include an atom named P.

Operator precedence with molecule with

  • Example 1:

    molecule with serial 15 or name BB
    

    Selects all atoms in the molecule containing the atom with serial number 15, plus any atoms named BB.

  • Example 2:

    molecule with (serial 15 or name BB)
    

    Selects all atoms in molecules that contain either atom 15 or any atom named BB.

Note:

  • Topology requirement: The system must contain topology information to use molecule selections. Without it, no atoms will be selected. Topology information is available, for instance, in TPR files and some PDB files.
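Conceptually, selecting "the molecule with" some atoms amounts to collecting everything reachable from those atoms through bonds. The Rust sketch below shows this idea with a breadth-first search over a bond list. It is an illustration under stated assumptions rather than the groan_rs algorithm: the function molecule_with and the toy bond list are hypothetical, atoms are identified by 0-based indices, and every index is assumed to be smaller than n_atoms. Unlike GSL, the sketch does not special-case a missing topology; with an empty bond list it simply returns the seed atoms.

```rust
use std::collections::VecDeque;

/// Returns the sorted indices of all atoms that are connected, directly or
/// through a chain of bonds, to at least one of the seed atoms.
fn molecule_with(n_atoms: usize, bonds: &[(usize, usize)], seeds: &[usize]) -> Vec<usize> {
    // Build an adjacency list from the bond list.
    let mut neighbors: Vec<Vec<usize>> = vec![Vec::new(); n_atoms];
    for &(a, b) in bonds {
        neighbors[a].push(b);
        neighbors[b].push(a);
    }

    // Breadth-first search starting from all seed atoms at once.
    let mut visited = vec![false; n_atoms];
    let mut queue: VecDeque<usize> = VecDeque::new();
    for &seed in seeds {
        if !visited[seed] {
            visited[seed] = true;
            queue.push_back(seed);
        }
    }
    while let Some(atom) = queue.pop_front() {
        for &next in &neighbors[atom] {
            if !visited[next] {
                visited[next] = true;
                queue.push_back(next);
            }
        }
    }

    (0..n_atoms).filter(|&i| visited[i]).collect()
}

fn main() {
    // Toy system: molecule {0, 1, 2}, molecule {3, 4}, and two unbonded atoms 5 and 6.
    let bonds = [(0, 1), (1, 2), (3, 4)];
    println!("{:?}", molecule_with(7, &bonds, &[4])); // [3, 4]
    println!("{:?}", molecule_with(7, &bonds, &[1, 5])); // [0, 1, 2, 5]
}
```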

Labeling atoms

If atoms are labeled (using System::label_atom), you can use these labels in your GSL queries.

  • Syntax:

    label XYZ
    
  • Examples:

    label MyAtom
    

    Selects the atom labeled "MyAtom".

    label MyAtom AnotherAtom OneMoreAtom
    

    Selects atoms labeled "MyAtom", "AnotherAtom", and "OneMoreAtom".

Labels with multiple words

For labels consisting of multiple words, enclose them in quotes.

  • Example:

    label 'Very interesting atom'
    

    Selects the atom labeled "Very interesting atom".

Comparison with groups:
Labels are similar to groups but are guaranteed to contain only one atom each.
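The distinction can be pictured as two lookup tables: a group name maps to any number of atoms, while a label maps to exactly one. The Rust sketch below is a hypothetical model of that distinction, not the data layout used by groan_rs. The Names struct and its methods are invented for illustration. Following this guide, a missing group is reported as an error; how GSL treats an unknown label is not covered here, so the sketch simply skips it.

```rust
use std::collections::{BTreeSet, HashMap};

/// Hypothetical container for named selections.
struct Names {
    groups: HashMap<String, Vec<usize>>, // name -> any number of atom indices
    labels: HashMap<String, usize>,      // name -> exactly one atom index
}

impl Names {
    /// Models `group A B ...`: the union of all listed groups.
    /// A missing group is an error.
    fn select_groups(&self, names: &[&str]) -> Result<Vec<usize>, String> {
        let mut atoms = BTreeSet::new();
        for name in names {
            let members = self
                .groups
                .get(*name)
                .ok_or_else(|| format!("group '{}' does not exist", name))?;
            atoms.extend(members.iter().copied());
        }
        Ok(atoms.into_iter().collect())
    }

    /// Models `label A B ...`: one atom per known label.
    /// Unknown labels are skipped (an assumption of this sketch).
    fn select_labels(&self, names: &[&str]) -> Vec<usize> {
        names
            .iter()
            .filter_map(|name| self.labels.get(*name).copied())
            .collect()
    }
}

fn main() {
    let mut names = Names { groups: HashMap::new(), labels: HashMap::new() };
    names.groups.insert("Protein".to_string(), vec![0, 1, 2]);
    names.labels.insert("MyAtom".to_string(), 1);
    names.labels.insert("Very interesting atom".to_string(), 7);

    println!("{:?}", names.select_groups(&["Protein"])); // Ok([0, 1, 2])
    println!("{:?}", names.select_labels(&["MyAtom", "Very interesting atom"])); // [1, 7]
    println!("{:?}", names.select_groups(&["Membrane"]).is_err()); // true
}
```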

Regular expressions

GSL supports the use of regular expressions (regex).

Using regular expressions

You can apply regex patterns to various identifiers: atom names, residue names, group names, element names, element symbols, and labels.

  • Syntax:

    identifier r'<pattern>'
    
  • Examples:

    name r'^[1-9]?H.*'
    

    Selects all atoms with names matching the regex pattern (typically hydrogen atoms).

    resname r'^.*PC'
    

    Selects all atoms in residues with names ending in "PC".

    group r'^P'
    

    Selects all atoms in groups whose names start with 'P'.

Combining regular expressions with other identifiers

You can mix regex-based identifiers with standard identifiers in a single query.

  • Examples:

    name r'^C.*' and resname ALA GLY
    

    Selects atoms with names starting with 'C' in residues named ALA or GLY.

    name P r'^[1-9]?H.*'
    

    Selects atoms named 'P' as well as atoms whose names match the regex pattern (typically hydrogen atoms).

Syntax rules

  • Enclosure:
    The regex pattern must be enclosed within a "regular expression block" starting with r' and ending with '.

  • Case sensitivity:
    Regex patterns are case-sensitive unless specified otherwise within the pattern.

Supported regex features

GSL uses the regex crate to evaluate regular expressions. For details on supported regex features, refer to the official documentation of the regex crate.

Important notes

Selecting Elements

  • Element assignment:
    Atoms in the system are not automatically assigned elements by groan_rs unless the elements are explicitly specified in the input structure file. This is because elements may not always be meaningful, particularly in coarse-grained systems.

  • Using element keywords:
    To use element name or element symbol in your selections, ensure that atoms have been assigned elements.

  • Assigning elements:

    • From topology files: Creating the System structure from a TPR file automatically assigns elements.
    • Guessing elements: Use the System::guess_elements function to assign elements based on atom names.
  • Recommendation:
    If you're using a program that utilizes GSL, ensure that it assigns elements to atoms before using element keywords in selections.

Whitespace considerations

  • Operator and parenthesis separation:
    Operators (and, or, not, etc.) and parentheses do not need to be separated by whitespace, as long as the query remains unambiguous without it.

  • Valid example:

    not(name CA)or(serial 1to45||Protein)
    

    A valid query without (unnecessary) whitespace.

  • Invalid example:

    not(name CA)or(serial 1to45orProtein)
    

    Invalid because orProtein cannot be split unambiguously into the operator or and the group name Protein.

  • Resolving ambiguity:

    Enclose ambiguous parts in parentheses to clarify intent. Alternatively, use whitespace; it is worth it.

    not(name CA)or(serial 1to45or(Protein))
    

    Now, the query is valid and interpretable, but not very human-readable.

Online GSL validator

You can easily verify the validity of your GSL query using the online validation tool.

Note: This tool checks only the general syntax of the query. It does not account for the specific context of your molecular system. Queries that reference non-existent groups in your system will still be considered valid syntactically but will fail during execution. The tool focuses solely on syntactical correctness and cannot interpret your intent. For example, the query resname POPC name P is syntactically valid but will probably not achieve the desired outcome.

Feedback and disclaimer

Have questions or encountered an issue? Open a GitHub issue for the groan_rs crate or send an email to ladmeb@gmail.com.

The groan_rs crate and GSL are currently unstable, and the language may undergo changes in future versions.

This guide was largely generated by a large language model (ChatGPT) based on the provided GSL specification.