`pa` command¶

The pa command fits residual dipolar couplings (RDCs) and residual anisotropic chemical shifts (RACSs, sometimes known as RCSAs), measured by NMR on partially aligned samples, to molecular structures. In the output tables, warning outliers are colored yellow and bad outliers red.

$  ml pa --help
usage: mollib pa [-h] -i id/filename [id/filename ...] [-c filename] [-l] [-s]
                 [-m [MODELS [MODELS ...]]] [--hydrogenate] -d id/filename
                 [id/filename ...] [-o filename] [-p filename] [--summary]
                 [--set id]
                 [--exclude [interaction-type [interaction-type ...]]]
                 [--project-methyls] [--methyl-scale number]
                 [--fix-sign | --nofix-sign]
                 [--fix-nh-scale | --nofix-nh-scale]
                 [--fix-outliers | --nofix-outliers]

arguments:
  -h, --help            show this help message and exit
  -i id/filename [id/filename ...], --in id/filename [id/filename ...]
                        (required) The filename(s) or PDB identifier(s) of the
                        structure(s)
  -c filename, --config filename
                        The configuration filename
  -l                    List details on the molecule(s)
  -s, --save            Save fetched files to the local directory.
  -m [MODELS [MODELS ...]], --models [MODELS [MODELS ...]]
                        The models numbers to analyze.
  --hydrogenate         Strip hydrogens and re-add them before analysis
  -d id/filename [id/filename ...], --data id/filename [id/filename ...]
                        (required) Alignment file or identifier with RDC and
                        RACS data
  -o filename, --out filename
                        The output filename for the reports of the fit data.
  -p filename, --pred filename
                        The output filename for the report of the back-
                        calculated RDCs and RACSs that are not in the
                        experimental data.
  --summary             Only display the fit summary
  --set id              If multiple datasets are available, this option
                        specifies which dataset to use.
  --exclude [interaction-type [interaction-type ...]]
                        Exclude one or more interactions of the following
                        type(s). ex: N-H or CE-HE
  --project-methyls     Fit methyl RDCs by projecting their values on the
                        corresponding C-C bond, as used by Xplor-NIH
  --methyl-scale number
                        The order parameter to use in scaling the methyl RDCs.

fixer arguments:
  --fix-sign            Check and fix mistakes in RDC and RACS sign
  --nofix-sign          Disable check in RDC and RACS sign
  --fix-nh-scale        Check and rescale couplings that were scaled to the
                        N-H RDC.
  --nofix-nh-scale      Disable N-H rescaling of couplings.
  --fix-outliers        Fit without outliers
  --nofix-outliers      Disable fitting without outliers

Arguments¶

-d / --data id/filename

The file(s) or identifier(s) with the RDC and RACS alignment data. These can be in any of the following formats:

The mollib data format. See Partial Alignment Data File Format.
NMRPipe’s DC format.
Magnetic resonance data files (.mr) in Xplor format submitted to the PDB. This function supports the automatic fetching and caching of magnetic resonance data files.

-o / --out filename

(Optional) The filename for the output report. The output report is rendered in Markdown.

-p / --pred filename

(Optional) The filename for the back-calculated RDCs and RACSs from the SVD fit. The output report is rendered in Markdown.

--summary

(Optional) Only display the fit summary.

--exclude interaction-types

(Optional) Exclude one or more interaction types. For example,

--exclude N-H CA-HA

will exclude all N-H and CA-HA RDCs.

--set id

(Optional) Use the given data set if multiple data sets are available. This option is useful with .mr data from the PDB, which may contain multiple alignment data sets from multiple alignment media. Sets can be selected by their alignment tensor value (ex: 500, 501, etc.) or by their position within the data file, starting with 0 (ex: 0 for the first dataset, 1 for the second dataset, and so on). By default, the first dataset is used.

--project-methyls

(Optional) Use the C-C bond RDC values for the methyl ¹H-¹³C RDCs. This is the convention followed by Xplor-NIH. By default, this is disabled.

--methyl-scale number

(Optional) The scaling constant to use in fitting the methyl RDCs. This scaling may be needed if the contribution of the C3-rotational motion was not accounted for in the reported RDCs. By default, this value is 1.0.

Note

The models option (-m/--models) will load the models as multiple molecules to be fit together in the SVD rather than conduct a separate SVD for each.

Fixer Arguments¶

--fix-sign / --nofix-sign: (Optional) Check whether the sign of the RDCs or RACSs of a given type needs to be inverted to get a better fit. This operation is useful for automatically fixing the sign of couplings when the absolute values of the |J+D|- and |J|-couplings are used. By default, this fixer is on.
--fix-outliers / --nofix-outliers: (Optional) Check whether there are outliers for each type of interaction. Warning outliers and bad outliers are those that exceed the critical cutoff of a Grubbs test at the 95% and 99% levels, respectively. If outliers are found, they are removed from the fit and from the reported statistics. By default, this fixer is off.
--fix-nh-scale / --nofix-nh-scale: (Optional) Check to see if RDCs and RACSs have been scaled to match the magnitude of N-H RDCs. If they have, scale them back down to their original values. By default, this fixer is off.
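
As a rough illustration of the Grubbs test named above, the following sketch computes the Grubbs statistic for a list of values and the textbook two-sided critical value. It is not mollib's implementation. The function names are hypothetical, the choice of input (for example, the fit deviations of one interaction type) is an assumption, and `t_crit` must be supplied by the caller from a t-distribution table or a statistics library.

```python
import math
import statistics


def grubbs_statistic(values):
    """Return (index, G) for the value farthest from the sample mean."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1)
    idx = max(range(len(values)), key=lambda i: abs(values[i] - mean))
    return idx, abs(values[idx] - mean) / sd


def grubbs_critical(n, t_crit):
    """Two-sided Grubbs critical value for n points.

    t_crit is the upper critical value of the t distribution with n - 2
    degrees of freedom at significance alpha / (2 n). With alpha = 0.05
    or 0.01 this would correspond to the 95% or 99% level (assumption).
    """
    return (n - 1) / math.sqrt(n) * math.sqrt(t_crit ** 2 / (n - 2 + t_crit ** 2))


# Hypothetical deviations (Hz) with one suspicious point at the end.
deviations = [0.1, -0.2, 0.0, 0.3, -0.1, 5.0]
index, g = grubbs_statistic(deviations)
print(f"candidate outlier: index {index}, G = {g:.2f}")
```

A point would be flagged when its G exceeds `grubbs_critical(len(deviations), t_crit)` for the chosen level.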

Partial Alignment Data File Format¶

The file format has the following features:

The interaction labels for dipolar interactions refer to two atoms (ex: 14N-H), and the interaction labels for CSA interactions refer to one atom (ex: 5C).
For dipolar interactions, redundant residue numbers and chain identifiers are not needed. For example, ‘14N-H’ and ‘14N-14H’ refer to the same dipole.
If the chain identifier is not specified, then the subunit ‘A’ is assumed.
Relative residue numbers are allowed. For example, ‘14N-C-1’ is the same as the ‘14N-13C’ dipole.
Errors are optional. If the error is not specified, a default value from the settings is used.

The partial alignment RDC and RACS data file has the following format:

# Interaction   Value (Hz)   Error (optional)
14N-H           -14.5        0.1
15N-H             3.5
A.16N-H          -8.5        0.2  # larger error

A.16H-A.15C       0.5        0.1
B.16H-B.15C       0.5        0.1

# Residual anisotropic chemical shift data
# Interaction   Value (ppb)   Error (optional)
5C                112         1
6C               -250
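
The labeling rules above can be sketched as a small parser. This is an illustrative reading of the format, not mollib's own parser: the names `parse_label` and `parse_line` are hypothetical, the regular expression only covers the label shapes shown in this section, and a missing error is returned as `None` to stand in for the default value from the settings.

```python
import re

# One atom reference: optional chain ("A."), optional residue number,
# atom name, optional relative residue offset ("-1" or "+1").
ATOM = r"(?:([A-Za-z])\.)?(\d+)?([A-Z][A-Z0-9]*)([+-]\d+)?"
DIPOLE = re.compile(rf"{ATOM}-{ATOM}")  # two atoms, e.g. 14N-H
SINGLE = re.compile(ATOM)               # one atom, e.g. 5C


def parse_label(label, default_chain="A"):
    """Expand a label into a tuple of (chain, residue, atom) entries."""
    match = DIPOLE.fullmatch(label) or SINGLE.fullmatch(label)
    if match is None:
        raise ValueError(f"unrecognized interaction label: {label!r}")
    groups = match.groups()
    atoms = []
    chain, residue = default_chain, None
    for i in range(0, len(groups), 4):
        new_chain, number, name, offset = groups[i:i + 4]
        chain = new_chain or chain                    # inherit chain if omitted
        residue = int(number) if number else residue  # inherit residue if omitted
        if residue is None:
            raise ValueError(f"no residue number in label: {label!r}")
        atoms.append((chain, residue + int(offset or 0), name))
    return tuple(atoms)


def parse_line(line):
    """Return (atoms, value, error) or None for blank and comment lines."""
    tokens = []
    for token in line.split():
        if token.startswith("#"):  # the rest of the line is a comment
            break
        tokens.append(token)
    if not tokens:
        return None
    if len(tokens) not in (2, 3):
        raise ValueError(f"expected 'label value [error]': {line!r}")
    error = float(tokens[2]) if len(tokens) == 3 else None
    return parse_label(tokens[0]), float(tokens[1]), error


print(parse_line("A.16N-H          -8.5        0.2  # larger error"))
print(parse_label("14N-C-1") == parse_label("14N-13C"))
```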

Examples¶

The following example fits the deposited RDCs for the hemagglutinin fusion peptide structure (-d 2KXA) to the deposited NMR structure (-i 2KXA). The output table entries are colored for warning outliers (yellow) and bad outliers (red).

$  ml pa -i 2KXA -d 2KXA
Table: Summary SVD Statistics for Molecule 2KXA-1

---------- ---------------- --------------- ----------------- -------------- -----------
Overall    Q (%): 18.2      RMS: 2.0        count: 58                                   
N-H        Q (%): 9.3       RMS: 0.68       count: 21         Da (Hz): -8.0  Rh: 0.083  
CA-CB      Q (%): 27.1      RMS: 0.48       count: 2          Da (Hz): 1.4   Rh: 0.083  
CA-HA      Q (%): 16.7      RMS: 2.51       count: 26         Da (Hz): 16.4  Rh: 0.083  
CB-CG      Q (%): 3.6       RMS: -          count: 1                                    
CD-CG      Q (%): 43.4      RMS: 0.75       count: 3                                    
HA-HA      Q (%): 30.1      RMS: 5.34       count: 3                                    
NE-HE      Q (%): 12.8      RMS: 1.55       count: 2                                    
Alignment  Aa: -0.0003686   Ar: -3.053e-05                                              
Saupe      Szz: -0.0007373  Syy: 0.0003228  Sxx: 0.0004144                              
Angles     Z (deg): 230.1   Y' (deg): 83.5  Z'' (deg): 104.9                            
---------- ---------------- --------------- ----------------- -------------- -----------

Table: Observed and Predicted RDCs and RACS for Molecule 2KXA-1

Interaction   Value   Error  Predicted  Deviation  
------------- ------- ------ ---------- -----------
A.5CA-CB        -1.8  -        -1.5       -0.3     
A.7CA-CB         1.9  -         1.5        0.4     
A.1CA-HA2      -20.0  -       -13.5       -6.5     
A.1CA-HA3       -4.0  -        -3.1       -0.9     
A.2CA-HA        -1.8  -        -2.2        0.4     
A.3CA-HA        29.8  -        24.8        5.0     
A.4CA-HA#        5.0  -         3.0        2.0     
A.5CA-HA       -17.1  -       -14.7       -2.4     
A.6CA-HA        13.3  -        15.0       -1.7     
A.7CA-HA        13.2  -        15.0       -1.8     
A.8CA-HA#      -11.3  -        -9.2       -2.1     
A.9CA-HA       -16.8  -       -14.3       -2.5     
A.10CA-HA       30.6  -        27.8        2.8     
A.11CA-HA        0.6  -        -0.9        1.5     
A.12CA-HA2     -12.0  -       -13.6        1.6     
A.12CA-HA3      30.0  -        26.2        3.8     
A.13CA-HA2      -1.0  -         0.0       -1.0     
A.13CA-HA3       0.0  -         1.1       -1.1     
A.14CA-HA      -16.3  -       -17.1        0.8     
A.15CA-HA       -6.6  -       -10.9        4.3     
A.16CA-HA#      15.1  -        14.4        0.7     
A.17CA-HA       -9.8  -       -13.0        3.2     
A.18CA-HA      -17.9  -       -18.0        0.1     
A.19CA-HA        9.4  -         8.9        0.5     
A.20CA-HA#      23.7  -        23.0        0.7     
A.21CA-HA      -13.9  -       -15.1        1.2     
A.22CA-HA      -11.6  -       -11.7        0.1     
A.23CA-HA#       1.7  -         2.0       -0.3     
A.18CB-CG2       1.2  -         1.2        0.0     
A.6CD1-CG1!     -0.7  -        -1.3        0.6     
A.10CD1-CG1!    -0.2  -        -1.0        0.8     
A.18CD1-CG1!    -0.7  -        -1.1        0.4     
A.1HA2-HA3!    -10.0  -       -14.6        4.6     
A.12HA2-HA3!    15.0  -        10.8        4.2     
A.13HA2-HA3!   -11.0  -       -15.2        4.2     
A.3N-H         -10.5  -       -10.4       -0.1     
A.4N-H          -6.8  -        -6.5       -0.3     
A.5N-H           3.2  -         3.5       -0.3     
A.6N-H          -3.5  -        -3.1       -0.4     
A.7N-H          -7.8  -        -7.4       -0.4     
A.8N-H           0.4  -         1.2       -0.8     
A.9N-H           5.7  -         4.8        0.9     
A.10N-H         -4.0  -        -2.6       -1.4     
A.11N-H         -4.5  -        -3.2       -1.3     
A.12N-H          4.4  -         4.2        0.2     
A.13N-H          6.2  -         6.8       -0.6     
A.14N-H         -9.9  -       -10.6        0.7     
A.15N-H         -5.7  -        -6.1        0.4     
A.16N-H        -13.7  -       -14.6        0.9     
A.17N-H        -10.8  -       -11.1        0.3     
A.18N-H         -5.1  -        -5.4        0.3     
A.19N-H         -7.9  -        -8.9        1.0     
A.20N-H        -14.1  -       -14.5        0.4     
A.21N-H         -6.0  -        -5.6       -0.4     
A.22N-H         -3.8  -        -3.8        0.0     
A.23N-H        -14.1  -       -14.7        0.6     
A.14NE1-HE1     -5.7  -        -7.2        1.5     
A.21NE1-HE1      2.2  -         2.0        0.2     

* Inverting the sign of 'N-H' interactions improved the overall Q-factor from 228.7% to 29.4%.
* Inverting the sign of 'NE-HE' interactions improved the overall Q-factor from 29.4% to 18.2%.
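
To connect the summary to the table, the sketch below recomputes a Q-factor for the N-H rows. It assumes that Q is the RMS deviation divided by sqrt(Da² (4 + 3 Rh²) / 5), a normalization commonly used for RDC Q-factors. That choice is an assumption, since this page does not state which definition mollib uses, and the function names are hypothetical. With the rounded values printed above, the result comes out close to the N-H entry in the summary.

```python
import math

# (observed, predicted) N-H RDCs in Hz, copied from the 2KXA table above.
NH_RDCS = [
    (-10.5, -10.4), (-6.8, -6.5), (3.2, 3.5), (-3.5, -3.1), (-7.8, -7.4),
    (0.4, 1.2), (5.7, 4.8), (-4.0, -2.6), (-4.5, -3.2), (4.4, 4.2),
    (6.2, 6.8), (-9.9, -10.6), (-5.7, -6.1), (-13.7, -14.6), (-10.8, -11.1),
    (-5.1, -5.4), (-7.9, -8.9), (-14.1, -14.5), (-6.0, -5.6), (-3.8, -3.8),
    (-14.1, -14.7),
]


def rms_deviation(pairs):
    """Root-mean-square of (observed - predicted) over all pairs."""
    return math.sqrt(sum((obs - pred) ** 2 for obs, pred in pairs) / len(pairs))


def q_factor(pairs, da, rh):
    """Q-factor normalized by Da and Rh (assumed definition, see above)."""
    norm = math.sqrt(da ** 2 * (4.0 + 3.0 * rh ** 2) / 5.0)
    return rms_deviation(pairs) / norm


# Da and Rh are taken from the N-H row of the summary table.
q_nh = 100.0 * q_factor(NH_RDCS, da=-8.0, rh=0.083)
print(f"N-H Q (%): {q_nh:.1f}")
```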

The following example fits the deposited RDCs for the first alignment dataset (--set 0) of ubiquitin (-d 2MJB) to the deposited NMR structure (-i 2MJB). The RDCs for methyl groups are projected onto the corresponding C-C bonds (--project-methyls) and outliers are removed from the fit (--fix-outliers).

$  ml -s pa -i 2MJB -d 2MJB --set 0 --fix-outliers --project-methyls --summary
Table: Summary SVD Statistics for Molecule 2MJB-1

---------- --------------- ---------------- ----------------- --------------- -----------
Overall    Q (%): 23.8     RMS: 3.68        count: 477                                   
N-H        Q (%): 6.3      RMS: 0.52        count: 63         Da (Hz): 9.1    Rh: 0.144  
C-CA       Q (%): 19.4     RMS: 0.28        count: 58         Da (Hz): -1.6   Rh: 0.144  
C-H+1      Q (%): 13.0     RMS: 0.37        count: 61                                    
C-N+1      Q (%): 22.1     RMS: 0.19        count: 60         Da (Hz): 0.9    Rh: 0.144  
CA-CB      Q (%): 15.0     RMS: 0.4         count: 38         Da (Hz): -1.6   Rh: 0.144  
CA-HA      Q (%): 12.9     RMS: 2.21        count: 66         Da (Hz): -18.8  Rh: 0.144  
CB-CG      Q (%): 16.8     RMS: 2.93        count: 19                                    
CB-HB      Q (%): 18.1     RMS: 3.34        count: 50                                    
CD-CG      Q (%): 31.2     RMS: 5.44        count: 19                                    
CD-HD      Q (%): 49.5     RMS: 9.53        count: 10                                    
CE-HE      Q (%): 115.0    RMS: 23.44       count: 5                                     
CE-SD      Q (%): 74.1     RMS: -           count: 1                                     
CG-HG      Q (%): 43.1     RMS: 8.04        count: 27                                    
Alignment  Aa: 0.0004225   Ar: 6.07e-05                                                  
Saupe      Szz: 0.0008451  Syy: -0.0003315  Sxx: -0.0005136                              
Angles     Z (deg): 49.0   Y' (deg): 22.2   Z'' (deg): 104.8                             
---------- --------------- ---------------- ----------------- --------------- -----------

* Inverting the sign of 'N-H' interactions improved the overall Q-factor from 351.1% to 84.6%.
* Inverting the sign of 'C-N+1' interactions improved the overall Q-factor from 84.6% to 27.1%.
* Removing outlier data points A.46CA-CB, A.13CB-CG2, A.16CB-HB#, A.60CB-HB#, A.48CD-HD#, A.24C-25N
  improved the overall Q-factor from 27.1% to 23.8%.
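
The Alignment and Saupe rows of the summary appear to be related by Aa = Szz / 2, Ar = (Syy - Sxx) / 3 and Rh = Ar / Aa. These relations are inferred from the printed numbers, not from a definition given on this page, so the sketch below is only a consistency check. The function name is hypothetical.

```python
def alignment_from_saupe(szz, syy, sxx):
    """Return (Aa, Ar, Rh) from principal Saupe components (assumed relations)."""
    aa = szz / 2.0          # axial component
    ar = (syy - sxx) / 3.0  # rhombic component
    return aa, ar, ar / aa


# Saupe components from the 2MJB summary table above.
szz, syy, sxx = 0.0008451, -0.0003315, -0.0005136
aa, ar, rh = alignment_from_saupe(szz, syy, sxx)

print(f"Aa: {aa:.4g}  Ar: {ar:.4g}  Rh: {rh:.3f}")
print(f"trace: {szz + syy + sxx:.1e}")  # a Saupe matrix is traceless
```

The printed Aa, Ar and Rh can be compared against the Alignment row and the Rh column of the same table.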

This example is the same as the last one, except that the ‘CE-HE’, ‘CD-HD’ and ‘CE-SD’ RDCs are excluded (--exclude) from the fit.

$  ml -s pa -i 2MJB -d 2MJB --set 0 --exclude CE-HE CD-HD CE-SD --fix-outliers \
  --project-methyls --summary
Table: Summary SVD Statistics for Molecule 2MJB-1

---------- --------------- ---------------- ----------------- --------------- -----------
Overall    Q (%): 18.7     RMS: 2.71        count: 461                                   
N-H        Q (%): 5.4      RMS: 0.46        count: 63         Da (Hz): 9.3    Rh: 0.147  
C-CA       Q (%): 17.2     RMS: 0.25        count: 58         Da (Hz): -1.6   Rh: 0.147  
C-H+1      Q (%): 13.8     RMS: 0.4         count: 61                                    
C-N+1      Q (%): 19.6     RMS: 0.17        count: 60         Da (Hz): 1.0    Rh: 0.147  
CA-CB      Q (%): 14.4     RMS: 0.43        count: 38         Da (Hz): -1.6   Rh: 0.147  
CA-HA      Q (%): 10.9     RMS: 1.9         count: 66         Da (Hz): -19.3  Rh: 0.147  
CB-CG      Q (%): 16.1     RMS: 2.87        count: 19                                    
CB-HB      Q (%): 18.3     RMS: 3.45        count: 50                                    
CD-CG      Q (%): 31.8     RMS: 5.67        count: 19                                    
CG-HG      Q (%): 43.5     RMS: 8.29        count: 27                                    
Alignment  Aa: 0.0004318   Ar: 6.355e-05                                                 
Saupe      Szz: 0.0008637  Syy: -0.0003365  Sxx: -0.0005272                              
Angles     Z (deg): 48.1   Y' (deg): 21.8   Z'' (deg): 104.0                             
---------- --------------- ---------------- ----------------- --------------- -----------

* Inverting the sign of 'N-H' interactions improved the overall Q-factor from 367.2% to 83.2%.
* Inverting the sign of 'C-N+1' interactions improved the overall Q-factor from 83.2% to 21.1%.
* Removing outlier data points A.46CA-CB, A.13CB-CG2, A.16CB-HB#, A.60CB-HB#, A.24C-25N improved the
  overall Q-factor from 21.1% to 18.7%.

Likewise, the crystal structure of ubiquitin (-i 1UBQ) can be used in the fit. In this case, the structure is missing hydrogen atoms, and these must be added (--hydrogenate).

$  ml -s pa -i 1UBQ -d 2MJB --set 0 --fix-outliers --project-methyls \
  --hydrogenate --summary
Table: Summary SVD Statistics for Molecule 1UBQ

---------- --------------- --------------- ----------------- --------------- -----------
Overall    Q (%): 39.8     RMS: 5.77       count: 474                                   
N-H        Q (%): 14.7     RMS: 1.2        count: 62         Da (Hz): 8.9    Rh: 0.181  
C-CA       Q (%): 26.2     RMS: 0.37       count: 58         Da (Hz): -1.5   Rh: 0.181  
C-H+1      Q (%): 20.5     RMS: 0.56       count: 60                                    
C-N+1      Q (%): 31.1     RMS: 0.26       count: 61         Da (Hz): 0.9    Rh: 0.181  
CA-CB      Q (%): 25.1     RMS: 0.36       count: 37         Da (Hz): -1.5   Rh: 0.181  
CA-HA      Q (%): 25.8     RMS: 4.32       count: 66         Da (Hz): -18.4  Rh: 0.181  
CB-CG      Q (%): 24.4     RMS: 4.18       count: 18                                    
CB-HB      Q (%): 63.8     RMS: 10.42      count: 51                                    
CD-CG      Q (%): 55.1     RMS: 9.42       count: 19                                    
CD-HD      Q (%): 49.6     RMS: 8.45       count: 10                                    
CE-HE      Q (%): 19.7     RMS: 3.68       count: 4                                     
CE-SD      Q (%): 36.7     RMS: -          count: 1                                     
CG-HG      Q (%): 94.9     RMS: 15.63      count: 27                                    
Alignment  Aa: 0.000412    Ar: 7.466e-05                                                
Saupe      Szz: 0.0008239  Syy: -0.0003    Sxx: -0.0005239                              
Angles     Z (deg): 330.1  Y' (deg): 70.4  Z'' (deg): 257.7                             
---------- --------------- --------------- ----------------- --------------- -----------

* Inverting the sign of 'N-H' interactions improved the overall Q-factor from 516.4% to 93.7%.
* Inverting the sign of 'C-N+1' interactions improved the overall Q-factor from 93.7% to 46.3%.
* Removing outlier data points A.46CA-CB, A.14CB-CG2, A.48CD-HD#, A.33CE-HE#, A.48N-H, A.7C-8H,
  A.28CA-CB, A.13CB-CG2, A.14CB-HB improved the overall Q-factor from 46.3% to 39.8%.

Tensor Conventions¶

In the absence of motion, dipolar tensors are axially symmetric (i.e. \(\delta_{xx} = \delta_{yy}\)) and the principal component (\(\delta_{zz}\)) is colinear with the internuclear vector in the principal axis system (PAS).

Chemical shift tensors (CSA) may be axially asymmetric (i.e. \(\delta_{xx} \neq \delta_{yy}\)), and their geometries must be specified in relation to internal atomic coordinates. We use the convention from Cornilescu et al. [Cornilescu2000].

[Cornilescu2000]

Cornilescu, G. & Bax, A. Measurement of Proton, Nitrogen, and Carbonyl Chemical Shielding Anisotropies in a Protein Dissolved in a Dilute Liquid Crystalline Phase. J. Am. Chem. Soc. 122, 10143–10154 (2000).

The literature reports both the chemical shielding tensor (\(\sigma\)) and the chemical shift tensor (\(\delta\)). The difference between the two is an inversion of sign (i.e. \(\sigma = - \delta\)). As a result, the ordering of components between different conventions will change. In the Haeberlen convention, the chemical shift components are ordered by their magnitudes.

\[| \delta_{zz} | \geq | \delta_{xx} | \geq | \delta_{yy} |\]

The isotropic component (\(\delta_{iso}\)) has already been subtracted from the three components.

\[\delta_{iso} = \frac{1}{3} \left( \delta_{zz} + \delta_{xx} + \delta_{yy} \right)\]
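
The ordering rule can be sketched in a few lines. The function name and the input values below are hypothetical and chosen only to illustrate the two equations above: the isotropic component is subtracted first, and the remaining components are then labeled by magnitude.

```python
def haeberlen_order(components):
    """Return (iso, zz, xx, yy) with |zz| >= |xx| >= |yy| after removing iso."""
    iso = sum(components) / 3.0
    reduced = sorted((c - iso for c in components), key=abs)
    yy, xx, zz = reduced  # ascending magnitude: smallest is yy, largest is zz
    return iso, zz, xx, yy


# Hypothetical principal components of a chemical shift tensor (ppm).
iso, zz, xx, yy = haeberlen_order([250.0, 180.0, 95.0])
print(f"iso: {iso}  zz: {zz}  xx: {xx}  yy: {yy}")
```

After the subtraction the three reduced components sum to zero, so only two of them are independent.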

Backbone CSA tensor conventions

In the IUPAC convention, the components are normally ordered starting from the largest component (with sign) as the ‘11’ component. However, for chemical shielding tensors, the ‘33’ component is largest.

\[\sigma_{33} \geq \sigma_{22} \geq \sigma_{11}\]
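
The effect of the sign inversion on the IUPAC ordering can be checked with a short sketch. The function names and values are hypothetical. Shift components are sorted so that the ‘11’ component is the largest, and the corresponding shielding components are obtained with \(\sigma = - \delta\).

```python
def iupac_shift_order(shifts):
    """Return (d11, d22, d33) with d11 >= d22 >= d33."""
    d11, d22, d33 = sorted(shifts, reverse=True)
    return d11, d22, d33


def shielding_from_shift(shifts):
    """Apply sigma = -delta to each component, keeping the labels."""
    return tuple(-d for d in shifts)


# Hypothetical chemical shift components (ppm), isotropic part removed.
d11, d22, d33 = iupac_shift_order([5.0, -80.0, 75.0])
s11, s22, s33 = shielding_from_shift((d11, d22, d33))

print(d11, d22, d33)  # largest shift component is labeled '11'
print(s11, s22, s33)  # largest shielding component is labeled '33'
```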

Mollib uses the Haeberlen convention and chemical shift tensors. The backbone H, C’ and N CSA tensors are defined as follows:

The ¹³C’ tensor (blue) has the largest component (\(\delta_{zz}\)) oriented orthogonal to the O-C-N plane, and it is rotated about this component by the \(\alpha_z\) angle.

The ¹⁵N tensor (red) has the largest component (\(\delta_{zz}\)) nearly colinear with the H-N bond, and it is rotated away from the bond about the yy-component (orthogonal to the H-N-C’ plane) with an angle \(\beta_y\).

The ¹H tensor (green) has the largest component (\(\delta_{zz}\)) nearly colinear with the H-N bond, and it is rotated about the xx-component (orthogonal to the H-N-C’ plane) by an angle \(\gamma_x\).
