pa command
The pa command fits residual dipolar couplings (RDCs) and residual
anisotropic chemical shifts (RACSs, sometimes known as RCSAs), measured by
NMR on partially aligned samples, to molecular structures. The output table
entries are colored for warning outliers (yellow) and bad outliers (red).
$ ml pa --help
usage: mollib pa [-h] -i id/filename [id/filename ...] [-c filename] [-l] [-s]
                 [-m [MODELS [MODELS ...]]] [--hydrogenate] -d id/filename
                 [id/filename ...] [-o filename] [-p filename] [--summary]
                 [--set id]
                 [--exclude [interaction-type [interaction-type ...]]]
                 [--project-methyls] [--methyl-scale number]
                 [--fix-sign | --nofix-sign]
                 [--fix-nh-scale | --nofix-nh-scale]
                 [--fix-outliers | --nofix-outliers]

arguments:
  -h, --help            show this help message and exit
  -i id/filename [id/filename ...], --in id/filename [id/filename ...]
                        (required) The filename(s) or PDB identifier(s) of the
                        structure(s)
  -c filename, --config filename
                        The configuration filename
  -l                    List details on the molecule(s)
  -s, --save            Save fetched files to the local directory.
  -m [MODELS [MODELS ...]], --models [MODELS [MODELS ...]]
                        The models numbers to analyze.
  --hydrogenate         Strip hydrogens and re-add them before analysis
  -d id/filename [id/filename ...], --data id/filename [id/filename ...]
                        (required) Alignment file or identifier with RDC and
                        RACS data
  -o filename, --out filename
                        The output filename for the reports of the fit data.
  -p filename, --pred filename
                        The output filename for the report of the back-
                        calculated RDCs and RACSs that are not in the
                        experimental data.
  --summary             Only display the fit summary
  --set id              If multiple datasets are available, this option
                        specifies which dataset to use.
  --exclude [interaction-type [interaction-type ...]]
                        Exclude one or more interactions of the following
                        type(s). ex: N-H or CE-HE
  --project-methyls     Fit methyl RDCs by projecting their values on the
                        corresponding C-C bond, as used by Xplor-NIH
  --methyl-scale number
                        The order parameter to use in scaling the methyl RDCs.

fixer arguments:
  --fix-sign            Check and fix mistakes in RDC and RACS sign
  --nofix-sign          Disable check in RDC and RACS sign
  --fix-nh-scale        Check and rescale couplings that were scaled to the
                        N-H RDC.
  --nofix-nh-scale      Disable N-H rescaling of couplings.
  --fix-outliers        Fit without outliers
  --nofix-outliers      Disable fitting without outliers
Arguments
-d/--data filename
The file(s) with the RDC and RACS alignment data. These can be in any of the following formats:
- The mollib data format. See Partial Alignment Data File Format.
- NMRPipe’s DC format.
- Magnetic resonance data files (.mr) in Xplor format submitted to the PDB. This function supports the automatic fetching and caching of magnetic resonance data files.
-o/--out filename
- (Optional) The filename for the output report. The output report is rendered in Markdown.
-p/--pred filename
- (Optional) The filename for the back-calculated RDCs and RACSs from the SVD fit. The output report is rendered in Markdown.
--summary
- (Optional) Only display the fit summary.
--exclude interaction-types
- (Optional) Exclude one or more interaction types. ex: --exclude N-H CA-HA will exclude all N-H and CA-HA RDCs.
--set id
- (Optional) Use the given data set, if multiple data sets are available. This option is useful with .mr data from the PDB, which may contain multiple alignment data sets from multiple alignment media. Sets can be selected by their alignment tensor value (ex: 500, 501, etc.) or by their position within the data file, starting with 0 (ex: 0 for the first dataset, 1 for the second dataset, and so on). By default, the first dataset is used.
--project-methyls
- (Optional) Use the C-C bond RDC values for the methyl ¹H-¹³C RDCs. This is the convention followed by Xplor-NIH. By default, this is disabled.
--methyl-scale number
- (Optional) The scaling constant to use in fitting the methyl RDCs. This scaling may be needed if the contribution of the C3-rotational motion was not accounted for in the reported RDCs. By default, this value is 1.0.
Note
The models option (-m/--models) loads the models as multiple molecules that
are fit together in a single SVD, rather than conducting a separate SVD for
each model.
Fixer Arguments
--fix-sign/--nofix-sign
- (Optional) Check whether the sign of the RDCs or RACSs of a given type needs to be inverted to get a better fit. This operation is useful for automatically fixing the sign of couplings when only the absolute values of the |J+D| and |J| couplings were measured. By default, this fixer is on.
--fix-outliers/--nofix-outliers
- (Optional) Check whether there are outliers for each type of interaction. Warning outliers and bad outliers are data points that exceed the critical value of a Grubbs test at the 95% and 99% confidence levels, respectively. If outliers are found, they are removed from the fit and from the reported statistics. By default, this fixer is off.
--fix-nh-scale/--nofix-nh-scale
- (Optional) Check whether RDCs and RACSs have been scaled to match the magnitude of N-H RDCs. If they have, scale them back down to their original values. By default, this fixer is off.
Partial Alignment Data File Format
The file format has the following features:
- The interaction labels for dipolar interactions refer to two atoms (ex: 14N-H), and the interaction labels for CSA interactions refer to one atom (ex: 5C).
- For dipolar interactions, redundant residue numbers and chain identifiers are not needed. For example, ‘14N-H’ and ‘14N-14H’ refer to the same dipole.
- If the chain identifier is not specified, then the subunit ‘A’ is assumed.
- Relative residue numbers are allowed. For example, ‘14N-C-1’ is the same as the ‘14N-13C’ dipole.
- Errors are optional. If the error is not specified, a default value from the settings is used.
The partial alignment RDC and RACS data file has the following format:
# Interaction   Value (Hz)   Error (optional)
14N-H           -14.5        0.1
15N-H             3.5
A.16N-H          -8.5        0.2  # larger error
A.16H-A.15C       0.5        0.1
B.16H-B.15C       0.5        0.1
# Residual anisotropic chemical shift data
# Interaction   Value (ppb)  Error (optional)
5C              112          1
6C             -250
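The label rules listed above can be sketched in a few lines of Python. This is only an illustration of the format as described here, not mollib's own parser: the names `expand_label`, `parse_pa_data` and `default_error` are hypothetical, only the label forms shown in this section are handled, and the sketch assumes that a chain or residue number omitted from the second atom is inherited from the first atom.

```python
import re

# First atom: optional chain ("A."), residue number and atom name.
_FIRST = r"(?:(?P<chain1>[A-Za-z])\.)?(?P<res1>\d+)(?P<name1>[A-Z]+\d*)"
# Second atom: chain and residue number are optional, and a relative
# residue offset such as "-1" or "+1" may follow the atom name.
_SECOND = (r"(?:(?P<chain2>[A-Za-z])\.)?(?P<res2>\d+)?"
           r"(?P<name2>[A-Z]+\d*)(?P<rel2>[+-]\d+)?")
_LABEL = re.compile("^" + _FIRST + "(?:-" + _SECOND + ")?$")


def expand_label(label, default_chain="A"):
    """Expand an interaction label into a list of (chain, residue, atom)."""
    match = _LABEL.match(label)
    if match is None:
        raise ValueError(f"Cannot parse interaction label: {label!r}")
    chain1 = match.group("chain1") or default_chain
    res1 = int(match.group("res1"))
    atoms = [(chain1, res1, match.group("name1"))]
    if match.group("name2") is not None:
        # Assumption: missing chain/residue values come from the first atom.
        chain2 = match.group("chain2") or chain1
        res2 = int(match.group("res2") or res1)
        res2 += int(match.group("rel2") or 0)
        atoms.append((chain2, res2, match.group("name2")))
    return atoms


def parse_pa_data(text, default_error=None):
    """Parse data lines into (atoms, value, error) tuples."""
    entries = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        fields = line.split()
        if len(fields) not in (2, 3):
            raise ValueError(f"Expected 2 or 3 columns: {line!r}")
        error = float(fields[2]) if len(fields) == 3 else default_error
        entries.append((expand_label(fields[0]), float(fields[1]), error))
    return entries


sample = """\
# Interaction   Value (Hz)   Error (optional)
14N-H           -14.5        0.1
15N-H             3.5
A.16N-H          -8.5        0.2  # larger error
A.16H-A.15C       0.5        0.1
5C              112          1
6C             -250
"""
entries = parse_pa_data(sample)
```

With this sketch, '14N-H' and '14N-14H' expand to the same pair of atoms, and '14N-C-1' expands to the same pair as '14N-13C'. Entries without an error column carry `default_error`, standing in for the default value that mollib takes from its settings.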
Examples
The following example fits the deposited RDCs for the hemagglutinin fusion
peptide structure (-d 2KXA) to the deposited NMR structure (-i 2KXA). The
output table entries are colored for warning outliers (yellow) and bad
outliers (red).
$ ml pa -i 2KXA -d 2KXA
Table: Summary SVD Statistics for Molecule 2KXA-1
---------- ---------------- --------------- ----------------- -------------- -----------
Overall Q (%): 18.2 RMS: 2.0 count: 58
N-H Q (%): 9.3 RMS: 0.68 count: 21 Da (Hz): -8.0 Rh: 0.083
CA-CB Q (%): 27.1 RMS: 0.48 count: 2 Da (Hz): 1.4 Rh: 0.083
CA-HA Q (%): 16.7 RMS: 2.51 count: 26 Da (Hz): 16.4 Rh: 0.083
CB-CG Q (%): 3.6 RMS: - count: 1
CD-CG Q (%): 43.4 RMS: 0.75 count: 3
HA-HA Q (%): 30.1 RMS: 5.34 count: 3
NE-HE Q (%): 12.8 RMS: 1.55 count: 2
Alignment Aa: -0.0003686 Ar: -3.053e-05
Saupe Szz: -0.0007373 Syy: 0.0003228 Sxx: 0.0004144
Angles Z (deg): 230.1 Y' (deg): 83.5 Z'' (deg): 104.9
---------- ---------------- --------------- ----------------- -------------- -----------
Table: Observed and Predicted RDCs and RACS for Molecule 2KXA-1
Interaction Value Error Predicted Deviation
------------- ------- ------ ---------- -----------
A.5CA-CB -1.8 - -1.5 -0.3
A.7CA-CB 1.9 - 1.5 0.4
A.1CA-HA2 -20.0 - -13.5 -6.5
A.1CA-HA3 -4.0 - -3.1 -0.9
A.2CA-HA -1.8 - -2.2 0.4
A.3CA-HA 29.8 - 24.8 5.0
A.4CA-HA# 5.0 - 3.0 2.0
A.5CA-HA -17.1 - -14.7 -2.4
A.6CA-HA 13.3 - 15.0 -1.7
A.7CA-HA 13.2 - 15.0 -1.8
A.8CA-HA# -11.3 - -9.2 -2.1
A.9CA-HA -16.8 - -14.3 -2.5
A.10CA-HA 30.6 - 27.8 2.8
A.11CA-HA 0.6 - -0.9 1.5
A.12CA-HA2 -12.0 - -13.6 1.6
A.12CA-HA3 30.0 - 26.2 3.8
A.13CA-HA2 -1.0 - 0.0 -1.0
A.13CA-HA3 0.0 - 1.1 -1.1
A.14CA-HA -16.3 - -17.1 0.8
A.15CA-HA -6.6 - -10.9 4.3
A.16CA-HA# 15.1 - 14.4 0.7
A.17CA-HA -9.8 - -13.0 3.2
A.18CA-HA -17.9 - -18.0 0.1
A.19CA-HA 9.4 - 8.9 0.5
A.20CA-HA# 23.7 - 23.0 0.7
A.21CA-HA -13.9 - -15.1 1.2
A.22CA-HA -11.6 - -11.7 0.1
A.23CA-HA# 1.7 - 2.0 -0.3
A.18CB-CG2 1.2 - 1.2 0.0
A.6CD1-CG1! -0.7 - -1.3 0.6
A.10CD1-CG1! -0.2 - -1.0 0.8
A.18CD1-CG1! -0.7 - -1.1 0.4
A.1HA2-HA3! -10.0 - -14.6 4.6
A.12HA2-HA3! 15.0 - 10.8 4.2
A.13HA2-HA3! -11.0 - -15.2 4.2
A.3N-H -10.5 - -10.4 -0.1
A.4N-H -6.8 - -6.5 -0.3
A.5N-H 3.2 - 3.5 -0.3
A.6N-H -3.5 - -3.1 -0.4
A.7N-H -7.8 - -7.4 -0.4
A.8N-H 0.4 - 1.2 -0.8
A.9N-H 5.7 - 4.8 0.9
A.10N-H -4.0 - -2.6 -1.4
A.11N-H -4.5 - -3.2 -1.3
A.12N-H 4.4 - 4.2 0.2
A.13N-H 6.2 - 6.8 -0.6
A.14N-H -9.9 - -10.6 0.7
A.15N-H -5.7 - -6.1 0.4
A.16N-H -13.7 - -14.6 0.9
A.17N-H -10.8 - -11.1 0.3
A.18N-H -5.1 - -5.4 0.3
A.19N-H -7.9 - -8.9 1.0
A.20N-H -14.1 - -14.5 0.4
A.21N-H -6.0 - -5.6 -0.4
A.22N-H -3.8 - -3.8 0.0
A.23N-H -14.1 - -14.7 0.6
A.14NE1-HE1 -5.7 - -7.2 1.5
A.21NE1-HE1 2.2 - 2.0 0.2
* Inverting the sign of 'N-H' interactions improved the overall Q-factor from 228.7% to 29.4%.
* Inverting the sign of 'NE-HE' interactions improved the overall Q-factor from 29.4% to 18.2%.
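The Alignment and Rh entries of the summary table can be reproduced from the Saupe components. The relations in the sketch below are inferred from the numbers printed in these examples rather than taken from the mollib source, so they should be read as an assumption; the function name `alignment_from_saupe` is hypothetical.

```python
def alignment_from_saupe(sxx, syy, szz):
    """Return (Aa, Ar, Rh) from the principal Saupe components.

    Assumed relations, inferred from the example output:
    Aa = Szz / 2, Ar = (Syy - Sxx) / 3 and Rh = Ar / Aa.
    """
    aa = szz / 2.0
    ar = (syy - sxx) / 3.0
    return aa, ar, ar / aa


# Saupe components printed for 2KXA-1 in the summary table above.
aa, ar, rh = alignment_from_saupe(sxx=0.0004144, syy=0.0003228,
                                  szz=-0.0007373)
print(f"Aa: {aa:.4g}  Ar: {ar:.4g}  Rh: {rh:.3f}")
```

The computed values agree with the Aa, Ar and Rh entries of the table to within the rounding of the printed Saupe components.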
The following example fits the deposited RDCs for the first alignment
dataset (--set 0) of ubiquitin (-d 2MJB) to the deposited NMR structure
(-i 2MJB). The RDCs for methyl groups are projected onto the corresponding
C-C bonds (--project-methyls) and outliers are removed from the fit
(--fix-outliers).
$ ml -s pa -i 2MJB -d 2MJB --set 0 --fix-outliers --project-methyls --summary
Table: Summary SVD Statistics for Molecule 2MJB-1
---------- --------------- ---------------- ----------------- --------------- -----------
Overall Q (%): 23.8 RMS: 3.68 count: 477
N-H Q (%): 6.3 RMS: 0.52 count: 63 Da (Hz): 9.1 Rh: 0.144
C-CA Q (%): 19.4 RMS: 0.28 count: 58 Da (Hz): -1.6 Rh: 0.144
C-H+1 Q (%): 13.0 RMS: 0.37 count: 61
C-N+1 Q (%): 22.1 RMS: 0.19 count: 60 Da (Hz): 0.9 Rh: 0.144
CA-CB Q (%): 15.0 RMS: 0.4 count: 38 Da (Hz): -1.6 Rh: 0.144
CA-HA Q (%): 12.9 RMS: 2.21 count: 66 Da (Hz): -18.8 Rh: 0.144
CB-CG Q (%): 16.8 RMS: 2.93 count: 19
CB-HB Q (%): 18.1 RMS: 3.34 count: 50
CD-CG Q (%): 31.2 RMS: 5.44 count: 19
CD-HD Q (%): 49.5 RMS: 9.53 count: 10
CE-HE Q (%): 115.0 RMS: 23.44 count: 5
CE-SD Q (%): 74.1 RMS: - count: 1
CG-HG Q (%): 43.1 RMS: 8.04 count: 27
Alignment Aa: 0.0004225 Ar: 6.07e-05
Saupe Szz: 0.0008451 Syy: -0.0003315 Sxx: -0.0005136
Angles Z (deg): 49.0 Y' (deg): 22.2 Z'' (deg): 104.8
---------- --------------- ---------------- ----------------- --------------- -----------
* Inverting the sign of 'N-H' interactions improved the overall Q-factor from 351.1% to 84.6%.
* Inverting the sign of 'C-N+1' interactions improved the overall Q-factor from 84.6% to 27.1%.
* Removing outlier data points A.46CA-CB, A.13CB-CG2, A.16CB-HB#, A.60CB-HB#, A.48CD-HD#, A.24C-25N
improved the overall Q-factor from 27.1% to 23.8%.
This example is the same as the last one, except that the ‘CE-HE’, ‘CD-HD’
and ‘CE-SD’ RDCs are excluded (--exclude) from the fit.
$ ml -s pa -i 2MJB -d 2MJB --set 0 --exclude CE-HE CD-HD CE-SD --fix-outliers \
--project-methyls --summary
Table: Summary SVD Statistics for Molecule 2MJB-1
---------- --------------- ---------------- ----------------- --------------- -----------
Overall Q (%): 18.7 RMS: 2.71 count: 461
N-H Q (%): 5.4 RMS: 0.46 count: 63 Da (Hz): 9.3 Rh: 0.147
C-CA Q (%): 17.2 RMS: 0.25 count: 58 Da (Hz): -1.6 Rh: 0.147
C-H+1 Q (%): 13.8 RMS: 0.4 count: 61
C-N+1 Q (%): 19.6 RMS: 0.17 count: 60 Da (Hz): 1.0 Rh: 0.147
CA-CB Q (%): 14.4 RMS: 0.43 count: 38 Da (Hz): -1.6 Rh: 0.147
CA-HA Q (%): 10.9 RMS: 1.9 count: 66 Da (Hz): -19.3 Rh: 0.147
CB-CG Q (%): 16.1 RMS: 2.87 count: 19
CB-HB Q (%): 18.3 RMS: 3.45 count: 50
CD-CG Q (%): 31.8 RMS: 5.67 count: 19
CG-HG Q (%): 43.5 RMS: 8.29 count: 27
Alignment Aa: 0.0004318 Ar: 6.355e-05
Saupe Szz: 0.0008637 Syy: -0.0003365 Sxx: -0.0005272
Angles Z (deg): 48.1 Y' (deg): 21.8 Z'' (deg): 104.0
---------- --------------- ---------------- ----------------- --------------- -----------
* Inverting the sign of 'N-H' interactions improved the overall Q-factor from 367.2% to 83.2%.
* Inverting the sign of 'C-N+1' interactions improved the overall Q-factor from 83.2% to 21.1%.
* Removing outlier data points A.46CA-CB, A.13CB-CG2, A.16CB-HB#, A.60CB-HB#, A.24C-25N improved the
overall Q-factor from 21.1% to 18.7%.
Likewise, the crystal structure of ubiquitin (-i 1UBQ) can be used in the
fit. In this case, the structure is missing hydrogen atoms, and these must
be added (--hydrogenate).
$ ml -s pa -i 1UBQ -d 2MJB --set 0 --fix-outliers --project-methyls \
--hydrogenate --summary
Table: Summary SVD Statistics for Molecule 1UBQ
---------- --------------- --------------- ----------------- --------------- -----------
Overall Q (%): 39.8 RMS: 5.77 count: 474
N-H Q (%): 14.7 RMS: 1.2 count: 62 Da (Hz): 8.9 Rh: 0.181
C-CA Q (%): 26.2 RMS: 0.37 count: 58 Da (Hz): -1.5 Rh: 0.181
C-H+1 Q (%): 20.5 RMS: 0.56 count: 60
C-N+1 Q (%): 31.1 RMS: 0.26 count: 61 Da (Hz): 0.9 Rh: 0.181
CA-CB Q (%): 25.1 RMS: 0.36 count: 37 Da (Hz): -1.5 Rh: 0.181
CA-HA Q (%): 25.8 RMS: 4.32 count: 66 Da (Hz): -18.4 Rh: 0.181
CB-CG Q (%): 24.4 RMS: 4.18 count: 18
CB-HB Q (%): 63.8 RMS: 10.42 count: 51
CD-CG Q (%): 55.1 RMS: 9.42 count: 19
CD-HD Q (%): 49.6 RMS: 8.45 count: 10
CE-HE Q (%): 19.7 RMS: 3.68 count: 4
CE-SD Q (%): 36.7 RMS: - count: 1
CG-HG Q (%): 94.9 RMS: 15.63 count: 27
Alignment Aa: 0.000412 Ar: 7.466e-05
Saupe Szz: 0.0008239 Syy: -0.0003 Sxx: -0.0005239
Angles Z (deg): 330.1 Y' (deg): 70.4 Z'' (deg): 257.7
---------- --------------- --------------- ----------------- --------------- -----------
* Inverting the sign of 'N-H' interactions improved the overall Q-factor from 516.4% to 93.7%.
* Inverting the sign of 'C-N+1' interactions improved the overall Q-factor from 93.7% to 46.3%.
* Removing outlier data points A.46CA-CB, A.14CB-CG2, A.48CD-HD#, A.33CE-HE#, A.48N-H, A.7C-8H,
A.28CA-CB, A.13CB-CG2, A.14CB-HB improved the overall Q-factor from 46.3% to 39.8%.
Tensor Conventions
In the absence of motion, dipolar tensors are axially symmetric (i.e. \(\delta_{xx} = \delta_{yy}\)) and the principal component (\(\delta_{zz}\)) is collinear with the internuclear vector in the principal axis system (PAS).
Chemical shift tensors (CSA) may be axially asymmetric (i.e. \(\delta_{xx} \neq \delta_{yy}\)), and their geometries must be specified in relation to internal atomic coordinates. We use the convention from Cornilescu et al. [Cornilescu2000].
[Cornilescu2000] Cornilescu, G. & Bax, A. Measurement of Proton, Nitrogen, and Carbonyl Chemical Shielding Anisotropies in a Protein Dissolved in a Dilute Liquid Crystalline Phase. J. Am. Chem. Soc. 122, 10143–10154 (2000).
The literature reports both the chemical shielding tensor (\(\sigma\)) and the chemical shift tensor (\(\delta\)). The difference between the two is an inversion of sign (i.e. \(\sigma = - \delta\)). As a result, the ordering of components between different conventions will change. In the Haeberlen convention, the chemical shift components are ordered by their magnitudes.
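Written out for the three principal components, this ordering is the standard Haeberlen relation:

\[ |\delta_{zz}| \geq |\delta_{xx}| \geq |\delta_{yy}| \]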
The isotropic component (\(\delta_{iso}\)) has already been subtracted from the three components.
In the IUPAC convention, the components are normally ordered starting from the largest component (with sign) as the ‘11’ component. However, for chemical shielding tensors, the ‘33’ component is largest.
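As a rough illustration of how the same tensor is labeled under these conventions, the following Python sketch orders three principal components. The function names and the numerical values are made up for this example and are not part of mollib; the components are assumed to have the isotropic value already subtracted.

```python
def haeberlen_order(components):
    """Return (xx, yy, zz) with |zz| >= |xx| >= |yy|."""
    yy, xx, zz = sorted(components, key=abs)
    return xx, yy, zz


def iupac_shift_order(components):
    """Return (d11, d22, d33) with d11 >= d22 >= d33 (chemical shift)."""
    return tuple(sorted(components, reverse=True))


def shielding_from_shift(components):
    """Chemical shielding is the sign-inverted chemical shift."""
    return tuple(-c for c in components)


# Arbitrary traceless chemical shift components (ppm), for illustration only.
delta = (-50.0, 80.0, -30.0)

xx, yy, zz = haeberlen_order(delta)        # (-50.0, -30.0, 80.0)
d11, d22, d33 = iupac_shift_order(delta)   # (80.0, -30.0, -50.0)

# IUPAC shielding components are ordered s11 <= s22 <= s33, so the
# '33' component is the largest one.
s11, s22, s33 = sorted(shielding_from_shift(delta))  # (-80.0, 30.0, 50.0)
```

The largest-magnitude component is 'zz' in the Haeberlen ordering, '11' in the IUPAC chemical shift ordering and, after the sign inversion, the most negative ('11') component of the shielding tensor.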
Mollib uses the Haeberlen convention and chemical shift tensors. The backbone H, C’ and N CSA tensors are defined as follows:
The 13C’ tensor (blue) has the largest component (\(\delta_{zz}\)) oriented orthogonal to the O-C-N plane, and it is rotated about this component by the \(\alpha_z\) angle.
The 15N tensor (red) has the largest component (\(\delta_{zz}\)) nearly collinear with the H-N bond, and it is rotated away from the bond about the yy-component (orthogonal to the H-N-C’ plane) with an angle \(\beta_y\).
The 1H tensor (green) has the largest component (\(\delta_{zz}\)) nearly collinear with the H-N bond, and it is rotated about the xx-component (orthogonal to the H-N-C’ plane) by an angle \(\gamma_x\).