SUPPLEMENT

Spectral Analysis of Sequence Variability in Basic-Helix-loop-helix (bHLH) Protein Domains: Implication of the Helix Structure

Zhi Wang and William R. Atchley
Graduate Program in Biomathematics and Bioinformatics
North Carolina State University, Raleigh, NC 27695-7614, USA

Sequence Database
The multiple alignment of the bHLH domains of 196 protein sequences, listed in bhlh_database.xls, was used in previous literature and is used again in this study. After the loop region between residue 32 and residue 46 is removed, 49 columns remain in the multiple alignment stored in bhlh_multiple_alignment.txt.

Justification for removing the loop region: Because an alpha-helix generally has a periodicity of 3.40 - 3.91 aa per turn (Kyte, 1995), this paper focuses mainly on the short-range periodic components (i.e., the high-frequency components). The entropy values of the loop region are low (see the entropy profile of the bHLH domain without the loop removed). We compared the spectral density plots of the entropy profiles of the bHLH domain with and without the loop region. Removing the loop region has almost no impact on the short-range periodic components; it only eliminates some long-range periodic components that arise from the low entropy values of the loop region. Therefore, the results in this paper are not affected by the removal of the loop region.
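As a rough illustration of which spectral components the helical range corresponds to, the following Python sketch lists the Fourier frequencies of a 49-column profile whose periods fall within 3.40 - 3.91 aa. The function name and the use of Python are illustrative assumptions; this is not part of the original analysis scripts.

```python
def fourier_periods_in_band(n, low, high):
    """Return (k, period) pairs for Fourier frequencies k/n whose period n/k lies in [low, high]."""
    return [(k, n / k) for k in range(1, n // 2 + 1) if low <= n / k <= high]


# 49 alignment columns remain after the loop region is removed.
for k, period in fourier_periods_in_band(49, 3.40, 3.91):
    print(f"k = {k}: frequency = {k / 49:.4f} cycles/residue, period = {period:.3f} aa")
```

Under these assumptions only the harmonics k = 13 and k = 14 fall inside the helical range, which is why the high-frequency end of the spectrum is the region of interest.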

Entropy and Factor Means/Variances Profiles
The data can be found in whole_data.xls, and the plots are shown in Fig. 1 of the paper.

Perl Script for Calculating the Entropy Profile and 1000 Bootstrap Entropy Samples

ActivePerl (available as a free download) must be installed to run the Perl script.
The Perl script used in this paper is in entropy_bootstrap_pl.txt.
The output entropy profile is stored in bhlh_entropy.txt, and the 1000 bootstrap entropy samples are stored in entropy_bootstrap.txt.
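The calculation performed by the Perl script can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not a port of entropy_bootstrap_pl.txt: it assumes Shannon entropy in bits, treats a gap character as an ordinary symbol, and draws each bootstrap sample by resampling whole sequences (rows of the alignment) with replacement. All function names are hypothetical.

```python
import math
import random
from collections import Counter


def column_entropy(column):
    """Shannon entropy (in bits) of one alignment column; gaps count as a symbol (assumption)."""
    n = len(column)
    return sum((c / n) * math.log2(n / c) for c in Counter(column).values())


def entropy_profile(alignment):
    """Entropy of every column of a list of equal-length aligned sequences."""
    return [column_entropy(col) for col in zip(*alignment)]


def bootstrap_entropy(alignment, n_boot=1000, seed=0):
    """Entropy profiles of n_boot alignments resampled row-wise with replacement (assumption)."""
    rng = random.Random(seed)
    n = len(alignment)
    profiles = []
    for _ in range(n_boot):
        sample = [alignment[rng.randrange(n)] for _ in range(n)]
        profiles.append(entropy_profile(sample))
    return profiles


# Toy alignment of four sequences with three columns (made-up data).
toy = ["AAC", "AGC", "ATC", "AGC"]
print(entropy_profile(toy))
print(len(bootstrap_entropy(toy, n_boot=10)))
```

For the real data, the alignment would be the 196 rows of bhlh_multiple_alignment.txt, giving one profile of 49 values and 1000 bootstrap profiles of the same length.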

Spectral Analysis

(I) Finite Fourier transformation method. The spectral analysis of the entropy and Factor means/variances profiles by finite Fourier transformation, the white noise tests, and the regression analysis are conducted by the SAS program spectral_fft_tests.sas.
The spectral density plots are shown in Fig. 2 of the paper. The results of Fisher's Kappa test and Bartlett's Kolmogorov-Smirnov (BKS) white noise test are summarized in white_noise_tests_bHLH.doc. Although white noise tests can provide useful information, statistical significance should not be equated with biological significance.
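The idea behind part (I) can be illustrated with a short Python sketch that computes a periodogram by a direct finite Fourier transform and the ratio of the largest ordinate to the mean ordinate, which is the quantity on which Fisher's Kappa test is based. The scaling of the ordinates and the exclusion of the zero frequency are assumptions made here; the SAS program may use different conventions, and the BKS test is not reproduced.

```python
import cmath
import math


def periodogram(x):
    """Return Fourier frequencies (cycles/residue) and periodogram ordinates, excluding frequency 0."""
    n = len(x)
    mean = sum(x) / n
    centered = [v - mean for v in x]
    m = (n - 1) // 2  # Fourier frequencies strictly between 0 and 0.5
    freqs, power = [], []
    for k in range(1, m + 1):
        coef = sum(v * cmath.exp(-2j * math.pi * k * t / n) for t, v in enumerate(centered))
        freqs.append(k / n)
        power.append(abs(coef) ** 2 / n)  # scaling convention is an assumption
    return freqs, power


def fisher_kappa(power):
    """Largest periodogram ordinate divided by the mean ordinate."""
    return max(power) / (sum(power) / len(power))


# Synthetic 49-point profile with a period of 3.5 residues (made-up data).
profile = [math.sin(2 * math.pi * t / 3.5) for t in range(49)]
freqs, power = periodogram(profile)
peak = freqs[power.index(max(power))]
print("peak period:", 1 / peak, "kappa:", fisher_kappa(power))
```

A large ratio indicates that one frequency dominates the spectrum, but the p-value must still be taken from the appropriate reference distribution.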

(II) Burg method. The spectral analysis of the entropy and Factor means/variances profiles by the Burg method is conducted by the Matlab script Spectral_Burg.m. The resulting spectral density plots are similar to Fig. 2 of the paper and are therefore provided in Spectral_burg_plots.pdf in the supplementary material.
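For readers unfamiliar with the Burg method, the following Python sketch shows the standard Burg recursion for autoregressive (AR) coefficients and the corresponding AR spectral density. It is a generic textbook-style implementation, not a translation of Spectral_Burg.m; the model order of 2, the test signal, and the function names are assumptions chosen for illustration.

```python
import cmath
import math


def burg_ar(x, order):
    """Burg estimate of AR coefficients a = [1, a1, ..., ap] and prediction-error variance.

    The series should be mean-centered by the caller.
    """
    a = [1.0]
    variance = sum(v * v for v in x) / len(x)
    ef = list(x[1:])   # forward prediction errors
    eb = list(x[:-1])  # backward prediction errors
    for _ in range(order):
        num = -2.0 * sum(f * b for f, b in zip(ef, eb))
        den = sum(f * f for f in ef) + sum(b * b for b in eb)
        k = num / den  # reflection coefficient
        ext = a + [0.0]
        a = [ext[i] + k * ext[len(ext) - 1 - i] for i in range(len(ext))]
        variance *= 1.0 - k * k
        ef, eb = ([f + k * b for f, b in zip(ef[1:], eb[1:])],
                  [b + k * f for f, b in zip(ef[:-1], eb[:-1])])
    return a, variance


def ar_spectrum(a, variance, freqs):
    """AR spectral density variance / |A(f)|^2 at the given frequencies (cycles/residue)."""
    out = []
    for f in freqs:
        transfer = sum(c * cmath.exp(-2j * math.pi * f * j) for j, c in enumerate(a))
        out.append(variance / abs(transfer) ** 2)
    return out


# Synthetic 49-point profile: period 3.5 plus a small deterministic perturbation (made-up data).
x = [math.sin(2 * math.pi * t / 3.5) + 0.05 * math.cos(2.3 * t * t) for t in range(49)]
mean = sum(x) / len(x)
x = [v - mean for v in x]
a, variance = burg_ar(x, 2)
freqs = [i / 1000 for i in range(1, 501)]
spectrum = ar_spectrum(a, variance, freqs)
print("peak period:", 1 / freqs[spectrum.index(max(spectrum))])
```

Unlike the periodogram, the AR spectrum can be evaluated on an arbitrarily fine frequency grid, which is useful for short profiles such as a 49-column alignment.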

Harmonic Analysis
The harmonic analysis, which determines the best period estimate and its 95% confidence interval from the 1000 bootstrap samples, is conducted by the Matlab script entropy_bootstrap_harmonic.m.
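One way to obtain a period estimate with a bootstrap confidence interval is sketched below in Python. The approach shown (scanning a grid of candidate periods for the one with the largest spectral power in each bootstrap profile, then taking a percentile interval) is an assumption made for illustration and may differ from what entropy_bootstrap_harmonic.m does; the grid limits and function names are hypothetical.

```python
import cmath
import math


def best_period(profile, periods):
    """Candidate period whose sinusoid carries the most power in the mean-centered profile."""
    mean = sum(profile) / len(profile)
    centered = [v - mean for v in profile]

    def power(period):
        w = 2 * math.pi / period
        return abs(sum(v * cmath.exp(-1j * w * t) for t, v in enumerate(centered))) ** 2

    return max(periods, key=power)


def percentile_interval(values, level=0.95):
    """Simple percentile interval taken from the sorted bootstrap estimates."""
    s = sorted(values)
    alpha = (1 - level) / 2
    lo = s[math.floor(alpha * (len(s) - 1))]
    hi = s[math.ceil((1 - alpha) * (len(s) - 1))]
    return lo, hi


def bootstrap_period_interval(profiles, periods, level=0.95):
    """Best period for each bootstrap profile, summarized as a percentile interval."""
    return percentile_interval([best_period(p, periods) for p in profiles], level)


# Candidate periods from 3.00 to 4.50 aa in steps of 0.01 (hypothetical grid).
grid = [3.0 + 0.01 * i for i in range(151)]
toy_profile = [1.0 + math.sin(2 * math.pi * t / 3.6) for t in range(49)]
print(best_period(toy_profile, grid))
```

With the real data, `profiles` would hold the 1000 bootstrap entropy profiles from entropy_bootstrap.txt, and the interval would be reported alongside the estimate from the original entropy profile.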

REFERENCE
Atchley, W.R., Jieping, Z., Andrew, F. & Tanja, D. (2005) Solving the protein sequence "metric" problem. Proc. Natl. Acad. Sci. USA 102, 6395-6400.