

Frank Delaglio, Stephan Grzesiek, Guang Zhu, Geerten W. Vuister, John Pfeifer, and Ad Bax

Frank Delaglio*, Stephan Grzesiek, and Ad Bax
Laboratory of Chemical Physics, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda MD 20892 USA
Geerten W. Vuister
Bijvoet Center for Biomolecular Research, Utrecht University, Padualaan 8, 3584 CH Utrecht, The Netherlands
Guang Zhu
Department of Biochemistry, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong
John Pfeifer
Division of Computer Research and Technology, National Institutes of Health, Bethesda, MD 20892 U.S.A.

*To whom correspondence should be addressed at the National Institutes of Health, Laboratory of Chemical Physics, NIDDK, Building 5 B2-31, 5 Center Drive MSC 0505, Bethesda MD, 20892-0505 U.S.A.

Keywords: Multidimensional NMR, Data Processing, Fourier Transformation, Linear Prediction, Maximum Entropy, UNIX

Methods

  • UNIX Commands and Filters
  • UNIX Command Line Arguments
  • UNIX Pipes
  • Spectral Processing Scheme as a UNIX Pipeline
  • Multidimensional Processing via Pipelines
  • Processing Functions and Options
  • Inverse Processing
  • New Capabilities and Data Formats
  • Parallel Processing
  • Graphical Interface

Acknowledgements
Figures
Tables
References
Appendix

Sample PostScript Output


The NMRPipe system is a UNIX software environment of processing, graphics, and analysis tools designed to meet current routine and research-oriented multidimensional processing requirements, and to anticipate and accommodate future demands and development. The system is based on UNIX pipes, which allow programs running simultaneously to exchange streams of data under user control. In an NMRPipe processing scheme, a stream of spectral data flows through a pipeline of processing programs, each of which performs one component of the overall scheme, such as Fourier transformation or linear prediction. Complete multidimensional processing schemes are constructed as simple UNIX shell scripts. The processing modules themselves maintain and exploit accurate records of data sizes, detection modes, and calibration information in all dimensions, so that schemes can be constructed without the need to explicitly define or anticipate data sizes or storage details of real and imaginary channels during processing. The asynchronous pipeline scheme provides other substantial advantages, including high flexibility, favorable processing speeds, choice of both all-in-memory and disk-bound processing, easy adaptation to different data formats, simpler software development and maintenance, and the ability to distribute processing tasks on multi-CPU computers and computer networks.

Abbreviations: 1D, one-dimensional; 2D, two-dimensional; 3D, three-dimensional; nD, multi-dimensional; CPU, Central Processing Unit; FID, Free Induction Decay; I/O, input/output; LP, linear prediction; MEM, Maximum Entropy Method; MB, megabyte; NOE, Nuclear Overhauser Effect.

As use of multidimensional NMR has become widespread, demands on multidimensional spectral processing software have increased. Software must keep pace both with NMR applications research and with the routine use of NMR for biomolecular structure determination. Routine use requires software to accommodate increasing numbers of experiments, larger data sizes, more complicated processing schemes, and common use of 4D NMR (Pelczer and Szalma, 1991; Bax and Grzesiek, 1993). Various vendor-specific modes of quadrature detection and data storage must also be addressed. At the same time, NMR technique development research requires software to serve as a platform for testing and evaluation of new experiments and acquisition methods, as well as new spectral analysis and enhancement approaches.

The user community for multidimensional processing software is also changing, and many practitioners of biological NMR are not necessarily familiar with NMR computer applications or signal processing. In addition, there are generally increasing expectations for software that is graphically oriented and error free, and that works harmoniously with other applications on a variety of networked computers. Correspondingly, current software development approaches often favor creation of several small, well-targeted applications coordinated by standard graphics and command tools.

We present here the NMRPipe system, a comprehensive new multidimensional NMR data processing system which addresses the growing needs for ease of use, efficiency, and flexibility of multidimensional spectral processing in the laboratory network. The NMRPipe system is a UNIX pipeline-based software environment for multidimensional processing, coordinated with spectral graphics and analysis tools. The system was implemented in the C programming language (Kernighan and Ritchie, 1988) using the program development tools of UNIX (Kernighan and Pike, 1984).

Several other multidimensional NMR data processing packages have been developed over the past decade, including the popular FELIX (BIOSYM Technologies Inc., San Diego CA), as well as AZARA (W. Boucher, unpublished results), Dreamwalker (Meadows et al., 1994), GIFA (Delsuc, 1988), NMR Toolkit (Hoch, 1985), NMRZ (New Methods Research Inc., Syracuse NY), Pronto (Kjaer et al., 1994), PROSA (Güntert et al., 1992), and TRIAD (Tripos Inc., St. Louis MO). The NMRPipe system incorporates a novel approach to spectral processing which is complementary to other methods, and provides many advantages. Spectral processing is performed using modules connected by UNIX pipes, which allow programs running simultaneously to exchange streams of data under user control. In this approach, a stream of spectral data flows through a pipeline of processing programs, each of which performs one component of the overall scheme, such as Fourier transformation or mirror-image linear prediction.

The processing programs of the NMRPipe system work in the same way as ordinary UNIX commands; this means that complete multidimensional processing schemes can be constructed as standard UNIX command scripts, which are easy to learn and manipulate. The pipeline approach provides favorable processing speeds, while at the same time allowing the choice of both all-in-memory and disk-bound processing, easy adaptation of new algorithms and differing data formats, and simpler software development and maintenance. Since processing is achieved via a series of programs running simultaneously, the NMRPipe pipeline approach also provides a way to exploit the capabilities of multi-processor computers or to distribute processing tasks across a network.

In addition to the general advantages of the pipeline approach, there are other advantages arising from specific details of NMRPipe's implementation. For example, the components of NMRPipe are engineered to maintain and exploit accurate records of data size, detection mode, calibration information, and processing parameters in all dimensions. This means that schemes can be created and reused easily, since parameters can be specified in terms of spectral units, and there is no need to explicitly define or anticipate data sizes during processing. The parameter record also allows NMRPipe modules to assemble the correct combination of real and imaginary data for a given dimension automatically; this permits dimensions to be processed and reprocessed in any order with schemes that are generally the same regardless of acquisition mode and vendor-specific storage details.

The NMRPipe approach relies on the UNIX operating system concepts of data streams, filters, and pipes, so they are discussed in some detail here. By necessity, these concepts are becoming increasingly familiar to the biomolecular NMR community, since modern spectrometers are commonly controlled by UNIX computers, and molecular structures are usually generated and visualized on UNIX workstations.

UNIX Commands and Filters

UNIX has no strong distinction between commands built into the operating system and programs which are part of 'external' applications such as spectral processing. This means that application programs can potentially be used like ordinary UNIX commands, and the standard UNIX facilities for combining and manipulating them can be exploited. For example, one or more UNIX commands can be placed into an ordinary text file, called a shell script. Such a shell script can then be executed by its name, just as if it were also a UNIX command.

A UNIX filter is a general term for a command or program which reads input, processes it in some way, and produces an output. One example of a filter is the UNIX command sort, which reads lines of text and writes them out again sorted in alphabetical order. Another example is the UNIX command tr, which translates characters (e.g. from upper-case to lower-case) in its input before writing them. Depending on the nature of the task involved, UNIX filters may read and process their input data in small parts, such as tr (which can process one character at a time), or in its entirety, such as sort (which must read the entire input first in order to sort it).

In UNIX terminology, a filter's source of input data is called standard input and its destination for output data is called standard output. By default, standard input is data entered from the keyboard, and standard output is data displayed on the computer screen. UNIX allows filters to take their input from an existing file instead of the keyboard; this is called input redirection, and it is performed using the < character. Correspondingly, filters can send their output to a file instead of to the screen; this is called output redirection, and it is performed using the > character. The following two UNIX commands show examples of redirection. The first command will sort the lines in file 'old.text', and write the sorted results to file 'new1.text'; the second command will convert the text in file 'new1.text' from lower-case to upper-case, and store the result in file 'new2.text':
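The two redirection commands described above can be sketched as follows. The file names are those given in the text; the three sample lines written to 'old.text' are only an assumption, added so that the example can be run as it stands.

```shell
# Create a small sample input (illustrative content only).
printf 'cherry\napple\nbanana\n' > old.text

# Sort the lines of old.text and store the sorted result in new1.text.
sort < old.text > new1.text

# Convert new1.text from lower-case to upper-case and store it in new2.text.
tr 'a-z' 'A-Z' < new1.text > new2.text
```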

Commands like these illustrate the concept of a data stream, where data 'flows' from an input source, travels through a filter, and collects at an output destination.

UNIX Command Line Arguments

The use and behavior of a UNIX command can be adjusted by command-line arguments, which are additional parameters specified after the command. The parameters are usually identified by words or letters prefixed by the - character. For instance, while the UNIX command sort will sort text in alphabetical order, adding the argument -r will cause text to be sorted in reverse alphabetical order:

sort -r < old.text > new1.text

Each UNIX command has its own list of possible command-line arguments, which are described in the command's manual page, a brief document (but often more than one page) that is available on-line. UNIX manual pages have a standard format, and new manual pages can be added easily, so that application programs can make use of the same on-line help system used by other UNIX commands.

UNIX Pipes

UNIX pipes allow commands to be connected together in a series, where the output of one command is used directly as the input to the next command. A series of programs connected in this way is often called a pipeline. A pipe is specified in a UNIX command line by the | character inserted between commands. For example, we can combine the sorting and character translation commands into a single pipeline:

sort < old.text | tr 'a-z' 'A-Z' > new2.text

In this pipeline, data travels from the input file through the sort filter, and the sorted result travels via pipe through the tr filter and then to the output file. As shown, pipes allow simple commands to be combined to perform complex tasks, while avoiding the need for intermediate results to be saved in files. Pipeline communication is also relatively fast, since UNIX pipes are generally implemented via physical memory buffers in the operating system (Stevens, 1992).

Pipelines, like UNIX command lines in general, can be split over several lines of text. This is especially useful when the pipeline contains many components. In the UNIX idiom, the \ character is used at the end of a line to continue a command onto the next line. For example, a functionally equivalent version of the sort pipeline above could be entered as follows:
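A minimal sketch of such a multi-line form is given here. The sample contents of 'old.text' are an assumption made only so the example is self-contained; the pipeline itself is the sort and tr pipeline from the text, with a \ ending each continued line.

```shell
# Create a small sample input (illustrative content only).
printf 'cherry\napple\nbanana\n' > old.text

# The same pipeline as before, split over three lines with \ continuations.
sort < old.text \
| tr 'a-z' 'A-Z' \
> new2.text
```

Note that the \ must be the very last character on the line; a trailing space after it would end the command early.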

Spectral Processing Function as a UNIX Filter

The concept of a UNIX filter command can be extended directly to spectral processing. By analogy, a spectral processing function can be implemented as a UNIX filter which reads an input stream of unprocessed spectral data vectors, applies a spectral processing function to each vector, and writes the result as a stream of processed vectors. We have implemented this concept as a program called nmrPipe, the central module of the NMRPipe system.

The nmrPipe program applies a given processing function to a stream of spectral data. The processing function is selected via a 'function name' argument -fn, and corresponding processing modes and parameters are specified by other optional command-line arguments. For example, the following three commands are filters which apply a forward Fourier transform (FT), an inverse Fourier transform, and a 90-degree zero order phase correction (PS), respectively:

A Forward Transform Filter: nmrPipe -fn FT

An Inverse Transform Filter: nmrPipe -fn FT -inv

A Phase Correction Filter: nmrPipe -fn PS -p0 90

The required input stream for nmrPipe consists of a header describing the data, followed by the binary data vectors themselves, usually in a sequential order. The output stream consists of the header, which is updated to reflect processing, followed by the processed data vectors. The stream format is meant to resemble the contents of an ordinary 2D file plane, so that such a file can be used directly with nmrPipe.

As with other UNIX filters, nmrPipe reads and writes streams via standard input and standard output, but for convenience explicit input and output file names can be specified by the command-line arguments -in and -out. For example, the following two commands perform the same task; they both apply a Fourier transform to all the data vectors in file 'spec.fid', and save the result in file 'spec.ft':
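A sketch of the two equivalent forms just described, using only the arguments named in the text. The commands are wrapped in shell functions so that both forms can be defined side by side and run once nmrPipe and an input file 'spec.fid' are available; the function names ft_redirect and ft_named are hypothetical and not part of NMRPipe.

```shell
# Form 1: the shell redirects standard input and standard output.
ft_redirect() {
    nmrPipe -fn FT < spec.fid > spec.ft
}

# Form 2: the same task, with the file names passed to nmrPipe itself.
ft_named() {
    nmrPipe -fn FT -in spec.fid -out spec.ft
}
```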

The nmrPipe program includes implementations of many common 1D processing functions, as well as several other useful elements; these are listed in Table 1, and several are discussed in more detail below.

Spectral Processing Scheme as a UNIX Pipeline

The concept of a spectral processing function performed as a UNIX filter leads directly to the idea of a spectral processing scheme implemented as a UNIX pipeline; this is the central concept of the NMRPipe system. In this method, spectral data flows through a pipeline of processing filters, each performing one aspect of the processing scheme. In practice, this is achieved by using multiple instances of the nmrPipe program, each with different command-line arguments to select a processing function and optional parameters. For example, the following scheme applies a sinusoid-to-a-power window function (SP), zero fill (ZF), Fourier transform (FT), and deletes the imaginary part of the result (-di). In the absence of additional arguments, the processing functions in this scheme use default parameters, so that the SP function applies a sine-bell, the ZF function doubles the data size, and the FT function applies a complex forward transform:
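A sketch of the three-stage scheme described here, built only from the function names and the -di flag given in the text, with input 'spec.fid' and output 'spec.ft'. The wrapper name process_1d is hypothetical; it simply lets the pipeline be defined first and run when nmrPipe and the input file are present.

```shell
# Window (SP), zero fill (ZF), and Fourier transform (FT), all with
# default parameters; -di deletes the imaginary part of the result.
process_1d() {
    nmrPipe -fn SP < spec.fid \
    | nmrPipe -fn ZF \
    | nmrPipe -fn FT -di > spec.ft
}
```

Each stage is a separate nmrPipe process, so adding or removing a processing step means adding or removing one line of the pipeline.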

Considered in more detail, the scheme above consists of three instances of nmrPipe, connected by pipes, and running 'simultaneously'. This means that the UNIX operating system will alternate CPU time and other resources between the instances of nmrPipe while the scheme is executing. During execution, the first instance of nmrPipe reads a data vector from the input file 'spec.fid', applies the window function SP, and writes the result vector to the pipeline. The second instance of nmrPipe reads the apodized vector from the pipeline when it becomes available, applies zero filling, and writes the result to the next stage of the pipeline. The third instance of nmrPipe reads the apodized, zero-filled vector from the pipeline when it becomes available, applies a Fourier transform, and writes the result to file 'spec.ft'; meanwhile, the earlier instances of nmrPipe may have already begun to read and process the next vector. This procedure continues until all vectors have passed through the pipeline.

Spectrometer Format Conversion

Many of the advantages of the NMRPipe system stem from the fact that relevant acquisition parameters for all dimensions are established during conversion of data from the spectrometer format to the NMRPipe format. A typical 3D conversion script is given in Figure 1. As shown, the conversion establishes the acquisition modes, data sizes and chemical shift calibration information for each dimension. The parameters are usually entered manually, but most of these could be extracted automatically from spectrometer parameter files (D. Benjamin, private communication).

The conversion programs themselves have been engineered to compensate for vendor-specific differences in the way that real and imaginary data are interleaved for each dimension, so that the converted result always provides the real and imaginary data for all dimensions in a predictable order. This allows subsequent processing schemes to be independent of spectrometer vendor. Currently, the NMRPipe system includes conversion facilities for GE Omega export format, JEOL GX and Alpha formats, Chemagnetics format, Varian Unity format, and Bruker AM, AMX, and DMX formats.

Like nmrPipe, the conversion programs are also implemented as UNIX filters. This means that the output stream of a conversion command can be sent directly into a processing pipeline, without the need to save an intermediate converted result on disk. It also means that a conversion program can read data produced by another pipeline command as an alternative to reading data directly from a file. One useful example of this is the ability to convert data directly from a tape drive by using a tape reading command (such as the UNIX command dd) as the data source. Another example is the ability to convert versions of spectrometer data which were compressed to save space by using a decompression command (such as the UNIX command zcat) as the data source.
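The pattern of using a decompression command as the data source can be tried with ordinary text filters. The sketch below is only an analogy: it does not call an NMRPipe conversion program, and gzip stands in for whichever compression tool was actually used. The point is that the decompressed data goes straight into the pipeline, so no uncompressed copy is written to disk.

```shell
# Make a compressed sample file (illustrative content only).
printf 'cherry\napple\nbanana\n' | gzip -c > old.text.gz

# The decompression command is the head of the pipeline; the filters
# after it read the decompressed stream from standard input.
gzip -dc old.text.gz | sort | tr 'a-z' 'A-Z' > new2.text
```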

Multidimensional Processing via Pipelines

The NMRPipe system includes two approaches to extend the pipeline method to multiple dimensions. One approach is to insert an appropriate matrix transpose command into the interior of a processing pipeline. Another approach is to use commands at the beginning or end of the pipeline which are capable of reading or writing vectors from an arbitrary dimension of a multidimensional spectrum. The two approaches can be used alone or in combination.

In a pipeline, a transpose function acts like a reservoir, which accumulates an intermediate result in memory before sending the transposed version down the remainder of the pipeline. Therefore, functions before a transpose receive and process a stream of vectors from a given dimension, and then functions after the transpose receive and process a stream of vectors from the exchanged dimension. Depending on which dimensions are being exchanged, a transpose function may require only enough memory for a 2D plane from the data, or it may require enough memory for an entire 3D or 4D matrix, so it is not generally applicable.

As noted above, the pipeline approach can be extended to multidimensional processing simply by adding two kinds of modules, as an alternative to in-memory transpose. The first module is a program at the head of the pipeline, which creates a data stream by reading vectors from a given dimension of a multidimensional input. The second module is a program at the tail of the pipeline, which gathers processed vectors and writes them to a given dimension of a multidimensional output. We have implemented two such programs, xyz2pipe and pipe2xyz, which are suitable for reading and writing multidimensional data in the multi-file 2D plane format suggested by Kay et al. (1989). The programs take their names from the nomenclature X-axis, Y-axis, Z-axis, A-axis, etc. which we use to describe the dimensions of the spectral data. Correspondingly, the dimension to be read or written is specified simply as a command-line argument -x, -y, -z, or -a. When reading or writing from a given dimension, the programs alter the sequential order of the other dimensions in the data stream in a regular, predictable way, by a multidimensional rotation. This means that schemes can be created to conserve the original data order, or change it to accommodate a particular processing or analysis strategy. The programs require at most enough physical memory to contain about four 2D planes from the data. In addition, the programs have been engineered to allow in-place processing (i.e. same input and output files), and to provide the correct combinations of real and imaginary data so that dimensions can be processed in any order.

In the simplest multidimensional scheme, each dimension of the data is processed in a separate pass, which requires reading the entire input from disk, and writing the entire result. Such a scheme can be simplified and made more efficient by adding one or more in-memory transpose steps, which eliminates the need to save an intermediate result on disk. A typical 3D processing script employing a 2D transpose approach is shown in Figure 2. In this script, the X-axis and Y-axis are processed together in the first pass, then the Z-axis is processed in a second pass. Such a script represents an effective compromise between disk access and physical memory use, since in practice only a small number of 2D planes are being manipulated in memory at any given time by the various programs in the pipeline. If large amounts of physical memory are available, schemes with 3D or 4D in-memory transpose steps can also be constructed, again reducing the need to save intermediate results. The overall approach provides basic multidimensional schemes which require only modest amounts of memory for 3D or 4D processing, but which can be altered easily to take advantage of large memory systems. Complementary examples in the case of 4D processing are given in Figures 3 and 4.
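The two-pass 3D scheme with a 2D transpose might be sketched as follows. This is not the script of Figure 2, only an outline under stated assumptions: the transpose function name TP, the -in and -out arguments of xyz2pipe and pipe2xyz, and the file name templates such as fid/test%03d.fid follow common NMRPipe usage and are not given in the text; phase correction and other per-dimension parameters are omitted; and the wrapper names pass_xy and pass_z are hypothetical.

```shell
# Pass 1: process the X-axis, transpose in memory, process the Y-axis,
# transpose back, and write the result as a series of 2D planes.
pass_xy() {
    xyz2pipe -in fid/test%03d.fid -x \
    | nmrPipe -fn SP \
    | nmrPipe -fn ZF \
    | nmrPipe -fn FT -di \
    | nmrPipe -fn TP \
    | nmrPipe -fn SP \
    | nmrPipe -fn ZF \
    | nmrPipe -fn FT -di \
    | nmrPipe -fn TP \
    | pipe2xyz -out ft/test%03d.ft2 -x
}

# Pass 2: read vectors along the Z-axis, process them, and write them
# back along the same axis.
pass_z() {
    xyz2pipe -in ft/test%03d.ft2 -z \
    | nmrPipe -fn SP \
    | nmrPipe -fn ZF \
    | nmrPipe -fn FT -di \
    | pipe2xyz -out ft/test%03d.ft3 -z
}
```

Only the first pass holds a transposed 2D plane in memory; the second pass relies on xyz2pipe and pipe2xyz to gather the Z-axis vectors from disk.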

The script shown in Figure 3 converts and processes a 4D spectrum in three passes, using only 2D in-memory transpose. In this case, the spectrometer format conversion, X-axis processing, and Y-axis processing are all performed in the first pass, the Z-axis is processed in the second pass, and the A-axis is processed in the third pass. The corresponding script in Figure 4 performs the same processing, but it has been rearranged so that the spectrum is processed in only two passes by the addition of a 3D in-memory transpose function. The first pass performs the spectrometer format conversion and the processing for the X-axis, Y-axis and Z-axis. The A-axis is processed in the second pass. As these examples show, in-memory processing is achieved at the discretion of the user, simply by use of appropriate transpose functions. Only minor alteration of a given processing scheme is needed, and no reconfiguration or recompilation of the software is required. Instead, the transpose functions, like all other functions of the NMRPipe system, allocate suitable amounts of memory automatically.

Processing Functions and Options

The NMRPipe system makes use of a relatively small number of processing functions, but these are augmented by a variety of modes and options; the processing functions listed in Table 1 and the Appendix include over 300 options and parameters. For example, the functions POLY (polynomial fitting) and LP (linear prediction) each have a rich collection of parameters which allows them to perform many tasks. The POLY function can be used as a solvent filter in the time-domain, as well as for manual or automated baseline correction according to a reliable in-house algorithm, and the corrections can be limited to selected spectral regions if desired. The linear prediction function LP can be used to predict points in either the start, end, or interior of existing data, in backward, forward or mixed forward-backward mode, with or without mirror-image methods and root-reflection. In addition to this flexibility, the LP function has also been implemented using a matrix inversion procedure in place of the iterative (and often unstable) root-searching approach, making it especially robust (G. Zhu and A. Bax, unpublished results).

The NMRPipe processing functions make extensive use of default parameter settings. This helps to make argument lists more concise, since individual parameters can be adjusted while leaving default settings intact. For example, when used with no other arguments, LP will apply linear prediction and root reflection with 8 complex coefficients to extend the original data to twice its size. The number of coefficients (the LP order) can be changed via the -ord option, and the number of predicted points can be changed independently via the -pred parameter. Mirror image LP can be selected simply by adding either flag -ps0-0 or -ps90-180 to any LP command line, depending on whether data have no acquisition delay, or a half-dwell delay.
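The LP variants described above can be written out as filters. The flags are those named in the text; the numeric values 16 and 64 are arbitrary illustrations, and the function names are hypothetical wrappers used only to keep the variants side by side.

```shell
# Defaults: 8 complex coefficients, data extended to twice its size.
lp_default() { nmrPipe -fn LP; }

# LP order and number of predicted points set independently
# (16 and 64 are illustrative values only).
lp_custom() { nmrPipe -fn LP -ord 16 -pred 64; }

# Mirror-image LP for data with no acquisition delay.
lp_mirror_no_delay() { nmrPipe -fn LP -ps0-0; }

# Mirror-image LP for data with a half-dwell acquisition delay.
lp_mirror_half_dwell() { nmrPipe -fn LP -ps90-180; }
```

Each function reads a data stream on standard input and writes to standard output, so any of them can be dropped into a pipeline in place of a plain nmrPipe stage.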

Many of the functions exploit or update the spectral header parameters during processing. For example, apodization, zero-fill, and phase correction details are recorded, and chemical shift calibrations can be updated automatically by any function which extracts or shifts the data. The functions also keep track of the valid time-domain size of the data, as influenced by time-domain shifts or frequency-domain extractions. Where appropriate, parameters can be specified in PPM or Hz as well as in points.

Inverse Processing

Multidimensional enhancement schemes commonly call for inverse processing, so several functions have been implemented with an inverse mode for convenience. For instance, window functions support an inverse mode which divides by the window function, and zero filling supports an inverse mode which strips away previous zero padding. These conveniences make it possible to construct complicated inverse processing protocols concisely, and if parameters are selected appropriately the original data can commonly be recovered to a precision of better than one part in 10^5. Examples are given in Figures 5 and 6, which show forward/inverse processing scripts for applying linear prediction and Maximum Entropy reconstruction in the two indirectly-detected dimensions of a 3D spectrum. In the case of the LP scheme in Figure 5, forward and inverse processing is used to minimize the number of signals which must be predicted in any given vector in order to increase the prediction's stability and incidentally decrease the time required (Kay et al., 1991). In the case of the MEM scheme in Figure 6, forward and inverse processing is used to allow a more stable automated baseline correction by using data processed with window functions, before data is reprocessed without window functions for Maximum Entropy reconstruction.
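A forward/inverse round trip might be sketched as below. Several points are assumptions rather than statements from the text: the -inv flag is shown in the text only for FT, and is assumed here to select the inverse mode of SP and ZF as well; the window arguments -off and -end and their values are taken from common NMRPipe usage and chosen so that the window is nowhere zero, since a window that reaches zero cannot be divided out; and the function names are hypothetical.

```shell
# Window arguments shared by the forward and inverse steps (assumed
# values; the same window must be used in both directions).
SP_ARGS="-off 0.5 -end 0.95"

# Forward: window, zero fill, Fourier transform.
forward() {
    nmrPipe -fn SP $SP_ARGS | nmrPipe -fn ZF | nmrPipe -fn FT
}

# Inverse, in reverse order: inverse transform, strip the zero padding,
# divide by the window.
inverse() {
    nmrPipe -fn FT -inv | nmrPipe -fn ZF -inv | nmrPipe -fn SP $SP_ARGS -inv
}

# Forward processing followed immediately by its inverse.
round_trip() {
    forward | inverse
}
```

In a real scheme the enhancement step, such as LP in another dimension, would sit between forward and inverse; the round trip alone is useful mainly as a check that the chosen parameters recover the original data.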

New Capabilities and Data Formats

One of the special advantages to the pipeline approach is the ease and flexibility with which new capabilities and data formats can be implemented. The primary data format of the NMRPipe system consists of one or more 2D file planes, each with a 2048-byte header followed by four-byte floating point spectral data values in a sequential order. Other multidimensional data formats can be adapted simply by use of alternative programs to read or write data at the head or tail of a pipeline; the submatrix formats of the powerful spectral analysis programs NMRView (Johnson and Blevins, 1994) and ANSIG (Kraulis, 1989; Kraulis et al., 1994) have been accommodated by their authors in this way. To facilitate work of this kind, the standard NMRPipe installation includes C source code for the spectrometer format conversion programs, file header interpretation and general I/O utilities, and the multidimensional I/O programs xyz2pipe and pipe2xyz.
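The layout just described implies a simple relation between file size and the number of stored values: (file size - 2048) / 4. The sketch below checks this arithmetic on a stand-in file built with standard tools; the file name plane.tmp and its contents are assumptions for illustration, and the zeroed header is not a valid NMRPipe header.

```shell
HEADER_BYTES=2048   # header size per 2D file plane, as stated in the text
VALUE_BYTES=4       # one four-byte floating point value per data point

# Build a stand-in plane: a zeroed 2048-byte header followed by two
# four-byte values (the bytes of 1.0 and 2.0 in little-endian IEEE-754).
{
    dd if=/dev/zero bs=$HEADER_BYTES count=1 2>/dev/null
    printf '\000\000\200\077\000\000\000\100'
} > plane.tmp

# Number of stored values = (file size - header size) / value size.
total_bytes=$(( $(wc -c < plane.tmp) ))
points=$(( (total_bytes - HEADER_BYTES) / VALUE_BYTES ))
echo "values stored after the header: $points"
```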

New processing functions can be implemented as simple UNIX filter programs which can be inserted directly in the pipeline data stream, without the need to alter the nmrPipe program itself. As an alternative to writing a complete program, nmrPipe includes the MAC function, a macro interpreter which implements a subset of the C programming language, augmented with a variety of vector processing commands. The interpreter was implemented primarily for development purposes, using the UNIX compiler generator Yacc (Johnson, 1986). The macro language allows direct manipulation of the data points, and the possibility to control the details of file I/O during processing. In its default mode, the MAC function will apply the contents of a user-written macro to every 1D vector in the given dimension, so that new functions can be implemented simply by placing a list of vector functions or other processing steps in a text file. This provides a convenient way to prototype new processing applications. For example, special processing steps for drift correction, gradient-enhanced data (Cavanagh et al., 1991; Palmer et al., 1991; Kay et al., 1992) and Bruker DMX digitally oversampled data have been developed this way.

Parallel Processing

Many possible approaches can be envisioned for performing a multidimensional processing task in parallel over a network of computers or on a multi-CPU machine. By modifying only the multidimensional I/O programs (xyz2pipe and pipe2xyz), we have implemented one simple but broadly applicable approach, which relies only on standard UNIX network file sharing, and avoids the need for special machine-specific parallel compiling or configuration of software. This particular implementation uses static load balancing, which means that the amount of data to be processed by each computer is fixed at the outset of a task, and therefore there is no compensation for possible changes in CPU performance during the course of a calculation. In practice, the user performs parallel processing by creating a single script which processes a complementary subset of a complete spectrum depending on which computer is used to execute it; the same script is then executed simultaneously on all CPUs involved. The division of data is performed automatically according to a user-supplied list of computers and their approximate relative speeds, so that only minor modification of an ordinary scheme is needed to convert it to a parallel scheme.
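The idea of static load balancing can be illustrated with a small calculation. This is not NMRPipe's own mechanism or file format, only a sketch of the principle: given a hypothetical list of computers with relative speeds, a fixed number of 2D planes is divided in proportion to speed before any processing starts. The host names, speeds, file names, and the plane count of 64 are all assumptions.

```shell
# Hypothetical list: computer name and approximate relative speed.
cat > hosts.list <<'EOF'
alpha 2
beta 1
gamma 1
EOF

# Assign each computer a contiguous range of planes in proportion to its
# speed; the last computer takes whatever remains, so every plane is
# assigned exactly once.
awk -v n=64 '
    { name[NR] = $1; speed[NR] = $2; total += $2 }
    END {
        start = 1
        for (i = 1; i <= NR; i++) {
            count = (i < NR) ? int(n * speed[i] / total) : n - start + 1
            printf "%s %d %d\n", name[i], start, start + count - 1
            start += count
        }
    }
' hosts.list > shares.list

cat shares.list
```

Because the shares are fixed at the outset, a computer that slows down during the run still has to finish its whole range, which is the limitation of static balancing noted above.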

Graphical Interface


As noted by Güntert et al. (1992), it is a difficult task to create and maintain a single, integrated spectral graphics and processing program. Nevertheless, in our experience we have found it essential to be able to graphically inspect the FID data, to interactively choose processing parameters, and to examine intermediate processing results on the workstation screen or in hard copy. In an attempt to meet these needs, we have developed a supplemental graphics interface called NMRDraw, using the X11 network graphics library and the XView graphical interface toolkit (Heller and Van Raalte, 1993). The program, shown in Figure 7, currently runs on Sun, SGI, and IBM RS6000 UNIX workstations.

The NMRDraw program provides facilities for inspecting raw and processeddata via 1D and 2D slices or projections from all dimensions, as well as a macroeditor for creating and executing complete multidimensional processing scripts. NMRDraw also allows real-time display and interactive phasing of an arbitrarynumber of 1D slices selected from any dimension of the spectrum and displayedsimultaneously. Interactive 1D processing is performed via program-controlledpipelines to nmrPipe, providing the functionality of both graphics andprocessing without the need to incorporate the two in a single program. Inkeeping with the philosophy of well-separated applications, the data extractionand display facilities of NMRDraw can also be operated remotely by two-waypipelines to other programs, in order to construct graphical spectral analysisschemes. A prototype example of this approach, modeled after the NMRViewspectral analysis package (Johnson and Blevins, 1994), is shown in Figure 8.

Independently of our graphics interface development, spectroscopists at atest site for the NMRPipe system have used the TCL graphics command language tocreate interactive nmrPipe schemes (N. Tjandra, private communication). TCL provides a method to build graphics applications using shell-scripts alone,without the need to write, compile, and link a complete program (Ousterhaut,1994). Since TCL provides an easy method for building graphical applications atthe UNIX shell-script level, it is ideal for use with NMRPipe schemes, whichalso operate at the shell-script level. Using this approach, it was possible tocreate a graphical interface which provides routine format conversion andprocessing without the requirement for users to edit shell-scripts directly.

Companion Software

In addition to the processing and display facilities described above, the NMRPipe system includes several other applications, such as algebraic combination of spectra, simulation of time-domain or frequency-domain data from peak tables, multidimensional Non-Linear Least-Squares modeling of spectral line shapes, general-purpose functional fitting with Monte Carlo error estimation, and Principal Component Analysis. Stand-alone functions for examining and adjusting spectral header parameters are also included. Processed data from the NMRPipe system can be used directly with the PIPP/CAPP system for computer-assisted spectral analysis (Garrett et al., 1991); together, these software systems have been used to help generate roughly 10-15% of the NMR structures deposited in the Brookhaven Protein Databank since the beginning of 1994.

The NMRPipe system has been tested in over 50 laboratories, and has proven easy to use, robust, and thorough in its capabilities. In our direct experience, it is also more efficient than previous approaches we have tried, and it has successfully been adapted to new data formats and acquisition modes. Because of its design principles, it has been easy to port and maintain this system on several different computer platforms, and to coordinate it with a variety of graphics and analysis systems.

Processing times on various computers for a typical 3D application are given in Table 2, and times for some other applications are given in Figures 3, 4, 5, and 6. The main sources of performance overhead in these examples are the multi-plane data format and pipeline communication. We decided to use the multi-plane format in order to accommodate preexisting software which also used this format. While this format has the advantage of simplicity, it is not necessarily the best choice in all respects, especially for 4D data, since the number of file planes can become very large and relatively inefficient to manipulate. But, since the source and destination formats are independent of the processing pipeline itself, other formats could easily be implemented, for instance by substituting the programs which read and write multi-plane format data by programs which read and write submatrix format data. In this respect, the processing pipeline can be thought of as a format-independent processing engine.

The overhead due to data format, while measurable, is not important in many cases. For example, consider the processing times for two versions of 4D processing given in Figures 3 and 4. The version in Figure 4 is 25 minutes faster than the version in Figure 3 because it avoids one intermediate read/write of the 4D data. But this improvement amounts to only a 5% decrease in the overall processing time. This also suggests that an all-in-memory approach such as the one employed by PROSA (Güntert et al., 1992) is not always an advantage, since the performance gain will often be small, but the physical memory requirements (> 1024 megabytes in this case) may constitute a serious obstacle. As noted by Levy et al. (1986), use of virtual memory does not provide an effective solution to this problem, although in years to come, computers with multi-gigabyte physical memory capacity may become commonplace.
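The 5% figure can be checked directly from the processing times quoted in the captions of Figures 3 and 4 (8 hr. 20 min. and 7 hr. 55 min. on a Sun Sparc 10). The short calculation below is only an arithmetic check; the helper name `minutes` is introduced here for illustration.

```python
def minutes(hours, mins):
    """Convert an elapsed time given as hours and minutes to minutes."""
    return 60 * hours + mins


fig3 = minutes(8, 20)      # 2D transpose only (Figure 3)
fig4 = minutes(7, 55)      # 2D and 3D transpose (Figure 4)
saved = fig3 - fig4        # minutes saved by skipping one read/write pass
fraction = saved / fig3    # relative decrease in total processing time

if __name__ == "__main__":
    print(f"saved {saved} min, {100 * fraction:.0f}% of the Figure 3 time")
```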

Overhead due to pipeline communication and management is an intrinsic aspect of the NMRPipe system. This overhead is examined in Figure 9. As shown, the overhead time increases roughly linearly with the number of programs in the pipeline. For the Sun Sparc 10 workstation, this overhead contributes about 2 min. to a typical 3D processing scheme. This amounts to about 15% of the time used for ordinary Fourier processing, and an insubstantial percentage for linear prediction applications.
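As a rough illustration of this linear behaviour, the sketch below uses the slope reported in the caption of Figure 9 (0.19 sec/MB for each additional pipeline stage, measured on a 32 MB data set). The function name and the choice of 20 stages are assumptions made for the example, and any constant offset of the fitted line is ignored.

```python
def pipeline_overhead_sec(size_mb, n_stages, slope=0.19):
    """Estimate pipeline overhead, assuming it grows linearly with stage count.

    slope is in sec/MB per stage (value taken from the Figure 9 caption);
    the intercept of the fitted line is assumed to be negligible.
    """
    return slope * size_mb * n_stages


if __name__ == "__main__":
    # Hypothetical scheme: 32 MB of data passing through 20 pipeline stages.
    overhead = pipeline_overhead_sec(32, 20)
    print(f"{overhead:.1f} sec, about {overhead / 60:.1f} min")
```

Under these assumptions the estimate comes to roughly two minutes, the same order as the overhead quoted above for a typical 3D scheme.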

A distinct performance advantage to the NMRPipe system is the ease with which processing tasks can be distributed over more than one CPU or workstation. The processing scripts themselves are naturally parallel, since they consist of several programs running simultaneously. So, as shown in Table 2, an ordinary NMRPipe scheme can show speed improvements on a multi-CPU computer without the need for special machine-specific compiling or vectorization, since the various programs in the script will be distributed at the discretion of the operating system. In the case shown for the four-CPU SGI Challenge, this simple approach yielded a 70% parallel efficiency compared to the same scheme executed on one CPU. In addition, the facilities of the NMRPipe system allow a processing task to be explicitly distributed by the user, an approach which yields even better performance, and still avoids the need for machine-specific optimization. An example is given in Table 3, which shows the results of a network-distributed processing application, with an efficiency of over 90% on five SGI workstations.
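The 70% figure follows from the SGI Challenge times listed in the processing-time table (525 sec on one CPU, 187 sec for the ordinary script on four CPUs, 154 sec for the explicitly distributed script), if parallel efficiency is taken as the measured speed-up divided by the number of CPUs, as in the table footnote on parallel efficiency. The function name below is introduced only for this check.

```python
def parallel_efficiency(t_single, t_parallel, n_cpus):
    """Measured speed-up divided by the ideal speed-up (the number of CPUs)."""
    return (t_single / t_parallel) / n_cpus


if __name__ == "__main__":
    # Times in seconds for the SGI Challenge entries of the table.
    ordinary = parallel_efficiency(525, 187, 4)
    distributed = parallel_efficiency(525, 154, 4)
    print(f"ordinary script:    {100 * ordinary:.0f}%")
    print(f"distributed script: {100 * distributed:.0f}%")
```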

The NMRPipe implementation of multidimensional spectral processing via UNIX pipes provides a solution which is comprehensive, easy to use, flexible, extensible, and efficient. It naturally accommodates parallel processing approaches, and encourages and supports use of well-separated applications for graphics and analysis. Since the NMRPipe approach is complementary to existing methods which rely on monolithic programs, its unique combination of advantages is likely to prove increasingly useful as biomolecular NMR continues to advance.

In the course of the past two years, many people have assisted in the development, evaluation, and refinement of the software system presented; for this invaluable assistance, the authors wish to thank M. Akutsu, S. Archer, D. Benjamin, R.A. Byrd, R.M. Clore, M. Donlan, N. Farrow, J. Forman-Kay, S. Gagne, D. Garrett, H. Grahn, A.M. Gronenborn, T. Harvey, H. Hatanaka, E. Henry, M. Ikura, Y. Ito, L.E. Kay, W. Klaus, J. Kordel, R. Martino, L. Nicholson, I. Pelczer, R. Powers, M. Shirakawa, S. Tate, N. Tjandra, H. Tsuda, T. Yamazaki, and T. Yamazaki. Thanks are also extended to A. Wang for critical reading of the manuscript. This work was supported in part by the AIDS Targeted Anti-Viral Program of the Office of the Director of the National Institutes of Health.

For details on retrieving the software, send a request via electronic mail addressed to Frank Delaglio. [Note: as of Spring 2007, Frank is no longer at the NIH; contact Frank at delaglio@nmrscience.com]

Barkhuijsen, H., De Beer, R., Bovée, W.M.M.J. and Van Ormondt, D. (1985) J. Magn. Reson., 61, 465-481.

Barkhuijsen, H., De Beer, R. and Van Ormondt, D. (1987) J. Magn. Reson., 73, 553-557.

Bax, A. and Grzesiek, S. (1993) Acc. Chem. Res., 26, 131-138.

Callaghan, P.T., MacKay, A.L., Pauls, K.P., Soderman, O. and Bloom, M. (1984) J. Magn. Reson., 56, 101-109.

Cavanagh, J., Palmer, A.G., Wright, P.E. and Rance, M. (1991) J. Magn. Reson., 91, 429-436.

Delsuc, M.A. (1989) Maximum Entropy and Bayesian Methods, Kluwer Academic, Amsterdam.

Delsuc, M.A., Ni, F. and Levy, G.C. (1987) J. Magn. Reson., 73, 548-552.

Friedrichs, M.S. (1995) J. Biomol. NMR, 5, 147-153.

Garrett, D.S., Powers, R., Gronenborn, A.M. and Clore, G.M. (1991) J. Magn. Reson., 94, 214-220.

Gull, S.F. and Daniell, G.J. (1978) Nature, 272, 686-690.

Güntert, P., Doetsch, V., Wider, G. and Wüthrich, K. (1992) J. Biomol. NMR, 2, 619-629.

Heller, D. and Van Raalte, T. (1993) XView Programming Manual, O'Reilly & Assoc., Inc.

Hoch, J.C. (1989) Methods Enzymol., 176, 216-241.

Hoch, J.C. (1985) Rowland Institute for Science Technical Memorandum RIS-18t.

Hoch, J.C., Stern, A.S., Donoho, D.L. and Johnstone, I.M. (1990) J. Magn. Reson., 86, 236-246.

Hore, P.J. (1985) J. Magn. Reson., 62, 561-567.

Johnson, B. and Blevins, R.A. (1994) J. Biomol. NMR, 4, 603-614.

Johnson, S. (1986) in UNIX Programmer's Manual: Supplementary Documents 1, University of California, Berkeley.


Kauppinen, J. and Saario, E.K. (1993) Appl. Spectrosc., 47, 1123-1127.

Kay, L.E., Ikura, M., Zhu, G. and Bax, A. (1991) J. Magn. Reson., 91, 422-428.

Kay, L.E., Keifer, P. and Saarinen, T. (1992) J. Am. Chem. Soc., 114, 10663-10666.

Kay, L.E., Marion, D. and Bax, A. (1989) J. Magn. Reson., 84, 72-84.

Kernighan, B.W. and Pike, R. (1984) The UNIX Programming Environment, Prentice-Hall, Englewood Cliffs NJ.

Kernighan, B.W. and Ritchie, D.M. (1988) The C Programming Language, Prentice-Hall, Englewood Cliffs NJ.

Kjaer, M., Andersen, K.V. and Poulsen, F.M. (1994) Methods Enzymol., 239, 288-307.


Kraulis, P.J. (1989) J. Magn. Reson., 84, 627-633.

Kraulis, P.J., Domaille, P.J., Campbell-Burk, S.L., Van Aken, T. and Laue, E.D. (1994) Biochemistry, 33, 3515-3531.

Kumaresan, R. and Tufts, D.W. (1982) IEEE Trans., ASSP-30, 833-840.

Laue, E.D., Mayger, M.R., Skilling, J. and Staunton, J. (1986) J. Magn. Reson., 68, 14-29.

Laue, E.D., Skilling, J. and Staunton, J. (1985) J. Magn. Reson., 63, 418-424.

Laue, E.D., Skilling, J., Staunton, J., Sibisi, S. and Brereton, R. (1985) J. Magn. Reson., 62, 437-452.

Levy, G.C., Delaglio, F., Macur, A. and Begemann, J. (1986) Comput. Enhanced Spectrosc., 3, 1-12.

Marion, D., Ikura, M. and Bax, A. (1989) J. Magn. Reson., 84, 425-430.

Marion, D., Ikura, M., Tschudin, R. and Bax, A. (1989) J. Magn. Reson., 85, 393-399.

Marion, D. and Wüthrich, K. (1983) Biochem. Biophys. Res. Commun., 113, 967-974.

Mazzeo, A.R., Delsuc, M.A., Kumar, A. and Levy, G.C. (1989) J. Magn. Reson., 81, 512-519.

Meadows, R.P., Olejniczak, E.T. and Fesik, S.W. (1994) J. Biomol. NMR, 4, 79-96.

Ni, F. and Scheraga, H.A. (1986) J. Magn. Reson., 70, 506-511.

Ni, F., Levy, G.C. and Scheraga, H.A. (1986) J. Magn. Reson., 66, 385-390.

Olejniczak, E.T. and Eaton, H.L. (1990) J. Magn. Reson., 87, 628-632.

Ousterhout, J.K. (1994) Tcl and the Tk Toolkit, Addison-Wesley, Reading MA.

Palmer, A.G., Cavanagh, J., Wright, P.E. and Rance, M. (1991) J. Magn. Reson., 93, 151-170.

Parks, S.I. and Johannesen, R.B. (1976) J. Magn. Reson., 22, 265-267.

Pelczer, I. and Szalma, S. (1991) Chemical Reviews, 91, 1507-1524.

Redfield, A.G. and Kunz, S.D. (1975) J. Magn. Reson., 19, 250-254.

Schmieder, P., Stern, A.S., Wagner, G. and Hoch, J.C. (1994) J. Biomol. NMR, 4, 483-490.

Sibisi, S. (1983) Nature, 301, 134-136.

Skilling, J. and Bryan, R.K. (1984) Mon. Not. R. Astr. Soc., 211, 111-124.

States, D.J., Haberkorn, R.A. and Ruben, D.J. (1982) J. Magn. Reson., 48, 286-292.

Stephenson, M. (1988) Prog. NMR. Spectrosc., 20, 515-626.

Stevens, W.R. (1992) Advanced Programming in the UNIX Environment, Addison-Wesley Pub. Co., Reading MA, 428-434.

Wu, N.L. (1984) Astron. Astrophys., 139, 555-557.

Zhu, G. and Bax, A. (1990) J. Magn. Reson., 90, 405-410.

Zhu, G. and Bax, A. (1992) J. Magn. Reson., 98, 192-199.

Zhu, G. and Bax, A. (1992) J. Magn. Reson., 100, 202-207.

APPENDIX 1 - Generic Arguments

Generic Arguments: the following is a list of arguments used by more than one program or function in the examples and figures.

-di deletes imaginary data from the current dimension after the given processing function is performed.

-hdr extracts parameters recorded during previous processing from the spectral header rather than the command line.

-in specifies the input file or file template (see 'Input and Output Templates' below).

-inPlace permits in-place processing, which is replacement of the input data by the output result.

-inv activates the inverse mode of a given function: function PS will apply inverse (negative) phase correction; function FT will perform an inverse Fourier transform; function ZF will undo any previous zero-filling; function SP will apply the inverse window function and first point scaling.

-out specifies the output file or file template (see 'Input and Output Templates' below).

-ov permits over-writing of any preexisting files.

-sw updates the sweep width and other PPM calibration information to accommodate an extraction or shift function.

-verb performs processing in verbose mode, with status messages.

APPENDIX 2 - Selected Processing Functions

Processing Functions: the following is an alphabetical list of the nmrPipe processing functions used in the examples and figures. The functions and arguments described are not complete lists, but rather only those which are used in the examples.

EXT extracts a region from the current dimension with limits specified by the arguments -x1 and -xn; the limits can be labeled in points, percent, Hz, or PPM. Alternatively, the left or right half of the data can be extracted with the arguments -left and -right.

FT applies a real or complex forward or inverse Fourier transform, with sign alternation or complex conjugation, as indicated by spectral parameters or command-line arguments.

HT performs a Hilbert transform to reconstruct imaginary data, choosing between ordinary and mirror-image mode if the argument -auto is used.

LP extends the data to twice its original size by default, using a complex prediction polynomial whose order is specified by argument -ord. Mixed forward-backward LP is performed if the -fb argument is used. Mirror-image LP for data with no acquisition delay is performed if the argument -ps0-0 is used; mirror-image LP for data with a half-dwell acquisition delay is performed if the argument -ps90-180 is used.

MEM applies Maximum Entropy reconstruction according to the method of Gull and Daniell: argument -ndim specifies the number of dimensions to reconstruct, argument -neg activates two-channel mode, for reconstruction of data with both positive and negative signals, argument -zero corrects the zero-order offset introduced during reconstruction, argument -alpha specifies the fraction of a given iterate which will be added to the current MEM spectrum, argument -sigma specifies the estimated standard deviation of the noise in the time-domain, argument -freq produces the final MEM result in the frequency-domain, arguments -xconv and -yconv specify the line-sharpening function, which in Figure 6 is EM (Exponential Multiply) for both dimensions, and arguments -xcQ1 and -ycQ1 specify the corresponding line-sharpening parameters, which in Figure 6 are 20 Hz and 15 Hz for the 15N and 1H dimensions respectively. Other arguments can be used to optimize convergence speed, or to increase stability for reconstruction of data with high dynamic range.

POLY (frequency-domain) applies polynomial baseline correction of the order specified by argument -ord, via an automated baseline detection method when used with argument -auto. The default is a fourth order polynomial. The automated baseline mode works as follows: a copy of a given vector is divided into a series of adjacent sections, typically 8 points wide. The average value of each section is subtracted from all points in that section, to generate a 'centered' vector. The intensities of the entire centered vector are sorted, and the standard deviation of the noise is estimated under the assumption that a given fraction (typically about 30%) of the smallest intensities belong to the baseline, and that the noise is normally distributed. This noise estimate is multiplied by a constant, typically about 1.5, to yield a classification threshold. Then, each section in the centered vector is classified as baseline only if none of the points in that section exceeds the threshold. These classifications are used to correct the original vector.
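The classification steps described above can be sketched as follows. This is an illustration, not the nmrPipe implementation: the function name and defaults are chosen here to match the typical values quoted (8-point sections, 30% fraction, factor of 1.5), and the way the noise standard deviation is derived from the smallest intensities (a quantile of the sorted magnitudes, scaled by the normal distribution) is an assumption, since the text does not spell out the estimator. The final polynomial correction is not shown.

```python
from statistics import NormalDist


def classify_baseline(vector, width=8, fraction=0.30, factor=1.5):
    """Return one boolean per section: True if the section looks like baseline."""
    # Split into adjacent sections (the last may be shorter) and centre each one.
    sections = [vector[i:i + width] for i in range(0, len(vector), width)]
    centered = [[x - sum(sec) / len(sec) for x in sec] for sec in sections]

    # Sort the magnitudes of the whole centred vector.
    mags = sorted(abs(x) for sec in centered for x in sec)

    # Assumed estimator: if the smallest `fraction` of magnitudes is zero-mean
    # Gaussian noise, the magnitude at that quantile equals sigma * z, with
    # z = inv_cdf(0.5 + fraction / 2) of the standard normal distribution.
    quantile = mags[int(fraction * len(mags))]
    sigma = quantile / NormalDist().inv_cdf(0.5 + fraction / 2)

    threshold = factor * sigma
    # A section is baseline only if none of its points exceeds the threshold.
    return [all(abs(x) <= threshold for x in sec) for sec in centered]
```

Called on a real-valued 1D vector, the function returns one flag per section; sections flagged True would be the ones passed to the polynomial fit.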

POLY (time-domain), when used with the argument -time, fits all data points to a polynomial, which is then subtracted from the original data. It is intended to fit and subtract low-frequency solvent signal in the FID, a procedure which often causes less distortion than time-domain convolution methods. By default, a fourth order polynomial is used. For speed, successive averages of regions are usually fit, rather than fitting all of the data points.

PS applies the zero and first order phase corrections as specified in degrees by the arguments -p0 and -p1. The PS function applies no processing if these values are both zero; for this reason, a zero,zero phase correction step is commonly kept in a processing scheme for completeness, so that the scheme can be copied and reused more easily.

RS, when used in the time-domain, applies a right-shift by the number of points specified by argument -rs, and updates the recorded time-domain size if the argument -sw is used.

SOL uses time-domain convolution and polynomial extrapolation to suppress solvent signal with a default moving average window of +/- 16 points.

SP applies a sine-bell window extending from sin^r(πa) to sin^r(πb), with offset a, endpoint b, and exponent r specified by arguments -off, -end, and -pow, and first-point scaling specified by argument -c. The default length is taken from the recorded time-domain size of the current dimension. By default, a = 0.0, b = 1.0, r = 1.0 (sine bell), and the first point scale factor is 1.0 (no scaling).
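The SP window shape can be written out as a short sketch. The function and parameter names are hypothetical stand-ins for -off, -end, -pow, and -c, and the assumption that the sine argument is ramped linearly over n - 1 intervals (so the first and last points sit exactly at πa and πb) is not stated in the text.

```python
import math


def sine_bell(n, off=0.0, end=1.0, power=1.0, c=1.0):
    """Window of n points running from sin^r(pi*off) to sin^r(pi*end)."""
    window = []
    for i in range(n):
        # Linear ramp of the sine argument from pi*off to pi*end (assumed).
        frac = i / (n - 1) if n > 1 else 0.0
        angle = math.pi * (off + (end - off) * frac)
        window.append(math.sin(angle) ** power)
    if window:
        window[0] *= c  # first-point scaling; 1.0 leaves the point unchanged
    return window


if __name__ == "__main__":
    # Default sine bell, then a squared cosine-like bell starting at its maximum.
    print(sine_bell(5))
    print(sine_bell(5, off=0.5, end=1.0, power=2.0, c=0.5))
```

A time-domain vector would be apodized by multiplying it point by point with the returned window.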

TP exchanges vectors from the X-axis and Y-axis of the data stream, so that the resultant data stream consists of vectors from the Y-axis of the original data. It is identical to YTP.

YTP is another name for the TP transpose function, which exchanges vectors from the X-axis and the Y-axis of the data stream. The alternative name is provided for contrast with the other transpose functions ZTP (X-axis/Z-axis Transpose) and ATP (X-axis/A-axis Transpose).

ZF pads the data with zeros; the amount of padding can be specified by argument -zf, which defines the number of times to double the data size, or by the argument -size, which specifies the desired complex size after zero filling. By default, the data size is doubled by zero filling. Use of the argument -auto will cause the zero-fill size to be rounded up to the nearest power of two.
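The size rules for ZF can be summarised in a small sketch. The function names are hypothetical, the keyword arguments mirror -zf, -size, and -auto, and applying the power-of-two rounding after the doubling is an assumption about how the options combine.

```python
def zero_fill_size(n, zf=1, size=None, auto=False):
    """Return the complex size after zero filling n points."""
    # Either an explicit target size, or n doubled zf times (default: once).
    target = size if size is not None else n * 2 ** zf
    if auto:
        # Round the zero-fill size up to the nearest power of two.
        power = 1
        while power < target:
            power *= 2
        target = power
    return target


def zero_fill(data, **options):
    """Pad a list of complex points with zeros up to the computed size."""
    target = zero_fill_size(len(data), **options)
    return data + [0j] * (target - len(data))


if __name__ == "__main__":
    print(zero_fill_size(300))             # doubled once
    print(zero_fill_size(300, auto=True))  # doubled, then rounded up
```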


ZTP exchanges vectors from the X-axis and Z-axis of the data stream, so that the resultant data stream consists of vectors from the Z-axis of the original data.

APPENDIX 3 - Data Input/Output Programs and Arguments

Input and Output Templates: the following describes the method used to specify input and output data in the multi-file 2D plane format as well as those programs used along with nmrPipe in the examples and figures. The arguments described are not complete lists, but rather only those which are used in the examples.

3D File Name Templates: 3D data in the multi-file 2D plane format is specified as a template, a single name which stands for a series of 2D file planes. The template includes a format specification, usually '%03d', which is substituted by the Z-axis plane number in the actual file names. The format specification is interpreted by rules of the C programming language; the '03d' in the template means the plane number will be included as a zero-padded three-digit number, to give a series of names such as fid/noe001.fid, fid/noe002.fid, fid/noe003.fid, etc.

4D File Name Templates: 4D data in the multi-file 2D plane format is specified as a template, a single name which stands for a series of 2D file planes. The template includes a format specification, usually '%02d%03d', which is substituted by the A-axis and Z-axis plane numbers in the actual file names. The format specification is interpreted by rules of the C programming language; the '02d' and '03d' in the template mean the A-axis plane number will be included as a zero-padded two-digit number, followed by the Z-axis plane number as a zero-padded three-digit number.
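Template expansion can be tried out in Python, whose % operator follows the same printf-style formatting rules as C. The helper names below are hypothetical, and the 4D template string is made up for illustration; only the 3D names fid/noe001.fid and so on come from the text.

```python
def plane_name_3d(template, z):
    """Expand a 3D template with the Z-axis plane number."""
    return template % z


def plane_name_4d(template, a, z):
    """Expand a 4D template with the A-axis, then the Z-axis plane number."""
    return template % (a, z)


if __name__ == "__main__":
    # First three planes of a 3D series.
    for z in range(1, 4):
        print(plane_name_3d("fid/noe%03d.fid", z))
    # Hypothetical 4D template: A-axis plane 1, Z-axis plane 12.
    print(plane_name_4d("fid/noe%02d%03d.fid", 1, 12))
```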

bruk2pipe converts binary data from various types of Bruker spectrometers to the nmrPipe data format. The related programs var2pipe and bin2pipe perform Varian Unity conversions, and general-purpose binary conversions, respectively. The programs take as input a file or data stream in the binary spectrometer format, and produce a file, file series, or data stream in the NMRPipe format. The programs require a collection of arguments defining the acquisition parameters for each dimension, prefixed by -x, -y, -z, and -a; the commonly required arguments follow: arguments -xN etc. define the total number of points saved in the input file for a given dimension; arguments -xT etc. define the number of valid complex points actually acquired, in case this differs from the number of points saved in the input file; arguments -xMODE etc. define the quadrature detection mode of the given dimension; arguments -xSW etc. define the full spectral width in Hz for the given dimension; arguments -xOBS etc. define the observe frequency in MHz for a given dimension, while arguments -xCAR etc. define the carrier position in PPM; arguments -xLAB etc. define unique axis labels; argument -ndim defines the number of dimensions in the input; argument -aq2D defines the type of 2D output file planes produced as either magnitude mode, States/States-TPPI, or TPPI.

xyz2pipe creates a data stream for multidimensional processing via pipeline by reading vectors from the selected axis of nD data in the multi-plane format. The arguments -x, -y, -z, and -a select the axis, and the argument -in is used to specify the input file series as a template (see 'Input and Output Templates' above). Depending on the dimension selected, the other dimensions are reordered by a multidimensional rotation, which is similar, but not always identical, to a transpose. If the original order of dimensions is described as XYZA..., the relative reordering of data can be summarized as follows:

nmrPipe -fn TP    Exchange of the first two dimensions: XYZA... to YXZA...


nmrPipe -fn ZTP   Exchange of the first and third dimensions: XYZA... to ZYXA...

nmrPipe -fn ATP   Exchange of the first and fourth dimensions: XYZA... to AYZX...

xyz2pipe -x       No change to data order: XYZA... to XYZA...

xyz2pipe -y       Rotation of the first two dimensions (same as TP): XYZA... to YXZA...

xyz2pipe -z       Rotation of the first three dimensions: XYZA... to ZXYA...

xyz2pipe -a       Rotation of the first four dimensions: XYZA... to AXYZ...

pipe2xyz writes vectors from a data stream to the selected axis of nD data in the multi-plane format. The arguments -x, -y, -z, and -a select the axis, and the argument -out is used to specify the output file series as a template (see 'Input and Output Templates' above). In order to write to a given axis, the program pipe2xyz performs rotations of the data which are complementary to those performed by xyz2pipe. This means that a pipeline which begins with xyz2pipe reading from a given dimension and ends with pipe2xyz writing to the same dimension will conserve the original data order if no transpose steps are included in between.
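The reorderings listed for xyz2pipe and the transpose functions, and the complementary rotation performed by pipe2xyz, can be modelled on the dimension labels alone. The sketch below only manipulates strings such as "XYZA"; the function names are hypothetical, and `write_order` is one reading of "complementary rotation", namely the inverse of the read rotation.

```python
def read_order(order, axis):
    """Dimension order of the stream when `axis` is read (xyz2pipe-like)."""
    k = order.index(axis)
    # Rotate the first k+1 dimensions so that `axis` comes first.
    return order[k] + order[:k] + order[k + 1:]


def write_order(order, k):
    """Complementary rotation: move the first dimension back to position k."""
    return order[1:k + 1] + order[0] + order[k + 1:]


def exchange(order, k):
    """Exchange the first dimension with position k (TP, ZTP, ATP)."""
    dims = list(order)
    dims[0], dims[k] = dims[k], dims[0]
    return "".join(dims)


if __name__ == "__main__":
    for axis in "XYZA":
        print("read", axis, "->", read_order("XYZA", axis))
    for name, k in (("TP", 1), ("ZTP", 2), ("ATP", 3)):
        print(name, "->", exchange("XYZA", k))
    # Reading from Z and writing back to Z restores the original order.
    print(write_order(read_order("XYZA", "Z"), 2))
```

The rotation and the exchange agree for the Y-axis but differ for the Z-axis and A-axis, which is the sense in which the rotation is similar but not always identical to a transpose.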

Figure 1: Annotated format conversion script used for a 3D CBCA(CO)NH FID acquired on a Bruker AMX spectrometer. The general form of the conversion script is the same for other spectrometers. Parameters for each dimension are specified via arguments prefixed by -x, -y, -z and -a for the X-axis, Y-axis, Z-axis, and A-axis of the data. In order to accommodate padding which may have been performed by the spectrometer, there are separate parameters for the number of points stored in the input file, and the number of points actually acquired. The acquisition modes are specified by keywords such as 'Sequential' (Redfield and Kunz, 1975), 'Complex' or 'States' (States et al., 1982), 'TPPI' (Marion and Wüthrich, 1983), 'States-TPPI' (Marion et al., 1989), etc., which define the Fourier transform mode and sign manipulation required; chemical shift calibration parameters are also recorded. The NMRPipe format output series is specified by the argument -out. Complete argument details are given in the Appendix.

Figure 2: Annotated processing script for 3D amide-proton detected data, illustrating use of 2D transpose. In this scheme, the X-axis and Y-axis are read, processed, and written in the first pass, and the Z-axis is read, processed and written in the second pass. Each pass consists of a pipeline beginning with the xyz2pipe program and ending with the pipe2xyz program; these programs use the arguments -x, -y, -z, and -a to specify which dimension is being read or written. The input and output file series are specified by the template arguments -in and -out. Complete argument details are given in the Appendix.

Figure 3. Annotated 4D format conversion and processing script for a (256*)(64*)(16*)(16*) point 4D 13C-13C correlated 1H-1H NOE FID, illustrating use of 2D transpose (the asterisk denotes complex data). Acquisition parameters have been abbreviated by $ARGS and phase correction steps have been omitted to save space. In this scheme, the results of the format conversion program bruk2pipe are sent directly to the processing pipeline without the need to save an intermediate converted FID on disk. The size of the final result is (512)(128)(64)(64) points. Processing time: 8 hr. 20 min. on a Sun Sparc 10 workstation.

Figure 4. Annotated 4D format conversion and processing script for a (256*)(64*)(16*)(16*) point 4D 13C-13C correlated 1H-1H NOE FID, illustrating use of both 2D and 3D transpose. Acquisition parameters have been abbreviated by $ARGS and phase correction steps have been omitted to save space. This scheme performs the same processing as the script shown in Figure 3, but in this version, a 3D in-memory transpose is used to avoid saving one of the intermediate results. The size of the final result is (512)(128)(64)(64) points. Processing time: 7 hr. 55 min. on a Sun Sparc 10 workstation.

Figure 5. Annotated 3D processing script for amide-detected data, illustrating the use of inverse processing features in a linear prediction scheme. The scheme took 4 hr. 55 min. to perform on a Sun Sparc 10 workstation with a 3D CBCA(CO)NH FID of (512*)(52*)(32*) points. The result is based on an intermediate amide-proton dimension size of 1024 points, yielding a 3D spectrum of (299)(256)(128) points after extraction of the amide-proton region and deletion of imaginary data. In the scheme, LP is used on the indirectly-detected Y-axis and Z-axis of the data. This scheme is arranged so that when LP is applied to double the size of a given dimension, the other dimensions have been completely processed with a window function, zero-filling, and phasing. This localizes the signals as much as possible in the other dimensions and thus simplifies the signal content of the dimension to be predicted (Kay et al., 1991). In the scheme, the X-axis is processed in the first pass, the Z-axis is processed in the second pass, the Y-axis is extended via LP and processed in the third pass, and the Z-axis is inverse-processed, extended via LP, and reprocessed in the fourth pass.

Figure 6. Annotated 3D processing script for amide-detected data, illustrating the use of inverse processing features in a 2D Maximum Entropy Reconstruction scheme. The scheme took 16 hr. 45 min. to perform on a Sun Sparc 10 workstation for a 3D 15N-NOE FID of (512*)(128*)(64*) points. The result is based on an intermediate amide-proton dimension size of 1024 points, yielding a 3D spectrum of (420)(512)(128) points after extraction of the amide-proton region and deletion of imaginary data. In the scheme, 2D MEM is applied to planes in the indirectly-detected Y-axis (1H) and Z-axis (15N) of the data, which were each acquired with a one-dwell delay. The scheme is arranged to temporarily reorder the data so that the MEM function is provided with a stream of data planes from the indirect dimensions (the original Y-axis and Z-axis). The indirect dimensions are first processed by right-shifting, Fourier processing, and automated zero-order baseline correction to compensate for the one dwell time acquisition delay; the Fourier processing includes use of window functions to increase the effectiveness of the automated baseline correction. The planes are then reprocessed so that they are presented for Maximum Entropy reconstruction already phased, baseline corrected, and extensively zero-filled, but transformed without any window functions. Additional argument details are given in the Appendix.

Figure 7. The NMRDraw graphical processing and analysis interface, illustrating interactive processing of a 1D vector extracted from the Z-axis of a 3D interferogram. The topmost border of the program window describes the current functions of the mouse buttons. The command panel along the top contains graphical tools for executing commands, selecting the region of data to view, setting contour parameters, and adjusting phase values. The 2D contour display shows the fourth transformed HN/13CO plane from a partially transformed HNCO spectrum (Z-axis (15N) data is still in the time-domain), with positive data drawn in a continuous range of blue colors, and negative data in a range of red colors. The small window over the contour display at the top left is a pop-up command area for entering nmrPipe processing commands. The cross-hair superimposed over the contour display shows the user-selected location for extraction of the Z-axis 1D vector. The time-domain vector itself, drawn along the bottom of the display, is shown after interactive extension via linear prediction. The Fourier processed version of the vector, also prepared interactively, is drawn above the 1D time-domain data.

Figure 8. The NMRDraw graphical processing and analysis interface, illustrating operation of the program's facilities by pipeline communication with a remote application, allowing separation of assignment and analysis programs and the graphics system. The remote application can be a program or a TCL script. Shown is a prototype application for browsing through strips from related amide-detected 3D experiments. In the application, the remote program decides what spectral regions and other graphics should be displayed, and transmits appropriate instructions to NMRDraw. In turn, NMRDraw transmits information about user input such as mouse clicks, so the remote program can respond to the user. The strips from a given spectrum are displayed in pairs showing orthogonal views at the given 1HN/15N coordinate, and strips from related spectra can be overlaid to highlight corresponding signals if desired. In this illustration, the four pairs of strips displayed show data from: a CBCANH spectrum; a CBCA(CO)NH spectrum; an overlay of CBCANH and CBCA(CO)NH spectra; an HNCO spectrum. The square inset at upper right displays the corresponding location from a 2D 1H/15N correlated spectrum, and the list at the lower right tabulates peak locations selected by the user via the mouse.

Figure 9. Overhead processing time due to pipeline communication and management for a 32 MB data set measured on a Sun Sparc 10 workstation. As shown, the overhead time increases roughly linearly with increasing numbers of functions in the pipeline. In this case, the best-fit least-squares line, also shown, represents an overhead of 0.19 sec/MB for each additional stage in the pipeline.

Table 1. Processing Functions of the nmrPipe Program a

Computer Type                     Time, sec
SGI Challenge, 4 R4400 CPUs b     154
SGI Challenge, 4 R4400 CPUs c     187
HP 9000/755                       239
SGI Indigo                        408
DEC Alpha 3000 d                  487
SGI Challenge, 1 R4400 CPU e      525
Sun Sparc 10                      644
IBM RS6000/530                    1128
Sun Sparc 2                       1208
Sun Sparc 1                       1864
Convex C3830 f                    2146


a Processing of a (512*)(64*)(32*) point HNCO FID using the script given in Figure 2. Times reported are actual times elapsed. No special attempt was made to vectorize or parallelize the code; only ordinary optimizing compilers were used. During processing, each axis size was doubled by zero filling, yielding a spectrum of (417)(128)(64) points after extraction of the amide-proton region and deletion of imaginary data.

b This time is based on a distributed version of the processing script, which divides each processing task into four equal parts, one for each CPU.

c This time is based on an ordinary version of the processing script, whose components are distributed automatically between CPUs by the operating system because they are separate programs.

d This version of the software was compiled with a four-byte floating point compatibility mode, which is roughly half as fast as the best speed of the CPU.

e This time is based on execution of the script on a single CPU.

f This time was measured under heavy loading (44 users).

Table 3. Network-Distributed Parallel Processing Times a

a Processing times for a Z-axis Linear Prediction application on a network of SGI Indigo computers. An interferogram of (512)(128)(32*) points was extended to (512)(128)(64*) points by forward-backward LP with 8 complex coefficients, and the result was doubled by zero filling and Fourier processed. The processing task was divided equally on each computer involved.

b The parallel efficiency is computed assuming that the ideal increase in processing speed is proportional to the number of computers used.





