In “Using Pipeline Schemes”, the public face of Pipeline Schemes was discussed. In this blog post, we are going to look under the hood to see why Pipeline Schemes were developed the way they were, and how developers can create and use them.
Background
No sooner had the CamBAfx Time Series Analysis (TSAfx) pipeline been created than Dr John Suckling proposed the ability to skip programs in the pipeline. The rationale is simple: users should not need to reprocess data unnecessarily, especially since reprocessing takes a lot of time. Back in the days when CamBA was BAMM, this need, while present, was not that apparent. At the time, processing of individual fMRI datasets and Group Map generation were separate pipelines, known as FBAMM and GBAMM respectively. While Group Maps were constantly recreated, there was in general no need to repeat FBAMM. It is true that power users like me did split the FBAMM pipeline into “Preprocessing” and “Response Estimation”, but that was done more for error-checking the scanner data (raw data) and out of a desire to save processing time later. Most users would just run all of FBAMM in one go.
TSAfx is effectively created by combining the FBAMM and GBAMM pipelines into one. This immediately throws out of the window the idea of insisting that data be processed from the beginning every time processing is done. The rationale behind processing all data every time is to simplify the pipeline: having every program activated as it is encountered in the pipeline is straightforward to visualize and to implement. Unfortunately, the cost of reprocessing from raw scanner data just to get a new Group Map is prohibitive. Hence, it is necessary to find shortcuts through the processing pipeline.
Moreover, program parameters get tweaked over time. A lot of time can be saved if programs in the data path that are unaffected by the parameter changes are deactivated.
I recognized long ago that pipelines need to be trimmed at some point in the data analysis process. However, at that time, the task of trimming pipelines was one of the functions to be performed by the Visual Pipeline Editor. This neatly preserved the simplicity offered by activating every program as it is encountered during processing.
With CamBAfx, there is of course no Visual Pipeline Editor.
Uncontrolled deactivation of programs is also bad
The general strategy for deactivating programs in the pipeline is to decide, as each program is encountered during processing, whether to activate it or simply bypass (deactivate) it and go on to the next program. The easiest implementation is, of course, to present the user with a list of the programs in the pipeline and ask which ones to deactivate. This strategy is the non-graphical equivalent of allowing the user to trim the pipeline using the Visual Pipeline Editor.
Like trimming the pipeline with the Visual Pipeline Editor, this unconstrained deactivation of programs has significant drawbacks:
- Users have to know the dependencies between the programs; otherwise they create an invalid but functional pipeline (an illegal pipeline). In particular, programs encountered after the data has changed must not be deactivated, so that the changes propagate to the end of the pipeline.
- Most users will have no idea of the dependencies between the programs and therefore cannot take advantage of this scheme.
- Even users who can take advantage of this scheme risk inadvertently creating an illegal pipeline.
- It can be time-consuming to tick off programs every time processing is needed.
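The first drawback can be made concrete with a small check. The sketch below assumes a strictly linear pipeline in which each program reads only the output of the program before it; real pipelines may have richer dependencies. Under that assumption it lists the deactivated programs that sit downstream of the first activated one, since those are the programs that would never see the changed data. The function name and the program names are hypothetical.

```python
def stale_programs(pipeline, active):
    """Return deactivated programs located after the first activated program.

    Assumes a linear pipeline: each program consumes its predecessor's output,
    so anything skipped after a change keeps output based on old data.
    """
    for index, name in enumerate(pipeline):
        if name in active:
            return [later for later in pipeline[index + 1:] if later not in active]
    return []  # nothing is activated, so nothing changes


# Hypothetical four-step pipeline.
pipeline = ["step_a", "step_b", "step_c", "step_d"]

print(stale_programs(pipeline, {"step_b", "step_c", "step_d"}))  # changes reach the end
print(stale_programs(pipeline, {"step_a", "step_c"}))            # step_b and step_d are skipped
```

A user ticking boxes by hand gets no such warning, which is how an illegal pipeline can slip through unnoticed.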
Fortunately, people do not normally deactivate individual programs, but program blocks
Experienced users know the risk of deactivating individual programs. While this provides the best performance/efficiency, the risk of creating illegal pipelines normally outweighs the gain. After all, how fast is wrong? Discovering that your results are wrong, and being led from there to the illegal pipeline, is the “best case” scenario, and it is rare. Illegal pipelines can go undetected for a long time, leaving users under the shadow of a dark cloud.
Instead, experienced users compromise by deactivating programs in blocks. Programs are grouped into function blocks. The function blocks for TSAfx, for example, are as shown below.

Each function block (rectangular block) consists of one or more programs organized by the function they collectively provide. Programs in a function block are enabled or disabled together. Activating and deactivating by function block is a compromise: it does not offer the best performance, but it does give great certainty that the program activation/deactivation is correct and proper.
If this system is also easier to learn, and if activation/deactivation by function block can be passed on to novices, then more people can benefit from it.
Function blocks simplify program activation/deactivation. However, they do not remove the dependencies between programs. The dependencies may be simplified into dependencies between function blocks, but they are still there. Taking the TSAfx example above, you cannot skip “Response Estimation” if you have to repeat “Preprocessing”. One can view the dependencies between blocks as a description of the logical task one wishes to perform. Again taking TSAfx as an example, by selectively enabling/disabling function blocks, we can achieve the following:
| Active Blocks | Logical task |
|---|---|
| Preprocessing | Preprocessing of individual dataset |
| Preprocessing and Response Estimation | Single subject processing |
| Group map | Generate Group maps |
and much more.
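Activation by function block can be sketched as a lookup from block name to the programs it contains. The grouping below is an assumption: it borrows the module UIDs that appear in the scheme definitions later in this post and guesses which block each one belongs to. `FUNCTION_BLOCKS` and `programs_for` are hypothetical names, not part of GenericFX.

```python
# Assumed grouping of TSAfx module UIDs into function blocks (illustrative only).
FUNCTION_BLOCKS = {
    "Preprocessing": ["removebg", "movecorrect-reg", "movecorrect-move",
                      "spinexcitcorrect", "remean", "spatsmooth"],
    "Response Estimation": ["responseestimation", "subj_voxelsignificanttest",
                            "subj_clustersignificanttest"],
    "Group map": ["standardspace", "medianim", "grp_voxelsignificanttest",
                  "grp_clustersignificanttest"],
}


def programs_for(active_blocks):
    """Expand a selection of function blocks into the set of programs to run."""
    unknown = [block for block in active_blocks if block not in FUNCTION_BLOCKS]
    if unknown:
        raise KeyError(f"unknown function block(s): {unknown}")
    return {prog for block in active_blocks for prog in FUNCTION_BLOCKS[block]}


# "Single subject processing" from the table: two blocks switched on together.
single_subject = programs_for(["Preprocessing", "Response Estimation"])
print(sorted(single_subject))
```

The user only ever chooses among three blocks, while the full program list stays out of sight.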
Pipeline Scheme XML
Defining Pipeline Schemes
Pipeline Schemes are defined and stored as part of the Pipeline file (.epl).
<gfx:gfxDocument>
...
<gfx:Preferences>
...
<gfx:pipelineSchemeList
xmlns:gfx='http://genericfx.org/namespace'
class="org.genericfx.data.jobScheme.jobSchemeCollection">
<gfx:pipelineScheme
xmlns:gfx='http://genericfx.org/namespace'
class="org.genericfx.data.jobScheme.jobScheme"
id="Subject Preprocessing Only"
base="SKIP_ALL_SCHEME"
modification="+removebg:+movecorrect-reg:+movecorrect-move:+spinexcitcorrect:+remean:+spatsmooth" />
<gfx:pipelineScheme
xmlns:gfx='http://genericfx.org/namespace'
class="org.genericfx.data.jobScheme.jobScheme"
id="Subject Response Estimation Only"
base="SKIP_ALL_SCHEME"
modification="+responseestimation:+subj_voxelsignificanttest:+subj_clustersignificanttest" />
<gfx:pipelineScheme xmlns:gfx='http://genericfx.org/namespace'
class="org.genericfx.data.jobScheme.jobScheme"
id="Group Response Estimation Only"
base="SKIP_ALL_SCHEME"
modification="+standardspace:+medianim:+grp_voxelsignificanttest:+grp_clustersignificanttest" />
</gfx:pipelineSchemeList>
...
</gfx:Preferences>
...
</gfx:gfxDocument>
Pipeline Schemes (gfx:pipelineScheme) are stored as a list of pipeline schemes (gfx:pipelineSchemeList), which is in turn stored in the gfx:Preferences of the Pipeline Document. The XML attribute “class” has its usual meaning for the Pipeline Document; for the purpose of creating Pipeline Schemes, just copy it as it is.
In gfx:pipelineScheme, “id” is the ID of the Pipeline Scheme. It is used to identify the Pipeline Scheme and is displayed to the user. The attributes “base” and “modification” together define the pipeline scheme. The “base” attribute defines this Pipeline Scheme with respect to another pipeline scheme: base="AnotherPipelineScheme" means that this Pipeline Scheme inherits the program activation and deactivation list from AnotherPipelineScheme and then applies its own changes as given in the “modification” attribute. The “modification” attribute holds a colon-separated (“:”) list of modules with their activation status. Module UIDs are used to identify individual program modules. A “+” (plus) prefix, or the absence of a prefix, signifies that the program is activated; a “-” (minus) prefix signifies that it is deactivated.
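The “modification” syntax described above can be read with a few lines of code. This is a minimal sketch rather than GenericFX's own parser; `parse_modification` is a hypothetical name. Note that only the first character of each entry is treated as a prefix, because module UIDs such as movecorrect-reg contain a hyphen themselves.

```python
def parse_modification(modification):
    """Split a colon-separated modification string into (module UID, is_active) pairs."""
    changes = []
    for token in modification.split(":"):
        token = token.strip()
        if not token:
            continue  # tolerate an empty string or a stray colon
        if token[0] == "-":
            changes.append((token[1:], False))  # "-" prefix: deactivated
        elif token[0] == "+":
            changes.append((token[1:], True))   # "+" prefix: activated
        else:
            changes.append((token, True))       # no prefix: activated
    return changes


print(parse_modification("+movecorrect-reg:-remean:spatsmooth"))
```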
Through the “base” attribute, a gfx:pipelineScheme is defined recursively in terms of other Pipeline Schemes until it reaches one of the “Stop Schemes”. There are two predefined Pipeline Schemes that serve as the top of the tree for “base”: RUN_ALL_SCHEME and SKIP_ALL_SCHEME. As their names imply, RUN_ALL_SCHEME means every program in the pipeline is activated, while SKIP_ALL_SCHEME means every program in the pipeline is deactivated.
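The recursive definition can be sketched as follows. The code assumes the schemes have already been read into a dictionary of `id: (base, modification)` pairs; that layout, the function name `resolve_scheme`, the shortened program list and the second scheme are all hypothetical and exist only for illustration.

```python
# The two predefined stop schemes and the state they give every program.
STOP_SCHEMES = {"RUN_ALL_SCHEME": True, "SKIP_ALL_SCHEME": False}


def resolve_scheme(scheme_id, schemes, pipeline, _seen=()):
    """Return {module UID: is_active} for scheme_id by following its base chain."""
    if scheme_id in STOP_SCHEMES:
        return {uid: STOP_SCHEMES[scheme_id] for uid in pipeline}
    if scheme_id in _seen:
        raise ValueError(f"circular 'base' chain at {scheme_id!r}")
    base, modification = schemes[scheme_id]
    # Start from the inherited state, then apply this scheme's own changes.
    state = resolve_scheme(base, schemes, pipeline, _seen + (scheme_id,))
    for token in modification.split(":"):
        if not token:
            continue
        uid = token[1:] if token[0] in "+-" else token
        state[uid] = not token.startswith("-")
    return state


pipeline = ["removebg", "remean", "spatsmooth", "responseestimation"]  # shortened list
schemes = {
    # id: (base, modification); modification shortened to match the list above
    "Subject Preprocessing Only": ("SKIP_ALL_SCHEME", "+removebg:+remean:+spatsmooth"),
    # Hypothetical scheme that builds on the one above.
    "Preprocessing Without Smoothing": ("Subject Preprocessing Only", "-spatsmooth"),
}

resolved = resolve_scheme("Preprocessing Without Smoothing", schemes, pipeline)
print(resolved)
```

The guard against circular chains is an addition of this sketch; the post does not say how such a case is handled.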
Communicating Pipeline Schemes to Processing Engine
When the user launches batch processing, his/her processing preferences, as expressed in the Input Table SpreadSheet and the “Launch Dialog”, are written to a “Workorder” file with the “.wko” extension. Currently, to express the tightly knit nature of the Workorder file and the Data file (TaskList), they share the same name. This file is then passed to the pipeline processing engine via “--workorder=/path/to/workorder”.
The format of the file is
<gfx:jobList
xmlns:gfx='http://genericfx.org/namespace'
class="org.genericfx.data.jobScheme.jobScheme">
<gfx:job
xmlns:gfx='http://genericfx.org/namespace'
class="org.genericfx.data.jobScheme.jobScheme"
jobid="jobuid"
schemeid="schemeuid" />
</gfx:jobList>
which is a list (gfx:jobList) of jobs (gfx:job). Each job links an individual dataset in the Input Table SpreadSheet, identified by a unique ID (jobid) that GenericFX assigns to it automatically, to a scheme ID (schemeid) as defined by the “id” attribute of a gfx:pipelineScheme in the pipeline document. The default option for “Processing Schemes” in the Input Table SpreadSheet is “Follow Pipeline”. This maps to “DEFAULT_PIPELINE_SCHEME” for schemeid.
Datasets that opted for the “IGNORE_ALL” Processing Scheme are omitted from both the TaskList and the Workorder file. Hence, as far as batch data processing is concerned, these datasets do not exist.
The scheme selected by the user in the Launch Dialog (File->Run or its equivalent) is encoded as just another gfx:job. However, this gfx:job is given the jobid “DEFAULT_PIPELINE_SCHEME”. This jobid has a special meaning: its schemeid is used for every gfx:job whose schemeid is DEFAULT_PIPELINE_SCHEME, and for datasets whose jobid is not mentioned in the Workorder file.
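The lookup rule for DEFAULT_PIPELINE_SCHEME can be sketched with Python's standard XML parser. The Workorder content below is made up for illustration: the job IDs are hypothetical and the “class” attributes are left out for brevity. `load_workorder` and `scheme_for` are hypothetical names, not the processing engine's API.

```python
import xml.etree.ElementTree as ET

GFX = "{http://genericfx.org/namespace}"  # namespace as ElementTree spells it
DEFAULT = "DEFAULT_PIPELINE_SCHEME"

# Made-up Workorder content for illustration.
WORKORDER = """\
<gfx:jobList xmlns:gfx='http://genericfx.org/namespace'>
  <gfx:job jobid="DEFAULT_PIPELINE_SCHEME" schemeid="Group Response Estimation Only" />
  <gfx:job jobid="job-1" schemeid="Subject Preprocessing Only" />
  <gfx:job jobid="job-2" schemeid="DEFAULT_PIPELINE_SCHEME" />
</gfx:jobList>
"""


def load_workorder(xml_text):
    """Read every gfx:job into a {jobid: schemeid} dictionary."""
    root = ET.fromstring(xml_text)
    return {job.get("jobid"): job.get("schemeid") for job in root.iter(GFX + "job")}


def scheme_for(jobid, jobs):
    """Pick the scheme for a dataset, falling back to the Launch Dialog choice."""
    launch_scheme = jobs[DEFAULT]         # assumed to be present in every Workorder
    schemeid = jobs.get(jobid, DEFAULT)   # unlisted datasets follow the default
    return launch_scheme if schemeid == DEFAULT else schemeid


jobs = load_workorder(WORKORDER)
for jobid in ("job-1", "job-2", "job-99"):
    print(jobid, "->", scheme_for(jobid, jobs))
```

Here job-1 keeps its own scheme, while job-2 (“Follow Pipeline”) and the unlisted job-99 both take the scheme chosen in the Launch Dialog.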
Pipeline Schemes are OPTIONAL
It is important to note that neither the UI that you see nor the Pipeline Processing Engine is technically required to implement Pipeline Schemes. Pipeline Schemes are an optional feature. While it is true that the functions Pipeline Schemes provide are a tremendous help to users, they are, technically speaking, not an essential part of pipeline processing.
In the next article …
The next article will describe how Pipeline Schemes are implemented in GenericFX. It will only be of interest to GenericFX maintainers.