When the original data processing paradigm for GenericFX/BrainFX/CamBAfx was developed, it was envisaged that data would be processed by activating the program of each module it encounters as it flows through the pipeline. This is, no doubt, the most common way of processing data through pipelines, to the extent that we can assume all program modules must be activated when data flows through a pipeline for the first time.
Unfortunately, in practice, insisting that a program must be activated as a condition for data to flow through its module is too harsh. Even if one carefully prepares the dataset to be processed and selects the most “used and understood” pipeline, there is still a possibility that processing will fail somewhere along the pipeline. The more datasets one queues up for processing, the higher the chance that this will occur. Then there are circumstances where the data is to be reprocessed with a slightly different parameter set, which means only a few modules in the pipeline need to be activated in the repeat run.
If processing a dataset through the pipeline is fast, then reactivating all programs is not a problem. Unfortunately, GenericFX was not created for this type of fast pipeline. One of the main motivations for creating GenericFX as a batch-processing application is that it removes the need for users to return to the application periodically to push data through the next processing step manually, especially for long-running pipelines. Take Time Series Analysis (TSAfx) in CamBAfx, for example: a typical 4D fMRI dataset (64x64x21x333) can take an hour to process. As the smallest group typically consists of around 10 datasets, it takes 11 hours (10 hours for processing the individual datasets + 1 hour for group map generation) to generate a usable result. It is therefore unacceptable to have to wait 11 hours simply because one chose to change the “Error Pixel Per Image”, a parameter that is only used when generating the final activation map, i.e. used in the one hour of processing for which the per-dataset results need not be regenerated.
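The arithmetic above can be sketched in a few lines of Python. The figures (one hour per dataset, ten datasets, one hour for the group map) come from the text; the function name and its parameters are hypothetical and are not part of GenericFX.

```python
def total_hours(n_datasets, hours_per_dataset=1.0, group_map_hours=1.0,
                rerun_datasets=True):
    """Estimate the wall-clock hours of one TSAfx-style batch run."""
    # Per-dataset processing only costs time if those programs are activated.
    per_dataset = n_datasets * hours_per_dataset if rerun_datasets else 0.0
    return per_dataset + group_map_hours

full_run = total_hours(10)                          # every program activated
group_only = total_hours(10, rerun_datasets=False)  # only the group map step
print(full_run, group_only)
```

Under these assumptions, regenerating only the group map costs one hour instead of eleven, which is the saving that motivates the rest of this article.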
There must be a way to suppress the activation of redundant programs to speed up processing where appropriate. Thus, the idea of pipeline schemes was born.
Basic Idea Behind “Pipeline Schemes”
A pipeline scheme determines which programs are activated and which are deactivated (bypassed) as the data flows through the pipeline. In its simplest form there are two possible schemes, valid for all pipelines:
- RUN_ALL_SCHEME: Activate all programs in the pipeline. In GenericFX, this is equivalent to not having implemented “Pipeline Schemes” at all.
- SKIP_ALL_SCHEME: Deactivate all programs in the pipeline. This is equivalent to not bothering to run the pipeline at all.
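As a minimal sketch of the idea, a scheme can be modelled as a predicate that answers "should this module's program be activated?". This is an illustration only: the `Scheme` type, the `run_pipeline` function and the module names in `PIPELINE` are assumptions made for this example and do not reflect how GenericFX represents schemes internally.

```python
from typing import Callable, List

# A scheme is modelled here as a predicate over module names (an assumption).
Scheme = Callable[[str], bool]

RUN_ALL_SCHEME: Scheme = lambda module: True    # activate every program
SKIP_ALL_SCHEME: Scheme = lambda module: False  # bypass every program

def run_pipeline(modules: List[str], scheme: Scheme) -> List[str]:
    """Return the names of the modules whose programs were activated.

    Bypassed modules are simply skipped over, so the data flows on
    to the next module untouched.
    """
    activated = []
    for module in modules:
        if scheme(module):
            activated.append(module)
    return activated

# Hypothetical three-module pipeline used only for illustration.
PIPELINE = ["Preprocessing", "Response Estimation", "Group Map"]

print(run_pipeline(PIPELINE, RUN_ALL_SCHEME))
print(run_pipeline(PIPELINE, SKIP_ALL_SCHEME))
```

Pipeline-specific schemes would then be further predicates that return `True` for only some of the modules.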
“Pipeline Schemes” In Practice
INPUT TABLE SPREADSHEET
As shown in the figure below, there is a column named “Processing Scheme” in the Input Table Spreadsheet. This column controls the “Pipeline Scheme” for an individual processing job, i.e. the dataset in that row.

There are at least three schemes in this column:
- Follow Pipeline: This is the default scheme. In effect, it defers the choice of “Pipeline Scheme” to the one chosen when pipeline processing is invoked (see section RUN DIALOG).
- Run ALL: This ensures that ALL modules in the pipeline will be activated for this dataset.
- IGNORE ALL: This removes this dataset completely from the pipeline processing.
There can be more schemes in this column. Those schemes are specific to the pipeline. The figure above shows two pipeline-specific schemes, “Subject Preprocessing Only” and “Group Response Estimation Only”. Please consult the pipeline development team if you have any questions about these schemes.
RUN DIALOG
In the Run Dialog (Run ->Run…), under “Job Scheme”, you may specify the “Pipeline Scheme” for the pipeline.

The choice of Pipeline Scheme here affects every dataset that has its “Processing Scheme” column set to “Follow Pipeline”. In addition, it controls the modules that cannot logically be part of any individual dataset's processing, e.g. the generation of the group map in TSAfx. (Technically, this means the pipeline section after the use of GFXControlSignalModule.) The two schemes that are always available are “RUN_ALL_SCHEME” and “SKIP_ALL_SCHEME”, as described before. The rest are schemes specific to the individual pipeline.
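The interaction between the spreadsheet column and the Run Dialog can be sketched as a simple resolution rule. The function name `effective_scheme` is hypothetical; the string labels are the ones shown in the user interface as described above, and the code is an illustration rather than the GenericFX implementation.

```python
FOLLOW_PIPELINE = "Follow Pipeline"

def effective_scheme(row_scheme: str, run_dialog_scheme: str) -> str:
    """Return the scheme that applies to one dataset row.

    A row set to "Follow Pipeline" defers to the Run Dialog choice;
    any other row-level scheme takes precedence for that dataset.
    """
    if row_scheme == FOLLOW_PIPELINE:
        return run_dialog_scheme
    return row_scheme

print(effective_scheme("Follow Pipeline", "RUN_ALL_SCHEME"))
print(effective_scheme("IGNORE ALL", "RUN_ALL_SCHEME"))
```

Modules that belong to no individual dataset, such as group map generation, would always use the Run Dialog scheme directly.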
EXAMPLE
The Time Series Analysis (TSAfx) pipeline in CamBA is actually the joining of individual dataset processing and group map generation. As shown in the diagram below, it can be broken into three large chunks (rectangular boxes): Preprocessing, Response Estimation and Group Map.

The significant practical implication is that users often want to use only part of the pipeline, depending on the stage of the fMRI experiment:
- Data Acquisition Stage: “Preprocessing”. This ensures that there is no problem with the scanner data.
- Initial Processing: “Response Estimation”. Here the brain response (the fMRI dataset) is correlated with the stimulus/response data (the design matrix).
- Group Inference: “Group Map”. A group map is generated for the group to eliminate individual biases.
In general, “Preprocessing” is done only once. “Response Estimation” and “Group Map” generation can occur several times as new design matrices are explored and group activation maps are fine-tuned.
To achieve these effects, these are the possible “Pipeline Scheme” settings:
| Desired Effect | Setting: Individual Datasets Pipeline Scheme | Setting: Run Dialog Pipeline Scheme |
|---|---|---|
| Preprocessing | Subject Preprocessing Only | SKIP_ALL_SCHEME |
| Subject Response Estimation Only | Subject Response Estimation Only | SKIP_ALL_SCHEME |
| Group Map Only | Follow Pipeline | Group Response Estimation Only |
| Subject Response Estimation and Group Map | Subject Response Estimation | Group Response Estimation Only |
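The table can be read as a lookup from the two settings to the set of stages that actually run. The sketch below makes that reading explicit. It rests on several assumptions that the article does not state: the stages each pipeline-specific scheme activates are inferred from the scheme names, the last row of the table is taken to mean the “Subject Response Estimation Only” scheme, and `SCHEME_STAGES` and `active_stages` are hypothetical names.

```python
PER_DATASET_STAGES = ("Preprocessing", "Response Estimation")
GROUP_STAGES = ("Group Map",)

# Assumed mapping from scheme to activated stages (inferred from the names).
SCHEME_STAGES = {
    "RUN_ALL_SCHEME": {"Preprocessing", "Response Estimation", "Group Map"},
    "SKIP_ALL_SCHEME": set(),
    "Subject Preprocessing Only": {"Preprocessing"},
    "Subject Response Estimation Only": {"Response Estimation"},
    "Group Response Estimation Only": {"Group Map"},
}

def active_stages(row_scheme, run_dialog_scheme):
    """List the stages activated for one dataset row plus the group section."""
    # "Follow Pipeline" rows defer to the Run Dialog scheme.
    if row_scheme == "Follow Pipeline":
        dataset_scheme = run_dialog_scheme
    else:
        dataset_scheme = row_scheme
    per_dataset = [s for s in PER_DATASET_STAGES
                   if s in SCHEME_STAGES[dataset_scheme]]
    # The group section is controlled by the Run Dialog scheme alone.
    group = [s for s in GROUP_STAGES if s in SCHEME_STAGES[run_dialog_scheme]]
    return per_dataset + group

for row, dialog in [
    ("Subject Preprocessing Only", "SKIP_ALL_SCHEME"),
    ("Subject Response Estimation Only", "SKIP_ALL_SCHEME"),
    ("Follow Pipeline", "Group Response Estimation Only"),
    ("Subject Response Estimation Only", "Group Response Estimation Only"),
]:
    print(row, "+", dialog, "->", active_stages(row, dialog))
```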
Unconstrained deactivation of programs can lead to invalid results
Sharp-eyed readers will have noticed that there is no way for users to specify exactly which programs are suppressed in the pipeline. Rather, users are forced to use one of the predefined schemes, which are dictated to them by the pipeline developer.
At present, there is no plan to provide users with the ability to choose which programs to suppress. It is not a big task to create a user interface that permits this; rather, the GenericFX development team is worried about the problems that unconstrained modification of pipelines can bring about. The most serious problem is the accidental suppression of critical program modules, such as those that should have been activated to propagate modified results correctly. This can occur all too easily. For example, using the same TSAfx pipeline as above, if one alters some parameters in the “Preprocessing” stage but, when repeating the data analysis, accidentally suppresses “Response Estimation”, the “Group Map” generation will be blissfully unaware that it has processed the wrong (old) data. This, in turn, leads the user to the wrong conclusion: that changing the preprocessing parameter does not change the results.
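The failure described above can be made concrete with a small check. This is a sketch under simplifying assumptions: the pipeline is treated as strictly linear, so every stage downstream of a change depends on it, and `STAGES` and `bypassed_downstream` are hypothetical names rather than anything provided by GenericFX.

```python
# Linear stage order of the TSAfx example (assumed for illustration).
STAGES = ["Preprocessing", "Response Estimation", "Group Map"]

def bypassed_downstream(changed_stage, activated):
    """Return the stages at or after the changed stage that were bypassed.

    Any stage in the result still holds output computed from the old
    parameters, so everything that consumes it is working on stale data.
    """
    start = STAGES.index(changed_stage)
    return [stage for stage in STAGES[start:] if stage not in activated]

# Parameters changed in Preprocessing, but Response Estimation suppressed:
print(bypassed_downstream("Preprocessing", {"Preprocessing", "Group Map"}))
# All stages activated after the same change:
print(bypassed_downstream("Preprocessing", set(STAGES)))
```

A non-empty result signals exactly the kind of silent inconsistency that predefined schemes are meant to rule out.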
By restricting pipeline modification to predefined Pipeline Schemes, the development team aims to strike a balance between fine-tuning pipelines for optimum processing time and the correctness of data processing.
The next article will concentrate on how to define pipeline schemes and deliver them to the pipeline processing engine. It is only of interest to downstream developers, especially pipeline developers.
December 31, 2007 at 4:38 pm
Dear Maximus
The follow up articles are here:
http://genericfx.wordpress.com/2006/07/18/behind-the-scene-look-at-pipeline-schemes/
http://genericfx.wordpress.com/2006/07/21/developing-pipeline-schemes/