Step-By-Step Guide

Let’s walk through an example of how to curate some simple data on Flywheel. For this tutorial, you will at the very least need access to Flywheel and a text editor.

Step 1: Understanding Your Dataset in the Context of BIDS

Before you can curate the dataset into BIDS, it’s important to be able to predict how your dataset should look in BIDS. If you don’t know what BIDS is, check the official readthedocs.

Our goal here will be to map DICOMs to NIfTIs named correctly in BIDS, including the directory structure, correct metadata sidecars, and fieldmap files:

_images/dicom-reorganization-transparent-white_1000x477.png

To start, we need to figure out what we can use to create this “mapping”. In fw-heudiconv curation, this mapping is called a heuristic, and we’ll use the DICOMs’ header information to create rules for this mapping. To extract this information, we will use fw-heudiconv-tabulate to generate a seqinfo table.

fw-heudiconv-tabulate

In the Flywheel GUI, navigate to your project, and select the “Analysis” tab:

_images/project_analysis_view.png

Click the “Run Analysis Gear” button, which will drop down the analysis box. In this box, select Flywheel HeuDiConv as the gear to run the analysis.

_images/project_gear_view.png

From here, click the “Configuration” tab (there are no inputs required at this stage). This will allow you to set the configuration for the gear. Under “Action”, select “Tabulate”, and make sure to uncheck dry_run. When ready, hit “Run Gear”!

The same can be accomplished at the command line, with this command:

fw-heudiconv-tabulate --project FlywheelTools_TestData --path MY/OUTPUT/DIRECTORY/

The Output

You should now see an analysis object appear in the GUI. This analysis object is associated with the project, since we started it at the project level. If a blue gear is spinning, the gear is still running (this can include virtual machine initialization and shut down time); a red X means it failed, but a green check means success! You should be able to check the “Gear Logs” in the analysis object to read through stdout (all the commands and outputs) as the gear ran.

_images/tabulate_log.png

In the Results section of the analysis, Flywheel zips all the data it was instructed to save as outputs – in this case, the result of our tabulation. Download this file and unzip it, after which you can open it in your table viewer or text editor of choice.

_images/tabulate_table.png

Next, we’re going to use this table to curate one of the subjects. Fortunately, in the table viewer, we can use a filter to only show data from one subject. Here, we pick subject 019465 using the patient_id column.
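
If you would rather filter the table in code than in a table viewer, the same selection can be sketched with Python's standard csv module. This is only an illustration: the inline sample rows, the second patient ID, and the tab delimiter are assumptions standing in for your downloaded file; only the patient_id and series_description column names come from this tutorial. Reading the values as strings also keeps the leading zero in 019465.

```python
import csv
import io

# Hypothetical excerpt of a seqinfo table; a real one has many more columns.
# The tab delimiter is an assumption: adjust it to match your downloaded file.
SAMPLE_TABLE = (
    "patient_id\tseries_description\n"
    "019465\tLocalizer\n"
    "019465\tMPRAGE_TI1100_ipat2\n"
    "000000\tLocalizer\n"
)


def rows_for_subject(table_file, subject, delimiter="\t"):
    """Return the rows whose patient_id column equals `subject`."""
    reader = csv.DictReader(table_file, delimiter=delimiter)
    return [row for row in reader if row["patient_id"] == subject]


# io.StringIO stands in for an opened copy of your downloaded table
subject_rows = rows_for_subject(io.StringIO(SAMPLE_TABLE), "019465")
for row in subject_rows:
    print(row["patient_id"], row["series_description"])
```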

Developing a Simple Heuristic

To start, open up any text editor, such as Notepad or TextEdit. We’re going to start by curating the anatomical T1w image, whose DICOM is highlighted here:

_images/t1w_seqinfo.png

A heuristic is a Python file built from a few functions. In-depth knowledge of these functions is not necessary for this tutorial, but see The Heuristic File if you want to understand each of them in earnest. First, copy and paste the create_key() function into a new file in your text editor:

def create_key(template, outtype=('nii.gz',), annotation_classes=None):
    if template is None or not template:
        raise ValueError('Template must be a valid format string')
    return template, outtype, annotation_classes

Next, use this function to create a BIDS name for the T1w NIfTI you want:

def create_key(template, outtype=('nii.gz',), annotation_classes=None):
    if template is None or not template:
        raise ValueError('Template must be a valid format string')
    return template, outtype, annotation_classes

# Create Keys
t1w = create_key(
   'sub-{subject}/ses-{session}/anat/sub-{subject}_ses-{session}_T1w')
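
To get a feel for what the key holds, the sketch below prints the tuple returned by create_key() and fills in the template with Python's built-in str.format. The subject and session labels are the ones used elsewhere in this tutorial. Using str.format here is only a stand-in to preview the name; it is not a description of how fw-heudiconv performs the substitution internally.

```python
def create_key(template, outtype=('nii.gz',), annotation_classes=None):
    if template is None or not template:
        raise ValueError('Template must be a valid format string')
    return template, outtype, annotation_classes


t1w = create_key(
    'sub-{subject}/ses-{session}/anat/sub-{subject}_ses-{session}_T1w')

# The key is a plain tuple: (template, output types, annotation classes)
template, outtype, annotation_classes = t1w
print(t1w)

# Preview the BIDS path for one subject and session (illustration only)
preview = template.format(subject='019465', session='9793')
print(preview)  # sub-019465/ses-9793/anat/sub-019465_ses-9793_T1w
```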

When fw-heudiconv runs this heuristic, there will exist a variable called t1w, and it will have the string specifying the BIDS file name and path for a T1w (relative to the BIDS root). The next step is making sure that the DICOM we selected will be assigned to this variable. The next function we will use to do that is the infotodict function:

def infotodict(seqinfo):

    info = {
      t1w: []
    }

    for s in seqinfo:
        if "MPRAGE" in s.series_description:
            info[t1w].append(s.series_id)

    return info

After the function is defined with def, we create the info object – a Python dictionary with one key, t1w, whose value is an empty list. Our goal is to populate this list with the DICOMs that belong to the t1w key.

The input to this function, seqinfo, holds the rows of your seqinfo table. Looping over seqinfo gives you access to each row in turn, and the columns of the table are accessed as attributes of the row (for example, s.series_description).

In our example above, we access series_description and use Python logic to check if it contains the string MPRAGE. We know our T1w is the only one that has this string:

_images/t1w_highlighted.png

So we append the series_id value of that row (the unique identifier of the DICOM) to the list of files that should be named this way – the t1w key. The heuristic at this point should look like this:

def create_key(template, outtype=('nii.gz',), annotation_classes=None):
    if template is None or not template:
        raise ValueError('Template must be a valid format string')
    return template, outtype, annotation_classes

# Create Keys
t1w = create_key(
   'sub-{subject}/ses-{session}/anat/sub-{subject}_ses-{session}_T1w')

# loop over the seqinfo table
def infotodict(seqinfo):

    # the dictionary of keys and list of files they correspond to
    info = {
      t1w: []
    }

    # loop over each row of your seqinfo table
    for s in seqinfo:

        # if the series description contains "MPRAGE",
        # add the DICOM identifier to the dictionary

        if "MPRAGE" in s.series_description:
            info[t1w].append(s.series_id)

        # otherwise, print a line to tell us this row is not the T1w
        else:
            print("This seqinfo is not the MPRAGE:", s.series_description)
    return info

Note

A good habit for debugging is to print out the seqinfo rows that did not meet any tests and haven’t been assigned to a key

Save this file as my_test_heuristic.py – we’re going to use it in the next section to curate the T1w image!
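
Before uploading, the matching logic can be tried out locally. The sketch below is a stand-in built on assumptions, not part of fw-heudiconv: it fakes seqinfo rows with a namedtuple holding only the two fields the heuristic reads, uses series descriptions that appear in this dataset, and invents the series_id values. It inlines a copy of the heuristic, with the print placed in an else branch so that only unmatched rows are reported.

```python
from collections import namedtuple

# Hypothetical stand-in for a seqinfo row: only the two fields used below
FakeSeqInfo = namedtuple("FakeSeqInfo", ["series_id", "series_description"])


def create_key(template, outtype=('nii.gz',), annotation_classes=None):
    if template is None or not template:
        raise ValueError('Template must be a valid format string')
    return template, outtype, annotation_classes


t1w = create_key(
    'sub-{subject}/ses-{session}/anat/sub-{subject}_ses-{session}_T1w')


def infotodict(seqinfo):
    info = {t1w: []}
    for s in seqinfo:
        if "MPRAGE" in s.series_description:
            info[t1w].append(s.series_id)
        else:
            print("This seqinfo is not the MPRAGE:", s.series_description)
    return info


# The series_id values are invented for illustration
rows = [
    FakeSeqInfo("id-1", "Localizer"),
    FakeSeqInfo("id-2", "MPRAGE_TI1100_ipat2"),
    FakeSeqInfo("id-3", "HCP_REST_BOLD_MB8_469"),
]

info = infotodict(rows)
print(info[t1w])  # ['id-2']
```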

Curating The First Image with fw-heudiconv

The first step to curating the data is to upload this file to the Flywheel project. Although files can be attached to any object, we recommend attaching this to the project so that every session in the project can access it easily.

In the Flywheel GUI, access the “Information” tab of the project, and upload your heuristic file using the “Upload Attachment” button:

_images/upload_attachment.png

Now, we’re going to launch a gear on a single session. Pick a session from the subject we’ve been developing on (019465 – in this case, the session is 9793). Gears run from the session level by default, though it is possible to launch them from a subject.

In the top right, click “Run Gear”. As before, a dropdown should appear for you to select the Flywheel HeuDiConv gear from the Analysis Gears list. This time, though, select an input: in the heuristic input box, click “Select Input”. Here, you’ll be presented with a dropdown that lets you pick which object to search for input files. The hierarchy at the top shows that it’s looking at the current session and acquisitions:

_images/inputs_dropdown.png

Instead of this, click the Project label to select files attached to your project, and select your heuristic. In the “Configuration” tab, select “Curate” under the “action” option. You can leave the “dry_run” box checked – we will uncheck it after this test run.

When you’re ready, hit “Run Gear”! Take note of the analysis’ name to refer back to later.

To monitor progress of the gear, click on the session’s “Provenance” tab. A grey pause symbol indicates that the job is queued, a blue cog indicates that it is running, and a green check or red triangle indicates that the gear has finished, successfully or unsuccessfully, respectively; refresh the page to update the status of running gears.

At the command line, this is achieved with the following command:

fw-heudiconv-curate --project FlywheelTools_TestData --heuristic PATH/TO/DIRECTORY/my_test_heuristic.py --subject 019465 --session 9793
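
If you script your curation, the command can be assembled in Python before it is handed to a shell. This sketch only builds and prints the command string. Every flag is taken from the commands and gear log shown in this tutorial (including --dry-run), while the helper name build_curate_command is hypothetical.

```python
import shlex


def build_curate_command(project, heuristic, subject, session, dry_run=True):
    """Assemble an fw-heudiconv-curate call as a list of arguments."""
    command = [
        "fw-heudiconv-curate",
        "--project", project,
        "--heuristic", heuristic,
        "--subject", subject,
        "--session", session,
    ]
    if dry_run:
        # preview the changes without applying them
        command.append("--dry-run")
    return command


command = build_curate_command(
    project="FlywheelTools_TestData",
    heuristic="PATH/TO/DIRECTORY/my_test_heuristic.py",
    subject="019465",
    session="9793",
)
print(shlex.join(command))
```

On a machine where fw-heudiconv-curate is available, the same argument list could be passed to subprocess.run(command, check=True).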

The Output

Next, you can navigate to the output of the gear to see what happened. From within the session, click on either the “Analysis” tab (strictly for analysis gears that have run) or the “Provenance” tab (a listing of any kind of gear that has operated on the session), and navigate to your recent fw-heudiconv analysis. From here, select “View Log”.

First, there are virtual machine instructions, stuff Flywheel uses to let us know what kind of virtual machine the gear ran in. Then, we have initialization instructions from fw-heudiconv’s “gear manager”:

Gear Name: fw-heudiconv, Gear Version: 0.2.15_0.3.3
Executor: computebbl-31f3d27f, CPU: 8 cores, Memory: 55GB, Disk: 208GB, Swap: 32GB
Gear starting...

INFO: ==============: fw-heudiconv gear manager starting up :===============

INFO: Calling fw-heudiconv with the following settings:
INFO: Project: FlywheelTools_TestData
INFO: Subject(s): ['019465']
INFO: Session(s): ['9793']
INFO: Heuristic: /flywheel/v0/input/heuristic/my_test_heuristic.py
INFO: Action: Curate
INFO: Dry run: True
INFO: Call: fw-heudiconv-curate --verbose --project FlywheelTools_TestData --dry-run --subject 019465 --session 9793 --heuristic /flywheel/v0/input/heuristic/my_test_heuristic.py
INFO: =================: fw-heudiconv curator starting up :=================

Pay attention to the Call directive; this prints the command line equivalent of what is running in the gear.

Next, the actual fw-heudiconv outputs. We see that fw-heudiconv first attempts to load your heuristic and then lists out all of your seqinfo objects (the rows from the table in the first part) with the series_description first, followed by other columns:

INFO: Loading heuristic file...
INFO: Heuristic loaded successfully!
INFO: Querying Flywheel server...
DEBUG: Found project: FlywheelTools_TestData (6075d65d0da0131135e9b471)
DEBUG: Found sessions:
    9793 (607732a764d3dfc86e6510d4)
INFO: Applying heuristic to 9793 (1/1)...
DEBUG: Found SeqInfos:
Localizer:
            [TR=0.0086 TE=0.004 shape=(512, 512, 3, -1) image_type=('ORIGINAL', 'PRIMARY', 'M', 'ND', 'NORM')] (607732a75b936738e644aee8)
MPRAGE_TI1100_ipat2:
              [TR=1.81 TE=0.00345 shape=(256, 192, 160, -1) image_type=('ORIGINAL', 'PRIMARY', 'M', 'ND', 'NORM')] (607732a8b9d367cae5e9b0c5)
:
:
:

This is good, as we can confirm the table data with the seqinfo object the tool is using to curate your data. For example, we know there’s a DICOM with the series description MPRAGE_TI1100_ipat2, and we searched specifically for the string MPRAGE. So, did we catch this seqinfo?

HCP_REST_BOLD_MB8_469:
  [TR=0.8 TE=0.037 shape=(936, 936, 469, -1) image_type=('ORIGINAL', 'PRIMARY', 'M', 'MB', 'ND', 'MOSAIC')] (607732a864d3dfc86e6510d7)

DEBUG:
MPRAGE_TI1100_ipat2_2.nii.gz
    sub-019465_ses-9793_T1w.nii.gz -> sub-019465/ses-9793/anat/sub-019465_ses-9793_T1w.nii.gz
INFO: Done!
INFO: ===================: Exiting fw-heudiconv curator :===================

Excellent! Now we know that the NIfTI from this DICOM, MPRAGE_TI1100_ipat2_2.nii.gz, will be mapped to a file named sub-019465_ses-9793_T1w.nii.gz, and the path is listed there too. Additionally, we get printouts of the seqinfo objects that didn’t get caught by our logic.
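
When many files are curated, reading the mapping by eye gets tedious. As a rough sketch, the arrow lines of a dry-run log can be pulled out with a regular expression. The log excerpt is copied from above; the pattern and the function name are assumptions based on that excerpt and may need adjusting if your log looks different.

```python
import re

LOG_EXCERPT = """\
DEBUG:
MPRAGE_TI1100_ipat2_2.nii.gz
    sub-019465_ses-9793_T1w.nii.gz -> sub-019465/ses-9793/anat/sub-019465_ses-9793_T1w.nii.gz
INFO: Done!
"""

# Matches "<new file name> -> <BIDS path>" lines
ARROW_LINE = re.compile(r"^\s*(\S+)\s+->\s+(\S+)\s*$")


def planned_paths(log_text):
    """Return {new file name: BIDS path} for every arrow line in the log."""
    mapping = {}
    for line in log_text.splitlines():
        match = ARROW_LINE.match(line)
        if match:
            mapping[match.group(1)] = match.group(2)
    return mapping


for name, path in planned_paths(LOG_EXCERPT).items():
    print(name, "->", path)
```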

To see our BIDS data before curation, go to the session view and click the “BIDS View” toggle; there should be no BIDS data:

_images/uncurated.png

Now that we know it works, we can run it again with the “dry_run” box unchecked to apply the changes. The only difference should be that the log lets you know the changes are being applied:

HCP_REST_BOLD_MB8_469:
  [TR=0.8 TE=0.037 shape=(936, 936, 469, -1) image_type=('ORIGINAL', 'PRIMARY', 'M', 'MB', 'ND', 'MOSAIC')] (607732a864d3dfc86e6510d7)

INFO: Applying changes to files...
DEBUG:
MPRAGE_TI1100_ipat2_2.nii.gz
sub-019465_ses-9793_T1w.nii.gz -> sub-019465/ses-9793/anat/sub-019465_ses-9793_T1w.nii.gz
INFO: Done!
INFO: ===================: Exiting fw-heudiconv curator :===================

Now, in the session view, hit the “BIDS View” toggle:

_images/one_curated.png

We’ve successfully curated one of our images into BIDS!

Adding More Images

By now, it should be clear that as the tool loops over the rows in the seqinfo table, you can add all sorts of logic to capture additional seqinfo objects and assign them to keys you create. Below, we edit the heuristic and add a key for the BOLD data in our project, and use similar logic to assign data to the key:

def create_key(template, outtype=('nii.gz',), annotation_classes=None):
    if template is None or not template:
        raise ValueError('Template must be a valid format string')
    return template, outtype, annotation_classes

# Create Keys

# anatomical
t1w = create_key(
   'sub-{subject}/ses-{session}/anat/sub-{subject}_ses-{session}_T1w')

# fMRI scans
rest_bbl = create_key(
  'sub-{subject}/{session}/func/sub-{subject}_{session}_task-rest_acq-BBL_bold')

# loop over the seqinfo table
def infotodict(seqinfo):

    # the dictionary of keys and list of files they correspond to
    # now contains two scans
    info = {
      t1w: [], rest_bbl: []
    }

    # loop over each row of your seqinfo table
    for s in seqinfo:

        # if the series description contains "MPRAGE",
        # add the DICOM identifier to the dictionary

        if "MPRAGE" in s.series_description:
            info[t1w].append(s.series_id)

        elif "bbl1_restbold" in s.series_description:
            info[rest_bbl].append(s.series_id)

        # otherwise, print a line to tell us this row matched no key
        else:
            print("This seqinfo is not the MPRAGE or rs-fMRI:", s.series_description)
    return info

But what if we have a fieldmap? Not only do we need to name it correctly, but we also have to make sure it points to the BOLD data. We can do this quite flexibly using the special IntendedFor keyword. This keyword is set outside of the infotodict function and makes use of existing keys:

def create_key(template, outtype=('nii.gz',), annotation_classes=None):
    if template is None or not template:
        raise ValueError('Template must be a valid format string')
    return template, outtype, annotation_classes

# Create Keys

# anatomical
t1w = create_key(
   'sub-{subject}/ses-{session}/anat/sub-{subject}_ses-{session}_T1w')

# fMRI scans
rest_bbl = create_key(
  'sub-{subject}/{session}/func/sub-{subject}_{session}_task-rest_acq-BBL_bold')

# fieldmaps
b0_phase = create_key(
   'sub-{subject}/{session}/fmap/sub-{subject}_{session}_phasediff')
b0_mag = create_key(
   'sub-{subject}/{session}/fmap/sub-{subject}_{session}_magnitude{item}')


# loop over the seqinfo table
def infotodict(seqinfo):

    # the dictionary of keys and list of files they correspond to
    # now contains four keys
    info = {
      t1w: [], rest_bbl: [], b0_mag: [], b0_phase: []
    }

    # loop over each row of your seqinfo table
    for s in seqinfo:

        # if the series description contains "MPRAGE",
        # add the DICOM identifier to the dictionary

        if "MPRAGE" in s.series_description:
            info[t1w].append(s.series_id)

        elif "bbl1_restbold" in s.series_description:
            info[rest_bbl].append(s.series_id)

        elif "B0map" in s.series_description and "M" in s.image_type:
            info[b0_mag].append(s.series_id)
        elif "B0map" in s.series_description and "P" in s.image_type:
            info[b0_phase].append(s.series_id)

        # otherwise, print a line to tell us this row matched no key
        else:
            print("Protocol not found!:", s.series_description)
    return info

IntendedFor = {
    b0_phase: [
        '{session}/func/sub-{subject}_{session}_task-rest_acq-BBL_bold.nii.gz'
    ],
    b0_mag: [
        '{session}/func/sub-{subject}_{session}_task-rest_acq-BBL_bold.nii.gz'
    ]
}

Notice that in this heuristic, we use the special {item} keyword in the key for the magnitude fieldmaps. This keeps us from having to write multiple keys: the keyword is filled in automatically for each of the NIfTIs in the acquisition. Additionally, we access the image_type column when differentiating between the magnitude and phase fieldmaps.
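
As a rough illustration of what {item} does, the sketch below fills a shortened, hypothetical file name template for two magnitude images with str.format. It assumes numbering starts at 1, matching the BIDS names magnitude1 and magnitude2; how fw-heudiconv counts items internally is not shown in this tutorial.

```python
# Shortened, hypothetical template: only the {item} placeholder is left open
template = "sub-019465_ses-9793_magnitude{item}"

# Assumption: items are numbered from 1
names = [template.format(item=number) for number in (1, 2)]
print(names)
```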

Lastly, the IntendedFor keyword: it’s a dictionary, like info, containing the keys for each of our fieldmaps, and the values for each key are the files we expect this fieldmap to correct. In this case we must specifically list the files out. Update this heuristic, upload it to Flywheel, and try out curation with it.
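
Because the IntendedFor targets are written out by hand, a typo there is easy to miss. One way to catch it, sketched below, is to check that every target equals one of the key templates with the leading sub-{subject}/ folder removed and .nii.gz appended. That rule is inferred from the example above rather than taken from fw-heudiconv, and the helper name unmatched_targets is hypothetical.

```python
def create_key(template, outtype=('nii.gz',), annotation_classes=None):
    if template is None or not template:
        raise ValueError('Template must be a valid format string')
    return template, outtype, annotation_classes


rest_bbl = create_key(
    'sub-{subject}/{session}/func/sub-{subject}_{session}_task-rest_acq-BBL_bold')
b0_phase = create_key(
    'sub-{subject}/{session}/fmap/sub-{subject}_{session}_phasediff')
b0_mag = create_key(
    'sub-{subject}/{session}/fmap/sub-{subject}_{session}_magnitude{item}')

IntendedFor = {
    b0_phase: [
        '{session}/func/sub-{subject}_{session}_task-rest_acq-BBL_bold.nii.gz'
    ],
    b0_mag: [
        '{session}/func/sub-{subject}_{session}_task-rest_acq-BBL_bold.nii.gz'
    ],
}


def unmatched_targets(intended_for, keys, prefix='sub-{subject}/'):
    """List IntendedFor targets that do not correspond to any key template."""
    expected = set()
    for template, outtypes, _ in keys:
        # drop the subject folder so the path is relative to it
        if template.startswith(prefix):
            relative = template[len(prefix):]
        else:
            relative = template
        for outtype in outtypes:
            expected.add(relative + '.' + outtype)
    return [
        target
        for targets in intended_for.values()
        for target in targets
        if target not in expected
    ]


# An empty list means every target matches a key defined in the heuristic
print(unmatched_targets(IntendedFor, [rest_bbl, b0_phase, b0_mag]))
```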

Wrapping Up

In this walkthrough, you should have learned how to curate your data with fw-heudiconv, starting with discovering data in your DICOM headers, crafting a heuristic for a single T1w image, and then applying it to a session. Next, we went over how to add more images, including how to point fieldmaps to BOLD scans. Now, you’re ready to investigate more functionality of fw-heudiconv. Take a look at The Heuristic File page for an in-depth look at the additional features fw-heudiconv curation offers, and the tips page for inspiration on how to come up with more creative solutions.