Common Commandline Interface

Motivation And High Level Example

  • Provide a common interface for executables to expose options
  • Provide a common interface for executables to be called
  • Provide a common interface for exposing tool metadata, such as memory usage, CPU usage, and required temp files

Benefits

  • A consistent concrete common interface for shelling out to an executable
  • Task options have a consistent model for validation
  • Task versioning is supported
  • A principled model for wrapping tools. For example, pbalign would “inherit” blasr options and then extend or wrap them.
  • Once a manifest has been defined and registered to pbsmrtpipe, the task/manifest can be referenced in pipelines with no additional work

Terms

  • ‘Tool Contract’ is a single file that exposes the exe interface. It contains metadata about the task, such as input and output file types and nproc.
  • ‘Resolved Tool Contract’ is a single file that contains the resolved values of the manifest (the Tool Contract)
  • ‘Driver’ is the general interface for calling a commandline exe. This can be called from the commandline or directly as an API call (via any language which supports the manifest interface).

Hello World Dev Example

Tool Contract example for an exe, ‘python -m pbcommand.cli.example.dev_app’, with tool contract id pbcommand.tasks.dev_app.

{
    "version": "0.2.1",
    "driver": {
        "serialization": "json",
        "exe": "python -m pbcommand.cli.example.dev_app --resolved-tool-contract ",
        "env": {}
    },
    "schema_version": "2.0.0",
    "tool_contract": {
        "task_type": "pbsmrtpipe.task_types.standard",
        "resource_types": [
            "$tmpfile",
            "$tmpfile",
            "$tmpdir"
        ],
        "description": "Dev app for Testing that supports emitting tool contracts",
        "schema_options": [
            {
                "optionTypeId": "integer",
                "default": 25,
                "id": "pbcommand.task_options.dev_read_length",
                "name": "Length filter",
                "description": "Min Sequence Length filter"
            }
        ],
        "output_types": [
            {
                "title": "Filtered Fasta file",
                "description": "Filtered Fasta file",
                "default_name": "filter",
                "id": "fasta_out",
                "file_type_id": "PacBio.FileTypes.Fasta"
            }
        ],
        "_comment": "Created by pbcommand 0.5.2",
        "name": "Example Dev App",
        "input_types": [
            {
                "description": "PacBio Spec'ed fasta file",
                "title": "Fasta File",
                "id": "fasta_in",
                "file_type_id": "PacBio.FileTypes.Fasta"
            }
        ],
        "nproc": 1,
        "is_distributed": false,
        "tool_contract_id": "pbcommand.tasks.dev_app"
    },
    "tool_contract_id": "pbcommand.tasks.dev_app"
}
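
Because the Tool Contract is plain JSON, any language can inspect it. As a minimal sketch that uses only the standard library (not the pbcommand API), the snippet below reads a trimmed copy of the contract above and collects the ids, file types, and option defaults. The helper name summarize_tool_contract is hypothetical.

```python
import json

# Trimmed copy of the Tool Contract shown above; only the fields read below are kept.
TOOL_CONTRACT_JSON = """
{
    "version": "0.2.1",
    "driver": {
        "exe": "python -m pbcommand.cli.example.dev_app --resolved-tool-contract ",
        "env": {}
    },
    "tool_contract": {
        "schema_options": [
            {"optionTypeId": "integer", "default": 25,
             "id": "pbcommand.task_options.dev_read_length"}
        ],
        "output_types": [
            {"id": "fasta_out", "default_name": "filter",
             "file_type_id": "PacBio.FileTypes.Fasta"}
        ],
        "input_types": [
            {"id": "fasta_in", "file_type_id": "PacBio.FileTypes.Fasta"}
        ],
        "nproc": 1,
        "is_distributed": false,
        "tool_contract_id": "pbcommand.tasks.dev_app"
    },
    "tool_contract_id": "pbcommand.tasks.dev_app"
}
"""


def summarize_tool_contract(raw_json):
    """Return the contract id, driver exe, file types, and option defaults.

    Hypothetical helper for illustration; it is not part of pbcommand.
    """
    d = json.loads(raw_json)
    tc = d["tool_contract"]
    return {
        "id": d["tool_contract_id"],
        "exe": d["driver"]["exe"],
        "inputs": [(i["id"], i["file_type_id"]) for i in tc["input_types"]],
        "outputs": [(o["id"], o["file_type_id"]) for o in tc["output_types"]],
        "option_defaults": {o["id"]: o["default"] for o in tc["schema_options"]},
    }


summary = summarize_tool_contract(TOOL_CONTRACT_JSON)
print(summary["id"])
print(summary["option_defaults"])
```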

Details of Tool Contract

  • The Tool Contract id can be referenced globally (e.g., within a pipeline template)
  • Input file types have a file type id, an id that can be referenced within the driver, and a brief description
  • Output file types have a file type id and a default output file name
  • The number of processors is defined by $nproc. “$”-prefixed values are symbols that have well-defined semantic meaning
  • Temp files and log files are defined using “$” symbols and can have multiple items
  • The exe options are exposed via the jsonschema standard. Each option has an id and maps to a single schema definition. Each option must have a default value.
  • The exe section of the “driver” is the commandline interface that will be called with the resolved manifest as a positional argument (e.g., “my-exe resolved-manifest.json”)
  • The task type describes whether the task should be submitted to the cluster resources
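
Since every option must carry a default, resolving the options for a run amounts to overlaying user-supplied values on the defaults and checking their types. The following is a minimal sketch of that idea, not pbcommand's resolver. The function name resolve_options and the type table are assumptions; only the “integer” option type appears in the example contract.

```python
# Option entries in the shape used by "schema_options" in the example contract.
SCHEMA_OPTIONS = [
    {"optionTypeId": "integer", "default": 25,
     "id": "pbcommand.task_options.dev_read_length"},
]

# Assumed mapping from optionTypeId to a Python type. Only "integer" is taken
# from the example contract; extend the table for other option types.
TYPE_BY_OPTION_TYPE_ID = {"integer": int}


def resolve_options(schema_options, user_options):
    """Overlay user values on the schema defaults and type-check the result."""
    known_ids = {opt["id"] for opt in schema_options}
    unknown = set(user_options) - known_ids
    if unknown:
        raise KeyError("Unknown option ids: {u}".format(u=sorted(unknown)))

    resolved = {}
    for opt in schema_options:
        value = user_options.get(opt["id"], opt["default"])
        expected = TYPE_BY_OPTION_TYPE_ID[opt["optionTypeId"]]
        # bool is a subclass of int in Python, so reject it explicitly here.
        if isinstance(value, bool) or not isinstance(value, expected):
            raise TypeError("Option {i} expects {t}, got {v!r}".format(
                i=opt["id"], t=expected.__name__, v=value))
        resolved[opt["id"]] = value
    return resolved


print(resolve_options(SCHEMA_OPTIONS, {}))
print(resolve_options(SCHEMA_OPTIONS, {"pbcommand.task_options.dev_read_length": 27}))
```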

Note. A single driver can reference many manifests. For example, “pbreports” would have a single driver exe. Using the “task_manifest_id”, the driver would dispatch to the correct function call.
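
One way to picture this dispatch is a registry that maps ids to functions. The sketch below is illustrative only: the registry, the decorator, and the ids pbreports.tasks.example_a and pbreports.tasks.example_b are hypothetical, and the id is assumed to be stored under “tool_contract_id” as in the Resolved Tool Contract example in this document.

```python
# Hypothetical registry: tool contract id -> function that runs the task.
REGISTRY = {}


def register(tool_contract_id):
    """Decorator that records a task function under its id."""
    def decorator(func):
        REGISTRY[tool_contract_id] = func
        return func
    return decorator


@register("pbreports.tasks.example_a")  # hypothetical id
def run_example_a(task):
    return "example_a on {n} input file(s)".format(n=len(task["input_files"]))


@register("pbreports.tasks.example_b")  # hypothetical id
def run_example_b(task):
    return "example_b with nproc={n}".format(n=task["nproc"])


def dispatch(resolved_contract):
    """Look up the task function by id and call it with the resolved task values."""
    task = resolved_contract["resolved_tool_contract"]
    tool_id = task["tool_contract_id"]
    if tool_id not in REGISTRY:
        raise ValueError("No function registered for {i}".format(i=tool_id))
    return REGISTRY[tool_id](task)


demo = {
    "resolved_tool_contract": {
        "tool_contract_id": "pbreports.tasks.example_a",
        "input_files": ["/path/to/input.txt"],
        "nproc": 1,
    }
}
print(dispatch(demo))
```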

Programmatically defining a Parser to Emit a Tool Contract

pbcommand provides an API to create a tool contract and an argparse instance from a single interface. This gives a single point for defining options and keeps the standard commandline entry point and the tool contract in sync.

This also allows your tool to emit the tool contract to stdout using “--emit-tool-contract” and to be run from a Resolved Tool Contract using the “--resolved-tool-contract /path/to/resolved-tool-contract.json” commandline argument, while still supporting the standard Python commandline interface via argparse.

The complete app is shown below.

"""Simple CLI dev app for testing Emitting Tool Contracts and Running from Resolved Tool Contracts"""

import logging
import sys

from pbcommand.utils import setup_log
from pbcommand.cli import pbparser_runner
from pbcommand.models import FileTypes, get_pbparser, ResourceTypes


# This has the same functionality as the dev_simple_app
from .dev_simple_app import run_main

log = logging.getLogger(__name__)

__version__ = '0.2.1'

# Used for the tool contract id. Must have the form {namespace}.tasks.{name}
# to prevent namespace collisions. For python tools, the namespace should be
# the python package name.
TOOL_ID = "pbcommand.tasks.dev_app"


def add_args_and_options(p):
    """
    Add input, output files and options to parser.

    :type p: PbParser
    :return: PbParser
    """
    # FileType, label, name, description
    p.add_input_file_type(FileTypes.FASTA, "fasta_in", "Fasta File", "PacBio Spec'ed fasta file")
    # File Type, label, name, description, default file name
    p.add_output_file_type(FileTypes.FASTA, "fasta_out", "Filtered Fasta file", "Filtered Fasta file", "filter")
    # Option id, label, default value, name, description
    # For argparse, the label read-length is translated to --read-length (accessible via args.read_length)
    p.add_int("pbcommand.task_options.dev_read_length", "read-length", 25, "Length filter", "Min Sequence Length filter")
    return p


def get_contract_parser():
    """
    Central point of programmatically defining a Parser.
    :rtype: PbParser
    :return: PbParser
    """
    # Commandline exe to call "{exe}" /path/to/resolved-tool-contract.json

    driver_exe = "python -m pbcommand.cli.example.dev_app --resolved-tool-contract "
    desc = "Dev app for Testing that supports emitting tool contracts"
    subcomponents = [("my_subcomponent", "1.2.3")]

    resource_types = (ResourceTypes.TMP_FILE,
                      ResourceTypes.TMP_FILE,
                      ResourceTypes.TMP_DIR)

    p = get_pbparser(TOOL_ID,
                     __version__,
                     "Example Dev App",
                     desc,
                     driver_exe,
                     is_distributed=False,
                     resource_types=resource_types,
                     subcomponents=subcomponents)

    add_args_and_options(p)
    return p


def args_runner(args):
    """Entry point from argparse"""
    log.debug("raw args {a}".format(a=args))
    return run_main(args.fasta_in, args.fasta_out, args.read_length)


def resolved_tool_contract_runner(resolved_tool_contract):
    """Run from the resolved contract

    :param resolved_tool_contract:
    :type resolved_tool_contract: ResolvedToolContract
    """

    in_file = resolved_tool_contract.task.input_files[0]
    out_file = resolved_tool_contract.task.output_files[0]
    min_read_length = resolved_tool_contract.task.options["pbcommand.task_options.dev_read_length"]
    r = run_main(in_file, out_file, min_read_length)
    return r


def main(argv=sys.argv):
    log.info("Starting {f} version {v} pbcommand example dev app".format(f=__file__, v=__version__))
    # PbParser instance, this has both the argparse instance and the tool contract
    # instance.
    mp = get_contract_parser()
    # To Access the argparse instance
    # mp.arg_parser.parser
    # The Tool Contract parser
    # mp.tool_contract_parser.parser
    return pbparser_runner(argv[1:],
                           mp,
                           args_runner,
                           resolved_tool_contract_runner,
                           log,
                           setup_log)


if __name__ == '__main__':
    sys.exit(main())

Note

Option ids must use the {pbcommand}.task_options.{option_id} format.
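
A quick way to catch malformed ids early is a pattern check. This is a minimal sketch; the exact set of allowed characters is an assumption, and the helper is_valid_option_id is hypothetical rather than part of pbcommand.

```python
import re

# Assumed pattern: a namespace, the literal "task_options", then the option name.
# Each part is restricted to letters, digits, and underscores for this sketch.
OPTION_ID_RX = re.compile(
    r"^[A-Za-z_][A-Za-z0-9_]*\.task_options\.[A-Za-z_][A-Za-z0-9_]*$")


def is_valid_option_id(option_id):
    """Return True if option_id looks like {namespace}.task_options.{option_id}."""
    return OPTION_ID_RX.match(option_id) is not None


for candidate in ("pbcommand.task_options.dev_read_length",
                  "read-length",
                  "pbcommand.tasks.dev_app"):
    print(candidate, is_valid_option_id(candidate))
```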

Details and Example of a Resolved Tool Contract

  • Language-agnostic JSON format to encode the resolved values
  • Input and output file types are resolved to file paths
  • nproc and other resources are resolved
  • IO layers convert between JSON and Python using load_resolved_tool_contract_from in pbcommand.pb_io
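
To make “resolved” concrete, the sketch below turns the “$tmpfile” and “$tmpdir” symbols and the output file types of a contract into paths under a job directory. It is an illustration under stated assumptions, not pbcommand's resolver: the function names, the shape of the returned resource entries, and the file type to extension table are all hypothetical.

```python
import os
import tempfile


def resolve_resources(resource_types, root_dir):
    """Map each "$" symbol to a concrete path under root_dir.

    The dict shape of each entry is an assumption made for this sketch.
    """
    resolved = []
    for i, symbol in enumerate(resource_types):
        if symbol == "$tmpfile":
            # Only reserve a unique name; the tool creates the file if needed.
            path = os.path.join(root_dir, "tmpfile-{i}".format(i=i))
        elif symbol == "$tmpdir":
            path = os.path.join(root_dir, "tmpdir-{i}".format(i=i))
            os.mkdir(path)
        else:
            raise ValueError("Unsupported resource symbol {s}".format(s=symbol))
        resolved.append({"resource_type": symbol, "path": path})
    return resolved


def resolve_output_files(output_types, output_dir, extension_by_type):
    """Turn each output type into output_dir/default_name.extension."""
    paths = []
    for out in output_types:
        ext = extension_by_type[out["file_type_id"]]
        name = "{n}.{e}".format(n=out["default_name"], e=ext)
        paths.append(os.path.join(output_dir, name))
    return paths


job_dir = tempfile.mkdtemp()
resources = resolve_resources(["$tmpfile", "$tmpfile", "$tmpdir"], job_dir)
outputs = resolve_output_files(
    [{"default_name": "filter", "file_type_id": "PacBio.FileTypes.Fasta"}],
    job_dir,
    {"PacBio.FileTypes.Fasta": "fasta"},  # hypothetical extension table
)
print(outputs)
```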

Example Resolved Tool Contract:

{
  "driver": {
    "env": {},
    "exe": "python -m pbcommand.cli.example.dev_app --resolved-tool-contract "
  },
  "resolved_tool_contract": {
    "input_files": [
      "/tmp/tmpVgzvudfasta"
    ],
    "nproc": 1,
    "options": {
      "pbcommand.task_options.dev_read_length": 27
    },
    "output_files": [
      "/tmp/file.fasta"
    ],
    "resources": [],
    "is_distributed": false,
    "task_type": "pbsmrtpipe.task_types.standard",
    "tool_contract_id": "pbcommand.tasks.dev_app",
    "log_level": "INFO"
  }
}
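
The driver “exe” string ends with the flag that takes the contract path, so a caller only has to append the path of the Resolved Tool Contract file. The following minimal sketch builds that command with the standard library; build_command is a hypothetical helper, and the demo writes a reduced contract to a temp file rather than using a real job.

```python
import json
import os
import shlex
import tempfile


def build_command(rtc_path):
    """Return the argv list for running the driver on a Resolved Tool Contract."""
    with open(rtc_path) as f:
        rtc = json.load(f)
    # The exe string already ends with "--resolved-tool-contract ", so the
    # contract path is appended as the final argument.
    return shlex.split(rtc["driver"]["exe"]) + [rtc_path]


# Demo: write a reduced Resolved Tool Contract to a temp file and build the command.
example = {
    "driver": {
        "env": {},
        "exe": "python -m pbcommand.cli.example.dev_app --resolved-tool-contract ",
    },
    "resolved_tool_contract": {
        "input_files": ["/tmp/tmpVgzvudfasta"],
        "output_files": ["/tmp/file.fasta"],
        "options": {"pbcommand.task_options.dev_read_length": 27},
        "nproc": 1,
    },
}
fd, rtc_path = tempfile.mkstemp(suffix=".json")
with os.fdopen(fd, "w") as f:
    json.dump(example, f)

cmd = build_command(rtc_path)
print(cmd)
```

The resulting list can be handed to subprocess.call to run the tool, provided pbcommand is installed and the input file exists.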

Testing Tool Contracts

There is a thin test framework in pbcommand.testkit to help test tool contracts from within nose.

The PbTestApp base class provides the core validation of the outputs and handles the creation of the resolved tool contract.

Output Validation assertions

  • Validates that output files exist
  • Validates resolved task options
  • Validates the resolved value of is_distributed
  • Validates the resolved value of nproc
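
These checks can be pictured as a function over the Resolved Tool Contract data. The sketch below is not the PbTestApp implementation; validate_resolved is a hypothetical helper that works on the JSON structure shown earlier and returns a list of problems instead of raising test failures.

```python
import os


def validate_resolved(rtc, expected_options, expected_nproc, expected_is_distributed):
    """Return a list of validation error messages (empty if everything matches)."""
    task = rtc["resolved_tool_contract"]
    errors = []

    # Output files are expected to exist once the tool has run.
    for path in task["output_files"]:
        if not os.path.exists(path):
            errors.append("missing output file {p}".format(p=path))

    # Each expected option must be present with the expected resolved value.
    for key, expected in expected_options.items():
        actual = task["options"].get(key)
        if actual != expected:
            errors.append("option {k}: expected {e!r}, got {a!r}".format(
                k=key, e=expected, a=actual))

    if task["nproc"] != expected_nproc:
        errors.append("nproc: expected {e}, got {a}".format(
            e=expected_nproc, a=task["nproc"]))

    if task["is_distributed"] != expected_is_distributed:
        errors.append("is_distributed: expected {e}, got {a}".format(
            e=expected_is_distributed, a=task["is_distributed"]))

    return errors
```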

Example:

from __future__ import absolute_import
import logging

from .base_utils import get_data_file
from pbcommand.testkit import PbTestApp
from pbcommand.resolver import ToolContractError

log = logging.getLogger(__name__)


class TestRunDevApp(PbTestApp):
    DRIVER_BASE = "python -m pbcommand.cli.examples.dev_app "
    REQUIRES_PBCORE = True
    INPUT_FILES = [get_data_file("example.fasta")]
    TASK_OPTIONS = {"pbcommand.task_options.dev_read_length": 27}


class TestTxtDevApp(PbTestApp):
    DRIVER_BASE = "python -m pbcommand.cli.examples.dev_txt_app "
    # XXX using default args, so the emit/resolve drivers are automatic
    REQUIRES_PBCORE = False
    INPUT_FILES = [get_data_file("example.txt")]
    TASK_OPTIONS = {"pbcommand.task_options.dev_max_nlines": 27}
    RESOLVED_TASK_OPTIONS = {"pbcommand.task_options.dev_max_nlines": 27}


class TestQuickDevHelloWorld(PbTestApp):
    """Runs dev_qhello_world """
    DRIVER_EMIT = "python -m pbcommand.cli.examples.dev_quick_hello_world  emit-tool-contract pbcommand.tasks.dev_qhello_world "
    DRIVER_RESOLVE = "python -m pbcommand.cli.examples.dev_quick_hello_world  run-rtc "

    REQUIRES_PBCORE = False
    INPUT_FILES = [get_data_file("example.txt")]
    IS_DISTRIBUTED = True
    RESOLVED_IS_DISTRIBUTED = True


class TestQuickTxt(PbTestApp):
    """Runs dev_txt_hello"""
    DRIVER_EMIT = "python -m pbcommand.cli.examples.dev_quick_hello_world  emit-tool-contract pbcommand.tasks.dev_txt_hello "
    DRIVER_RESOLVE = "python -m pbcommand.cli.examples.dev_quick_hello_world  run-rtc "

    REQUIRES_PBCORE = False
    INPUT_FILES = [get_data_file("example.txt")]
    IS_DISTRIBUTED = True
    RESOLVED_IS_DISTRIBUTED = False # XXX is_distributed=False in task TC!


class TestQuickCustomTxtCustomOuts(PbTestApp):
    """Runs dev_txt_custom_outs"""
    DRIVER_EMIT = "python -m pbcommand.cli.examples.dev_quick_hello_world  emit-tool-contract pbcommand.tasks.dev_txt_custom_outs "
    DRIVER_RESOLVE = "python -m pbcommand.cli.examples.dev_quick_hello_world  run-rtc "

    REQUIRES_PBCORE = False
    INPUT_FILES = [get_data_file("example.txt")]


class TestOptionTypes(PbTestApp):
    DRIVER_BASE = "python -m pbcommand.cli.examples.dev_mixed_app"
    REQUIRES_PBCORE = False
    INPUT_FILES = [get_data_file("example.txt")]
    TASK_OPTIONS = {
        "pbcommand.task_options.alpha": 50,
        "pbcommand.task_options.beta": 9.876,
        "pbcommand.task_options.gamma": False,
        "pbcommand.task_options.ploidy": "diploid"
    }
    RESOLVED_TASK_OPTIONS = {
        "pbcommand.task_options.alpha": 50,
        "pbcommand.task_options.beta": 9.876,
        "pbcommand.task_options.gamma": False,
        "pbcommand.task_options.ploidy": "diploid",
        "pbcommand.task_options.delta": 1,
        "pbcommand.task_options.epsilon": 0.1
    }


class TestBadChoiceValue(TestOptionTypes):
    TASK_OPTIONS = {
        "pbcommand.task_options.alpha": 50,
        "pbcommand.task_options.beta": 9.876,
        "pbcommand.task_options.gamma": False,
        "pbcommand.task_options.ploidy": "other"
    }

    def test_run_e2e(self):
        self.assertRaises(ToolContractError, super(TestBadChoiceValue, self).test_run_e2e)


class TestQuickOptionTypes(PbTestApp):
    DRIVER_EMIT = "python -m pbcommand.cli.examples.dev_quick_hello_world  emit-tool-contract pbcommand.tasks.dev_test_options"
    DRIVER_RESOLVE = "python -m pbcommand.cli.examples.dev_quick_hello_world run-rtc "
    INPUT_FILES = [get_data_file("example.txt")]
    TASK_OPTIONS = {
        "pbcommand.task_options.alpha": 50,
        "pbcommand.task_options.beta": 9.876,
        "pbcommand.task_options.gamma": False,
        "pbcommand.task_options.ploidy": "diploid"
    }
    RESOLVED_TASK_OPTIONS = {
        "pbcommand.task_options.alpha": 50,
        "pbcommand.task_options.beta": 9.876,
        "pbcommand.task_options.gamma": False,
        "pbcommand.task_options.ploidy": "diploid",
        "pbcommand.task_options.delta": 1,
        "pbcommand.task_options.epsilon": 0.01
    }

Tips

A dev tool within pbcommand can help convert Tool Contract JSON files to Resolved Tool Contracts for testing purposes.

usage: python -m pbcommand.interactive_resolver [-h] [--version] tc_path

Positional Arguments

tc_path Path to Tool Contract

Named Arguments

--version show program’s version number and exit

Note

This tool depends on prompt_toolkit, which can be installed via pip.