🧬 Formed

Formed is a framework for managing data, experiments, and workflows in both research and production environments. It is designed to be flexible and extensible, with a simple, consistent interface for defining and running workflows.

Features

  • DAG-based workflow system: Define complex workflows as directed acyclic graphs (DAGs) to manage dependencies and execution order.
  • Experiment tracking: Keep track of experiments, parameters, and results in a structured manner.
  • Built-in integration with popular data science libraries: Seamlessly work with libraries like PyTorch, 🤗 Transformers, MLflow, and more.
  • Extensible architecture: Easily extend the framework with custom components and plugins.

Basic Usage

Define steps in a workflow using the @workflow.step decorator with type hints:

# mysteps.py

from collections.abc import Iterator
from formed import workflow

@workflow.step
def load_dataset(size: int) -> Iterator[int]:
    for i in range(size):
        yield i

@workflow.step
def square(dataset: Iterator[int]) -> Iterator[int]:
    for i in dataset:
        yield i * i
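
To see what this pipeline computes, the same generator logic can be run as plain Python, without Formed. This is only a sketch: the functions below are undecorated copies of the steps above, so nothing here reflects how Formed itself schedules, caches, or stores step results.

```python
from collections.abc import Iterator


# Undecorated copies of the steps above (no Formed involved).
def load_dataset(size: int) -> Iterator[int]:
    for i in range(size):
        yield i


def square(dataset: Iterator[int]) -> Iterator[int]:
    for i in dataset:
        yield i * i


# Chain the generators by hand: the output of one step feeds the next.
result = list(square(load_dataset(10)))
print(result)
```

Because both functions are generators, values flow through lazily, one item at a time, until `list(...)` consumes them.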

Create a workflow definition using Jsonnet:

# workflow.jsonnet

{
  steps: {
    dataset: {
      type: 'load_dataset',
      size: 10
    },
    results: {
      type: 'square',
      dataset: { type: 'ref', ref: 'dataset' } // reference to dataset
    }
  }
}
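
The `ref` entries are what turn this flat list of steps into a DAG: `results` depends on `dataset`, so `dataset` must run first. The sketch below illustrates that idea with the standard library's `graphlib`. It is not Formed's actual resolver; the helper names `find_refs` and `execution_order` are hypothetical, and the dictionary simply mirrors the Jsonnet above.

```python
from graphlib import TopologicalSorter

# Python mirror of workflow.jsonnet (for illustration only).
config = {
    "steps": {
        "dataset": {"type": "load_dataset", "size": 10},
        "results": {
            "type": "square",
            "dataset": {"type": "ref", "ref": "dataset"},
        },
    }
}


def find_refs(value) -> set[str]:
    """Collect the names of all steps referenced inside a config value."""
    if isinstance(value, dict):
        if value.get("type") == "ref" and "ref" in value:
            return {value["ref"]}
        refs: set[str] = set()
        for item in value.values():
            refs |= find_refs(item)
        return refs
    if isinstance(value, list):
        refs = set()
        for item in value:
            refs |= find_refs(item)
        return refs
    return set()


def execution_order(steps: dict) -> list[str]:
    """Return step names ordered so every step follows its dependencies."""
    graph = {name: find_refs(params) for name, params in steps.items()}
    return list(TopologicalSorter(graph).static_order())


order = execution_order(config["steps"])
print(order)
```

A cyclic set of references would make `TopologicalSorter` raise `graphlib.CycleError`, which is the "acyclic" requirement of a DAG in practice.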

Set up the project configuration in formed.yml:

# formed.yml

workflow:
  organizer:
    type: filesystem  # store execution results in the filesystem

required_modules:
  - mysteps           # include the custom steps module

Run the workflow via the command line interface:

formed run workflow.jsonnet --execution-id my-experiment