Quick Start
Installation
pip install formed
If you want to use integrations, install the corresponding extra packages:
pip install formed[mlflow]
pip install formed[torch]
pip install formed[flax]
pip install formed[transformers]
pip install formed[sentence-transformers]
pip install formed[all] # install all integrations
Basic Usage
This section walks you through building a simple workflow to explore properties of the Collatz sequence. You'll learn the fundamental workflow of formed:
- Define steps as reusable computation units
- Define a workflow configuration that connects steps
- Configure formed settings
- Run the workflow via CLI
- Inspect the results programmatically
1. Define Steps
Steps are the building blocks of workflows. Each step is a Python function
decorated with @workflow.step:
# collatz.py
from collections.abc import Iterator

from formed import workflow


@workflow.step
def generate_collatz(init: int) -> Iterator[int]:
    n = init
    while n != 1:
        yield n
        if n % 2 == 0:
            n //= 2
        else:
            n = 3 * n + 1
    yield 1


@workflow.step
def count_steps(sequence: Iterator[int]) -> int:
    return sum(1 for _ in sequence) - 1


@workflow.step
def max_value(sequence: Iterator[int]) -> int:
    return max(sequence)


@workflow.step
def summarize(**kwargs: int) -> dict[str, int]:
    logger = workflow.use_step_logger(__name__)
    logger.info(f"Summary: {kwargs}")
    return kwargs
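To make the arithmetic concrete, here is a minimal plain-Python sketch of what the steps compute for init = 7. It does not use formed at all; the names collatz, sequence, step_count, and peak are illustrative only. The generator is materialized into a list first, because a plain iterator can only be consumed once.

```python
from collections.abc import Iterator


def collatz(init: int) -> Iterator[int]:
    """Same logic as generate_collatz, without the @workflow.step decorator."""
    n = init
    while n != 1:
        yield n
        n = n // 2 if n % 2 == 0 else 3 * n + 1
    yield 1


# Materialize once so the values can be reused by several computations.
sequence = list(collatz(7))

step_count = len(sequence) - 1  # same arithmetic as count_steps
peak = max(sequence)            # same arithmetic as max_value

print(sequence)    # [7, 22, 11, 34, 17, 52, 26, 13, 40, 20, 10, 5, 16, 8, 4, 2, 1]
print(step_count)  # 16
print(peak)        # 52
```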
Each step is automatically cached based on its inputs and source code, so re-running the workflow will skip unchanged steps.
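The idea of caching on inputs and code can be sketched in plain Python. This is only a conceptual illustration, not formed's implementation: the fingerprint and cached_call helpers and the in-memory _cache are hypothetical, and the compiled bytecode stands in for the step's source code.

```python
import hashlib
import json
from typing import Any, Callable

# Hypothetical in-memory store, keyed by fingerprint.
_cache: dict[str, Any] = {}


def fingerprint(fn: Callable[..., Any], kwargs: dict[str, Any]) -> str:
    """Hash the function's compiled body together with its keyword arguments."""
    code = fn.__code__
    payload = (
        code.co_code
        + repr(code.co_consts).encode()
        + json.dumps(kwargs, sort_keys=True).encode()
    )
    return hashlib.sha256(payload).hexdigest()


def cached_call(fn: Callable[..., Any], **kwargs: Any) -> Any:
    """Run fn only if this (code, inputs) combination has not been seen."""
    key = fingerprint(fn, kwargs)
    if key not in _cache:
        _cache[key] = fn(**kwargs)
    return _cache[key]


calls: list[int] = []


def square(x: int) -> int:
    calls.append(x)  # record real executions so cache hits are visible
    return x * x


cached_call(square, x=4)
cached_call(square, x=4)  # same code and input: served from the cache
cached_call(square, x=5)  # new input: executed again
print(calls)  # [4, 5]
```

Editing the body of square would change its bytecode and therefore its fingerprint, which is the behavior the paragraph above describes for steps.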
2. Define Workflow
Workflows are defined using Jsonnet (or JSON) configuration files. The configuration specifies which steps to run and how to connect their inputs and outputs:
// config.jsonnet
{
  steps: {
    collatz_sequence: {
      type: "generate_collatz",
      init: 7,
    },
    step_count: {
      type: "count_steps",
      sequence: {type: "ref", ref: "collatz_sequence"},
    },
    summary: {
      type: "summarize",
      step_count: {type: "ref", ref: "step_count"},
    },
  }
}
The type field specifies which step function to use, and other fields are
passed as arguments. Use {type: "ref", ref: "step_name"} to reference outputs
from other steps.
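As a rough mental model of how the type and ref fields fit together, the sketch below resolves the same configuration with plain functions. It is an assumption-laden illustration, not formed's resolver: REGISTRY and run_step are hypothetical names, and lists replace iterators for simplicity.

```python
from typing import Any, Callable


def collatz(init: int) -> list[int]:
    seq = [init]
    while seq[-1] != 1:
        n = seq[-1]
        seq.append(n // 2 if n % 2 == 0 else 3 * n + 1)
    return seq


# Hypothetical registry mapping "type" names to plain functions.
REGISTRY: dict[str, Callable[..., Any]] = {
    "generate_collatz": collatz,
    "count_steps": lambda sequence: len(sequence) - 1,
    "summarize": lambda **kwargs: kwargs,
}

CONFIG = {
    "steps": {
        "collatz_sequence": {"type": "generate_collatz", "init": 7},
        "step_count": {
            "type": "count_steps",
            "sequence": {"type": "ref", "ref": "collatz_sequence"},
        },
        "summary": {
            "type": "summarize",
            "step_count": {"type": "ref", "ref": "step_count"},
        },
    }
}


def run_step(name: str, steps: dict[str, dict], results: dict[str, Any]) -> Any:
    """Run one step, first running any steps it references."""
    if name in results:
        return results[name]
    spec = dict(steps[name])
    fn = REGISTRY[spec.pop("type")]  # "type" selects the function
    kwargs = {}
    for key, value in spec.items():  # remaining fields become arguments
        if isinstance(value, dict) and value.get("type") == "ref":
            value = run_step(value["ref"], steps, results)
        kwargs[key] = value
    results[name] = fn(**kwargs)
    return results[name]


results: dict[str, Any] = {}
summary = run_step("summary", CONFIG["steps"], results)
print(summary)  # {'step_count': 16}
```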
3. Configure Formed
# formed.yml
workflow:
  organizer:
    type: filesystem

required_modules:
  - collatz
The formed.yml file configures how formed manages workflows:
- workflow.organizer: Specifies how to store and manage workflow execution results. The filesystem organizer saves results to local files in the .formed directory.
- required_modules: Lists Python modules containing step definitions. Formed automatically loads these modules and all their submodules, making steps available for use in workflows.
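Loading a module together with all of its submodules can be done with the standard library alone. The sketch below shows one way this might look; import_with_submodules is a hypothetical helper and says nothing about how formed implements required_modules internally. The standard-library json package is used purely as a stand-in for a module such as collatz.

```python
import importlib
import pkgutil


def import_with_submodules(name: str) -> list[str]:
    """Import a module and, if it is a package, every submodule beneath it."""
    module = importlib.import_module(name)
    loaded = [module.__name__]
    # Only packages have __path__; a plain module has nothing further to walk.
    search_path = getattr(module, "__path__", [])
    for info in pkgutil.walk_packages(search_path, prefix=f"{name}."):
        if info.name.endswith(".__main__"):
            continue  # skip command-line entry points
        importlib.import_module(info.name)
        loaded.append(info.name)
    return loaded


loaded = import_with_submodules("json")
print(loaded)
```

Importing a module that contains decorated functions is what lets a decorator register them as a side effect, which is presumably why the modules must be listed up front.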
4. Run Workflow
Execute the workflow from the command line in the same directory as formed.yml:
formed workflow run config.jsonnet --execution-id ex1
The --execution-id flag assigns a unique identifier to this execution,
allowing you to retrieve results later. Formed will execute the workflow DAG,
caching each step's results.
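The execution order of a DAG like this one follows from its references. As a small sketch using only the standard library (graphlib, Python 3.9+), the order can be derived from the configuration as shown below; the dependencies helper is hypothetical and formed's own scheduling may differ.

```python
from graphlib import TopologicalSorter

steps = {
    "collatz_sequence": {"type": "generate_collatz", "init": 7},
    "step_count": {
        "type": "count_steps",
        "sequence": {"type": "ref", "ref": "collatz_sequence"},
    },
    "summary": {
        "type": "summarize",
        "step_count": {"type": "ref", "ref": "step_count"},
    },
}


def dependencies(spec: dict) -> set[str]:
    """Collect the names of steps referenced by one step's arguments."""
    return {
        value["ref"]
        for value in spec.values()
        if isinstance(value, dict) and value.get("type") == "ref"
    }


# Map each step to the set of steps that must finish before it.
graph = {name: dependencies(spec) for name, spec in steps.items()}
order = list(TopologicalSorter(graph).static_order())
print(order)  # ['collatz_sequence', 'step_count', 'summary']
```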
5. Check Results
After execution completes, you can programmatically access the results:
from formed import workflow
from formed.settings import load_formed_settings
settings = load_formed_settings("formed.yml")
organizer = settings.organizer
context = organizer.get(workflow.WorkflowExecutionID("ex1"))
assert context is not None
summary = context.cache[context.info.graph["summary"]]
The organizer.get() method retrieves the execution context by ID, and context.cache provides access to cached step results. You can use step names (like "summary") to look up results in the workflow graph.
For more details on workflow concepts and advanced features, see the Workflow Guide.
Python API
You can also define and run workflows entirely in Python without configuration files:
import logging
from formed import workflow
logging.basicConfig(level=logging.INFO)
organizer = workflow.MemoryWorkflowOrganizer()
executor = workflow.DefaultWorkflowExecutor()
graph = workflow.WorkflowGraph.from_config(
    {
        "steps": {
            "collatz_sequence": {
                "type": "generate_collatz",
                "init": 7,
            },
            "step_count": {
                "type": "count_steps",
                "sequence": {"type": "ref", "ref": "collatz_sequence"},
            },
            "summary": {
                "type": "summarize",
                "step_count": {"type": "ref", "ref": "step_count"},
            },
        }
    }
)
context = organizer.run(executor, graph)
print(f"{context.cache[context.info.graph['summary']]=}")
This approach uses MemoryWorkflowOrganizer to store results in memory instead
of on disk, which is useful for quick experiments and testing.
Working with Integrations
Formed integrates with popular tools like MLflow, PyTorch, and 🤗 Transformers. This example demonstrates using MLflow to track experiments.
First, install the MLflow integration:
pip install formed[mlflow]
Then update your formed.yml to use the MLflow organizer:
 # formed.yml
 workflow:
   organizer:
-    type: filesystem
+    type: mlflow
+    experiment_name: collatz

 required_modules:
   - collatz
+  - formed.integrations.mlflow
The MLflow organizer automatically logs workflow executions as MLflow runs, including parameters, metrics, and artifacts.
Run the workflow as before:
formed workflow run config.jsonnet --execution-id ex1
View the results in the MLflow UI:
mlflow ui
This integration allows you to track experiments, compare results, and manage model versions using MLflow's interface. For more integration examples, see the Tutorials.
Further Reading
- Tutorials: Practical examples including text classification and language model fine-tuning
- Guides: In-depth explanations of workflow concepts, caching, and advanced features
- API Reference: Complete API documentation for all modules and integrations