Installation#

There are two ways of using MatFlow:

  • The MatFlow command-line interface (CLI)

  • The MatFlow Python package

Both of these options allow workflows to be designed and executed. The MatFlow CLI is recommended for beginners and strongly recommended if you want to run MatFlow on a cluster. The Python package allows workflows to be designed and explored via the Python API and is recommended for users comfortable working with Python. If you are interested in contributing to the development of MatFlow, the Python package is the place to start.

The CLI and the Python package can be used simultaneously.

Using pip#

The recommended way to install MatFlow is to use pip to install the Python package from PyPI:

pip install matflow-new

This installs the Python package, which also provides the MatFlow CLI.
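
You can check the installation from the command line. For example (assuming the standard --version flag, which prints the installed version):

matflow --version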

Release notes#

Release notes for this version (0.3.0a173) are available on GitHub. Use the version switcher in the top-right corner of the page to download/install other versions.

Alternative installation methods#

Although not currently recommended, advanced users may wish to use one of the alternative installation methods.

Configuration#

MatFlow uses a config file to control details of how it executes workflows. A default config file will be created the first time you submit a workflow. This will work without modification on a personal machine; however, if you are using MatFlow on HPC you will likely need to make some modifications to describe the job scheduler, configure settings for multiple cores, and point to your MatFlow environments file.
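
For example, submitting a workflow file for the first time will generate the default config file (my_workflow.yaml is a hypothetical file here, and matflow go is assumed as the submit-and-run command):

matflow go my_workflow.yaml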

Some examples are given for the University of Manchester’s CSF.

If there is a suitable config file for your HPC system, you can pull the relevant file using the following syntax (example shown for Manchester’s CSF3):

matflow config import github://hpcflow:matflow-configs@main/manchester-CSF3.yaml

After pulling a config file using the above command, you still need to edit it to set the path to your MatFlow environments file. The path to your config file can be found using matflow manage get-config-path; to open the config file directly, use matflow open config.

Environments#

MatFlow has the concept of environments, similar to Python virtual environments. These are required so that tasks can run using the specific software they need. Your MatFlow environments must be defined in your environments (YAML) file before MatFlow can run workflows, and the config file must point to this file via the environment_sources key. Once this has been done, your environments file can be opened using matflow open env-source.
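
For example, the entry in the config file might look like the following (a sketch; check the config file itself for the exact nesting of this key):

environment_sources:
  - /full/path/to/environments.yaml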

A template environments file is given below. It is recommended to use this as a starting point, modifying it for your own computer/HPC system, in particular the setup section of each environment.

Note that MatFlow currently works with DAMASK version 3.0.0a7.post0 but not with the latest versions. As such, the MatFlow damask_parse environment should use pip install damask==3.0.0a7.post0.
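
For example, the virtual environment sourced in the damask_parse_env setup below could be prepared as follows (a sketch using the placeholder path from the template; adjust the path and Python invocation for your system):

python -m venv /full/path/to/.venv
source /full/path/to/.venv/bin/activate
pip install damask==3.0.0a7.post0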

Linux/macOS#

- name: damask_parse_env
  setup: |
    source /full/path/to/.venv/bin/activate
  executables:
  - label: python_script
    instances:
    - command: python "<<script_path>>" <<args>>
      num_cores:
        start: 1
        stop: 32
      parallel_mode: null

- name: formable_env
  setup: |
    source /full/path/to/.venv/bin/activate
  executables:
  - label: python_script
    instances:
    - command: python "<<script_path>>" <<args>>
      num_cores:
        start: 1
        stop: 32
      parallel_mode: null

- name: defdap_env
  setup: |
    source /full/path/to/.venv/bin/activate
  executables:
  - label: python_script
    instances:
    - command: python "<<script_path>>" <<args>>
      num_cores:
        start: 1
        stop: 32
      parallel_mode: null

- name: damask_env
  setup: |
    module load mpi/intel-18.0/openmpi/4.1.0
    IMG_PATH=/full/path/to/DAMASK-docker-images/damask-grid_3.0.0-alpha7.sif
    export HDF5_USE_FILE_LOCKING=FALSE
  executables:
  - label: damask_grid
    instances:
    - command: singularity run $IMG_PATH
      num_cores: 1
      parallel_mode: null
    - command: mpirun singularity run $IMG_PATH
      num_cores:
        start: 2
        stop: 32
      parallel_mode: null

- name: matlab_env
  setup: |
    module load matlab/module/file/version
    MTEX_DIR=/full/path/to/toolboxes/mtex/mtex-6.0.0
  executables:
  - label: run_mtex
    instances:
    - command: |
          # Add every MTEX directory to the MATLAB search path (colon-separated on Linux/macOS)
          for dir in $(find ${MTEX_DIR} -type d | grep -v -e ".git" -e "@" -e "private"); do MATLABPATH="${dir}:${MATLABPATH}"; done
          export MATLABPATH=${MATLABPATH}
          matlab -softwareopengl -singleCompThread -batch "addpath('<<script_dir>>'); <<script_name_no_ext>> <<args>>"
      num_cores: 1
      parallel_mode: null
  - label: compile_mtex
    instances:
    - command: |
        for dir in $(find ${MTEX_DIR} -type d | grep -v -e ".git" -e "@" -e "private" -e "data" -e "makeDoc" -e "templates" -e "nfft_openMP" -e "compatibility/")
        do
          MTEX_INCLUDE="-I ${dir} ${MTEX_INCLUDE}"
        done
        export MTEX_INCLUDE="${MTEX_INCLUDE} -a ${MTEX_DIR}/data -a ${MTEX_DIR}/plotting/plotting_tools/colors.mat"
        mcc -R -singleCompThread -R -softwareopengl -m "<<script_path>>" <<args>> -o matlab_exe ${MTEX_INCLUDE}
      num_cores: 1
      parallel_mode: null
  - label: run_compiled_mtex
    instances:
    - command: |
        export MATLAB_RUNTIME=/full/path/to/matlab/runtime-or-installation
        ./run_matlab_exe.sh ${MATLAB_RUNTIME} <<args>>
      num_cores: 1
      parallel_mode: null

- name: python_env
  executables:
  - label: python_script
    instances:
    - command: python "<<script_path>>" <<args>>
      num_cores:
        start: 1
        stop: 32
      parallel_mode: null

- name: dream_3D_env
  executables:
  - label: dream_3D_runner
    instances:
    - command: /full/path/to/dream3d/DREAM3D-6.5.171-Linux-x86_64/bin/PipelineRunner
      num_cores: 1
      parallel_mode: null
  - label: python_script
    instances:
    - command: python "<<script_path>>" <<args>>
      num_cores: 1
      parallel_mode: null

Windows#

- name: matlab_env
  executables:
    - label: run_mtex
      instances:
        - command: |
            & 'C:\path\to\matlab.exe' -batch "addpath('<<script_dir>>'); <<script_name_no_ext>> <<args>>"
          num_cores: 1
          parallel_mode: null

    - label: compile_mtex
      instances:
        - command: |
            $mtex_path = 'C:\path\to\mtex\folder'
            & 'C:\path\to\mcc.bat' -R -singleCompThread -m "<<script_path>>" <<args>> -o matlab_exe -a "$mtex_path/data" -a "$mtex_path/plotting/plotting_tools/colors.mat"
          num_cores: 1
          parallel_mode: null

    - label: run_compiled_mtex
      instances:
        - command: .\matlab_exe.exe <<args>>
          num_cores: 1
          parallel_mode: null

- name: dream_3D_env
  executables:
    - label: dream_3D_runner
      instances:
        - command: "& 'C:\\path\\to\\DREAM3D-directory\\PipelineRunner.exe'"
          num_cores: 1
          parallel_mode: null
    - label: python_script
      instances:
        - command: python "<<script_path>>" <<args>>
          num_cores: 1
          parallel_mode: null

Tips for SLURM#

hpcFlow (which MatFlow uses) currently has a fault whereby it does not select a SLURM partition based on the resources requested in your workflow file. As such, you must define the partition manually in your workflow files, e.g.:

resources:
  any:
    scheduler_args:
      directives:
        --time: 00:30:00
        --partition: serial

Note that many SLURM schedulers also require a time limit to be specified, as shown above.

A default time limit and partition can be set in the config file; these are used for tasks that do not set them explicitly in a resources block like the one above.
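
For example, the config entry might look like the following (a hypothetical sketch: the schedulers/defaults/directives nesting is an assumption, so check your imported config file for the exact schema):

schedulers:
  slurm:
    defaults:
      directives:
        --time: 00:30:00
        --partition: serial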