Why Aidge?#

Aidge is a generic, multi-paradigm tool for compute graph manipulation, quantization, mapping, scheduling and code generation. Its primary targets are embedded systems with specialized hardware accelerators, especially dataflow or restricted-instruction-set architectures. It is highly interoperable thanks to built-in ONNX import/export and a direct PyTorch interface, and its modularity allows any of its features to be used standalone or in conjunction with other tools along the deployment path of a model to the embedded system.

Here are some reasons that might encourage you to use Aidge:

Well-defined dataflow graph IR model#

In Aidge, the computing graph is always explicitly specified by the user, who knows exactly which operators are instantiated. The notions of graph (a set of nodes), mathematical operator and actual implementation are separated: a node contains an operator, which itself points to an implementation. Furthermore, the graph only defines a topology, without any assumption on execution priority.

Here are some interesting Aidge features:

  • A view can be created on any set or subset of nodes, which itself can constitute a (meta) node;

  • Graphs can be hierarchical (to any depth), thanks to meta operators that contain any graph view. It is trivial to (recursively) replace a set of nodes by a meta node or inversely, flatten a meta node to a set of nodes;

  • It is possible to specify any type of operator, even without a corresponding implementation, thanks to generic operators. No need to write a single line of code to handle custom operators at the graph level;

  • Node inputs and outputs are ordered, and any set of connected nodes can be ranked in a unique and deterministic way, making graph isomorphism identification trivial;

  • Cyclic graphs are supported. Unlike ONNX, any kind of graph can be fully flattened in Aidge and thus optimized globally.
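To give an intuition of the deterministic ranking idea, here is a small self-contained Python sketch. It is not the Aidge API (the names and data structure here are hypothetical); it only illustrates why ordered outputs make a ranking unique: a breadth-first walk from a root visits nodes in one and only one order, so two isomorphic graphs receive identical rankings regardless of how their nodes were created or named.

```python
from collections import deque

def rank_nodes(root, children):
    """Deterministically rank the nodes of a graph with ordered outputs.

    `children` maps each node to the ordered list of its successors.
    Because the successor lists are ordered, the BFS visit order is
    unique, so isomorphic graphs yield identical rank sequences.
    """
    rank, queue, seen = {}, deque([root]), {root}
    while queue:
        node = queue.popleft()
        rank[node] = len(rank)
        for succ in children.get(node, []):
            if succ not in seen:
                seen.add(succ)
                queue.append(succ)
    return rank

# Two graphs with the same topology but different node names
g1 = {"in": ["conv"], "conv": ["relu"], "relu": []}
g2 = {"x": ["c"], "c": ["r"], "r": []}
r1 = rank_nodes("in", g1)
r2 = rank_nodes("x", g2)
# Identical rank sequences reveal the isomorphism
assert list(r1.values()) == list(r2.values()) == [0, 1, 2]
```

Comparing the rank sequences (rather than node names) is what makes isomorphism identification a simple equality test.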

Below is an example of a cyclic graph (an LSTM) in Aidge:

```mermaid
flowchart TB
FC_5("ltsm_inputGateH\n<sub><em>(FC#5)</em></sub>")
Add_1("ltsm_inputGate\n<sub><em>(Add#1)</em></sub>")
Sigmoid_1("ltsm_inputGateAct\n<sub><em>(Sigmoid#1)</em></sub>")
Mul_1("ltsm_inputGateMul\n<sub><em>(Mul#1)</em></sub>")
FC_2("ltsm_cellCandidateX\n<sub><em>(FC#2)</em></sub>")
FC_6("ltsm_cellCandidateH\n<sub><em>(FC#6)</em></sub>")
Add_2("ltsm_cellCandidate\n<sub><em>(Add#2)</em></sub>")
Tanh_0("ltsm_cellCandidateAct\n<sub><em>(Tanh#0)</em></sub>")
FC_3("ltsm_outputGateX\n<sub><em>(FC#3)</em></sub>")
Mul_0("ltsm_forgetGateMul\n<sub><em>(Mul#0)</em></sub>")
Sigmoid_0("ltsm_forgetGateAct\n<sub><em>(Sigmoid#0)</em></sub>")
Add_0("ltsm_forgetGate\n<sub><em>(Add#0)</em></sub>")
FC_4("ltsm_forgetGateH\n<sub><em>(FC#4)</em></sub>")
FC_0("ltsm_forgetGateX\n<sub><em>(FC#0)</em></sub>")
FC_7("ltsm_outputGateH\n<sub><em>(FC#7)</em></sub>")
Add_3("ltsm_outputGate\n<sub><em>(Add#3)</em></sub>")
Sigmoid_2("ltsm_outputGateAct\n<sub><em>(Sigmoid#2)</em></sub>")
Mul_2("ltsm_outputGateMul\n<sub><em>(Mul#2)</em></sub>")
Tanh_1("ltsm_cellUpdatedAct\n<sub><em>(Tanh#1)</em></sub>")
FC_1("ltsm_inputGateX\n<sub><em>(FC#1)</em></sub>")
Add_4("ltsm_add\n<sub><em>(Add#4)</em></sub>")
Memorize_1("ltsm_cell_state\n<sub><em>(Memorize#1)</em></sub>")
Memorize_0("ltsm_hidden_state\n<sub><em>(Memorize#0)</em></sub>")
Identity_0("ltsm_input\n<sub><em>(Identity#0)</em></sub>")
Pop_0(<em>Pop#0</em>):::rootCls
FC_5-->|"0&rarr;1"|Add_1
Add_1-->|"0&rarr;0"|Sigmoid_1
Sigmoid_1-->|"0&rarr;0"|Mul_1
Mul_1-->|"0&rarr;1"|Add_4
FC_2-->|"0&rarr;0"|Add_2
FC_6-->|"0&rarr;1"|Add_2
Add_2-->|"0&rarr;0"|Tanh_0
Tanh_0-->|"0&rarr;1"|Mul_1
FC_3-->|"0&rarr;0"|Add_3
Mul_0-->|"0&rarr;0"|Add_4
Sigmoid_0-->|"0&rarr;0"|Mul_0
Add_0-->|"0&rarr;0"|Sigmoid_0
FC_4-->|"0&rarr;1"|Add_0
FC_0-->|"0&rarr;0"|Add_0
FC_7-->|"0&rarr;1"|Add_3
Add_3-->|"0&rarr;0"|Sigmoid_2
Sigmoid_2-->|"0&rarr;0"|Mul_2
Mul_2-->|"0&rarr;0"|Memorize_0
Tanh_1-->|"0&rarr;1"|Mul_2
FC_1-->|"0&rarr;0"|Add_1
Add_4-->|"0&rarr;0"|Tanh_1
Add_4-->|"0&rarr;0"|Memorize_1
Memorize_1-->|"1&rarr;1"|Mul_0
Memorize_0-->|"1&rarr;0"|FC_4
Memorize_0-->|"1&rarr;0"|FC_5
Memorize_0-->|"1&rarr;0"|FC_6
Memorize_0-->|"1&rarr;0"|FC_7
Identity_0-->|"0&rarr;0"|FC_0
Identity_0-->|"0&rarr;0"|FC_1
Identity_0-->|"0&rarr;0"|FC_2
Identity_0-->|"0&rarr;0"|FC_3
Pop_0-->|"0&rarr;0"|Identity_0
input0((in#0)):::inputCls--->|"&rarr;0"|Pop_0
input1((in#1)):::inputCls--->|"&rarr;1"|Memorize_0
input2((in#2)):::inputCls--->|"&rarr;1"|Memorize_1
Memorize_0--->|"0&rarr;"|output0((out#0)):::outputCls
Memorize_1--->|"0&rarr;"|output1((out#1)):::outputCls
classDef inputCls fill:#afa
classDef outputCls fill:#ffa
classDef externalCls fill:#ccc
classDef producerCls fill:#ccf
classDef genericCls fill:#f9f9ff,stroke-width:1px,stroke-dasharray: 5 5
classDef metaCls stroke-width:5px
classDef rootCls stroke:#f00
classDef producerCls_rootCls stroke:#f00,fill:#ccf
classDef genericCls_rootCls stroke:#f00,fill:#f9f9ff,stroke-width:1px,stroke-dasharray: 5 5
classDef metaCls_rootCls stroke:#f00,stroke-width:5px
```

Powerful graph search & replace engine#

Aidge introduces a simple and efficient DSL for graph matching, sometimes called “graph regex”. It is possible to write complex textual queries to find a quantified or unquantified set of nodes with specific types, attributes and/or relationships between them. This is particularly useful to implement sophisticated pattern-matching heuristics with no effort!

Here is an example of a query that you can do in Aidge:

graph_regex = aidge_core.GraphRegex()
graph_regex.add_query("(Pad#?->Conv#|Deconv#->Pad#?)->ReLU#?->Add#1?->ReLU#?->MaxPool#?->ReLU#?")
graph_regex.add_query(".->Add#1?")

for match in graph_regex.match(model):
    aidge_core.GraphView.replace(match, MyCustomIPOperator())

You can define your own node test function as well:

def myNodeTestingFunc(node):
    ...
    return predicate

graph_regex = aidge_core.GraphRegex()
graph_regex.set_node_key("test", myNodeTestingFunc)
graph_regex.add_query("Conv->test")
...

See the tutorial Perform advanced graph matching with the Graph Regular Expression tool for more information.

Generic, compiler-agnostic tiling methods#

Operator tiling is an important operation in high-level compilers such as Halide or TVM, but it usually implies an IR lowering step, such as C code generation, and a standard compiler backend, such as LLVM. In Aidge, tiling does not make any assumption on the programming paradigm. Instead, tiling is done at the same graph IR level, simply by expanding the compute graph with tiled operators. It is up to the user to choose the right tiling granularity, depending on the type of operators they use. The tiled operator implementation may be a C kernel, a call to a specific hardware accelerator or an HLS-generated operator, or none of these if the new tiled graph is simply meant to be exported back to ONNX, ready to be fed to another tool.
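To make the idea concrete, here is a small self-contained sketch of what graph-level tiling amounts to for a 1-D convolution. This is not Aidge's tiling API (the helper below is hypothetical); it only shows the core computation: splitting the output range into tiles and deriving, for each tile, the input slice (with its halo overlap) that a tiled operator node would need.

```python
def tile_conv1d(out_size, kernel, stride, tile):
    """Split a 1-D convolution output of `out_size` elements into tiles.

    Returns, for each tile, the output range it covers and the input
    slice it requires. Each entry could become one tiled operator node
    in the expanded compute graph.
    """
    tiles = []
    for start in range(0, out_size, tile):
        stop = min(start + tile, out_size)
        # Input slice needed to compute outputs [start, stop)
        in_start = start * stride
        in_stop = (stop - 1) * stride + kernel
        tiles.append(((start, stop), (in_start, in_stop)))
    return tiles

# A 10-element output, kernel 3, stride 1, tiled by 4:
# [((0, 4), (0, 6)), ((4, 8), (4, 10)), ((8, 10), (8, 12))]
print(tile_conv1d(10, 3, 1, 4))
```

Note how adjacent input slices overlap by `kernel - stride` elements: this halo is exactly what the expanded graph encodes explicitly, with no lowering step required.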

See the tutorial Optimize the inference of your neural network with Tiling for more information.

Well-defined consumer-producer model and scheduling#

Aidge introduces a well-defined consumer-producer (C-P) model for operator implementations, similar to transaction-level modeling (TLM) for electronic design. A generic, default implementation is provided as well. The C-P model can be specified as precise amounts of data or as arbitrary data quantities (tokens), for each operator and dynamically at each execution step. The C-P model execution path is decoupled from the data execution path, making it possible to schedule the graph execution statically without providing the actual operator implementations.

For example, for a 2D convolution implementation that only processes one input line at each execution step, it is trivial to build a pipelined dataflow and get the right execution order, by overloading two C-P methods in the implementation:

Elts_t MyCustomPipelinedConvImpl::getNbRequiredData(IOIndex_t inputIdx) {
    // Consume a single input line at each execution
    return Elts_t::DataElts(getInputLineSize(inputIdx));
}

Elts_t MyCustomPipelinedConvImpl::getRequiredMemory(IOIndex_t outputIdx) {
    if (enoughDataToComputeOutputLine()) {
        // Produce an output line only if there is enough input data
        return Elts_t::DataElts(getOutputLineSize(outputIdx));
    }
    else {
        return Elts_t::DataElts(0);
    }
}

Thanks to Aidge’s C-P model, arbitrary complex cyclic and acyclic dataflow graphs can be statically scheduled. Generic sequential and parallel schedulers are available, and custom schedulers can be built using static scheduling data (logical early and late execution steps and associated dependencies for each scheduled node).
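The following toy Python sketch (hypothetical names, not Aidge's scheduler API) illustrates how C-P amounts alone, with no actual computation, are enough to derive a pipelined static schedule. Two chained operators each consume one line per step; the second fires as soon as a line from the first is available from a previous step:

```python
def static_schedule(line_count):
    """Derive a pipelined schedule for input -> conv1 -> conv2 using
    only consumer-producer amounts (one line consumed/produced per
    step), mirroring the decoupling of C-P path and data path."""
    avail = {"input": line_count, "conv1": 0}
    produced = {"conv1": 0, "conv2": 0}
    schedule = []
    step = 0
    while produced["conv2"] < line_count:
        step += 1
        fired = []
        # Snapshot: conv2 only sees lines produced in earlier steps
        conv1_ready = avail["conv1"] > 0
        if avail["input"] > 0:           # conv1 consumes one input line
            avail["input"] -= 1
            avail["conv1"] += 1
            produced["conv1"] += 1
            fired.append("conv1")
        if conv1_ready:                  # conv2 consumes one conv1 line
            avail["conv1"] -= 1
            produced["conv2"] += 1
            fired.append("conv2")
        schedule.append((step, fired))
    return schedule

# Pipeline fill and drain are visible in the schedule for 3 lines:
# [(1, ['conv1']), (2, ['conv1', 'conv2']),
#  (3, ['conv1', 'conv2']), (4, ['conv2'])]
print(static_schedule(3))
```

The schedule exhibits the expected one-step pipeline latency: `conv2` starts one step after `conv1` and finishes one step after it, exactly the information a static scheduler needs before any kernel exists.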

Generic hardware model and mapping heuristics#

🚧 This is planned for the next major Aidge release, stay tuned!

Simple and effective code generation engine#

Aidge uses the Jinja template engine to easily generate any type of code/programming model from a graph. It provides facilities to easily define what should be generated for each operator type. Beyond that, optimized static scheduling and memory mapping can be generated as well. Ultimately, the full compute graph can be generated in an entirely static configuration, with minimal to no control overhead during execution of the dataflow on the intended hardware target, even in multi-threaded environments.
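The principle of per-operator-type templates can be sketched in a few lines. Aidge itself uses Jinja; for a self-contained illustration the sketch below uses the standard library's `string.Template` instead, and all names (templates, schedule format) are hypothetical. The key idea is the same: one template per operator type, one statically scheduled call emitted per node.

```python
from string import Template

# One code template per operator type (Aidge uses Jinja templates;
# this stdlib-only sketch just illustrates the substitution idea).
KERNEL_CALL = {
    "Conv": Template("conv2d(${inp}, ${out}, ${weights});"),
    "ReLU": Template("relu(${inp}, ${out});"),
}

def generate(schedule):
    """Emit one statically scheduled C call per node, in order."""
    lines = []
    for op_type, params in schedule:
        lines.append(KERNEL_CALL[op_type].substitute(params))
    return "\n".join(lines)

code = generate([
    ("Conv", {"inp": "buf0", "out": "buf1", "weights": "w0"}),
    ("ReLU", {"inp": "buf1", "out": "buf2"}),
])
# conv2d(buf0, buf1, w0);
# relu(buf1, buf2);
print(code)
```

Because the schedule and buffer assignments are fixed beforehand, the generated code contains no runtime dispatch at all, which is what enables the fully static, low-overhead configuration described above.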

See the tutorial Add a custom implementation for a cpp export for more information.

Seamless interoperability with ONNX, PyTorch and TensorFlow#

Aidge has native built-in ONNX import and export capabilities, even for custom ONNX operators.

🚧 As of now, Aidge implements only a restricted set of ONNX operators (unsupported operators are loaded as generic operators), but the list is growing! Advanced PyTorch and TensorFlow interoperability is planned for the next major Aidge release.

Well-characterized, state-of-the-art PTQ and QAT methods#

🚧 This is planned for the next major Aidge release, stay tuned!