mlir-opt...

% mlir-opt --version
LLVM (http://llvm.org/):
  LLVM version 21.0.0git
  Optimized build with assertions.
% mlir-opt --help
OVERVIEW: MLIR modular optimizer driver

Available Dialects: acc, affine, amdgpu, amx, arith, arm_neon, arm_sme, arm_sve, async, bufferization, builtin, cf, complex, dlti, emitc, func, gpu, index, irdl, linalg, llvm, math, memref, mesh, ml_program, mpi, nvgpu, nvvm, omp, pdl, pdl_interp, ptr, quant, rocdl, scf, shape, smt, sparse_tensor, spirv, tensor, test, test_dyn, test_irdl_to_cpp, tosa, transform, ub, vector, x86vector, xegpu
USAGE: mlir-opt [options] <input file>

OPTIONS:

Color Options:

  --color                                                    - Use colors in output (default=autodetect)

General options:

  --abort-on-max-devirt-iterations-reached                   - Abort when the max iterations for devirtualization CGSCC repeat pass is reached
  --allow-unregistered-dialect                               - Allow operation with no registered dialects
  --arc-contract-use-objc-claim-rv                           - Enable generation of calls to objc_claimAutoreleasedReturnValue
  --atomic-counter-update-promoted                           - Do counter update using atomic fetch add for promoted counters only
  --atomic-first-counter                                     - Use atomic fetch add for first counter in a function (usually the entry counter)
  --bounds-checking-single-trap                              - Use one trap block per function
  --cfg-hide-cold-paths=<number>                             - Hide blocks with relative frequency below the given value
  --cfg-hide-deoptimize-paths                                - 
  --cfg-hide-unreachable-paths                               - 
  --check-functions-filter=<regex>                           - Only emit checks for arguments of functions whose names match the given regular expression
  --conditional-counter-update                               - Do conditional counter updates in single byte counters mode
  --cost-kind=<value>                                        - Target cost kind
    =throughput                                              -   Reciprocal throughput
    =latency                                                 -   Instruction latency
    =code-size                                               -   Code size
    =size-latency                                            -   Code size and latency
    =all                                                     -   Print all cost kinds
  --ctx-profile-force-is-specialized                         - Treat the given module as-if it were containing the post-thinlink module containing the root
  --debug-info-correlate                                     - Use debug info to correlate profiles. (Deprecated, use -profile-correlate=debug-info)
  --debugify-atoms                                           - 
  --debugify-func-limit=<ulong>                              - Set max number of processed functions per pass.
  --debugify-level=<value>                                   - Kind of debug info to add
    =locations                                               -   Locations only
    =location+variables                                      -   Locations and Variables
  --debugify-quiet                                           - Suppress verbose debugify output
  --disable-auto-upgrade-debug-info                          - Disable autoupgrade of debug info
  --disable-i2p-p2i-opt                                      - Disables inttoptr/ptrtoint roundtrip optimization
  --do-counter-promotion                                     - Do counter register promotion
  --dot-cfg-mssa=<file name for generated dot file>          - file name for generated dot file
  --dump-pass-pipeline                                       - Print the pipeline that will be run
  --elide-resource-data-from-bytecode                        - Elide resources when generating bytecode
  --emit-bytecode                                            - Emit bytecode when generating output
  --emit-bytecode-version=<value>                            - Use specified bytecode when generating output
  --enable-gvn-hoist                                         - Enable the GVN hoisting pass (default = off)
  --enable-gvn-memdep                                        - 
  --enable-gvn-memoryssa                                     - 
  --enable-gvn-sink                                          - Enable the GVN sinking pass (default = off)
  --enable-jump-table-to-switch                              - Enable JumpTableToSwitch pass (default = off)
  --enable-load-in-loop-pre                                  - 
  --enable-load-pre                                          - 
  --enable-loop-simplifycfg-term-folding                     - 
  --enable-name-compression                                  - Enable name/filename string compression
  --enable-split-backedge-in-load-pre                        - 
  --enable-split-loopiv-heuristic                            - Enable loop iv regalloc heuristic
  --enable-vtable-profile-use                                - If ThinLTO and WPD is enabled and this option is true, vtable profiles will be used by ICP pass for more efficient indirect call sequence. If false, type profiles won't be used.
  --enable-vtable-value-profiling                            - If true, the virtual table address will be instrumented to know the types of a C++ pointer. The information is used in indirect call promotion to do selective vtable-based comparison.
  --expand-variadics-override=<value>                        - Override the behaviour of expand-variadics
    =unspecified                                             -   Use the implementation defaults
    =disable                                                 -   Disable the pass entirely
    =optimize                                                -   Optimise without changing ABI
    =lowering                                                -   Change variadic calling convention
  --experimental-debug-variable-locations                    - Use experimental new value-tracking variable locations
  --force-tail-folding-style=<value>                         - Force the tail folding style
    =none                                                    -   Disable tail folding
    =data                                                    -   Create lane mask for data only, using active.lane.mask intrinsic
    =data-without-lane-mask                                  -   Create lane mask with compare/stepvector
    =data-and-control                                        -   Create lane mask using active.lane.mask intrinsic, and use it for both data and control flow
    =data-and-control-without-rt-check                       -   Similar to data-and-control, but remove the runtime check
    =data-with-evl                                           -   Use predicated EVL instructions for tail folding. If EVL is unsupported, fallback to data-without-lane-mask.
  --fs-profile-debug-bw-threshold=<uint>                     - Only show debug message if the source branch weight is greater than this value.
  --fs-profile-debug-prob-diff-threshold=<uint>              - Only show debug message if the branch probability is greater than this value (in percentage).
  --generate-merged-base-profiles                            - When generating nested context-sensitive profiles, always generate extra base profile for function with all its context profiles merged into it.
  --hash-based-counter-split                                 - Rename counter variable of a comdat function based on cfg hash
  --hot-cold-split                                           - Enable hot-cold splitting pass
  --hwasan-percentile-cutoff-hot=<int>                       - Hot percentile cutoff.
  --hwasan-random-rate=<number>                              - Probability value in the range [0.0, 1.0] to keep instrumentation of a function. Note: instrumentation can be skipped randomly OR because of the hot percentile cutoff, if both are supplied.
  --import-all-index                                         - Import all external functions in index.
  --instcombine-code-sinking                                 - Enable code sinking
  --instcombine-guard-widening-window=<uint>                 - How wide an instruction window to bypass looking for another guard
  --instcombine-max-num-phis=<uint>                          - Maximum number phis to handle in intptr/ptrint folding
  --instcombine-max-sink-users=<uint>                        - Maximum number of undroppable users for instruction sinking
  --instcombine-maxarray-size=<uint>                         - Maximum array size considered when doing a combine
  --instcombine-negator-enabled                              - Should we attempt to sink negations?
  --instcombine-negator-max-depth=<uint>                     - What is the maximal lookup depth when trying to check for viability of negation sinking.
  --instrprof-atomic-counter-update-all                      - Make all profile counter updates atomic (for testing only)
  --internalize-public-api-file=<filename>                   - A file containing list of symbol names to preserve
  --internalize-public-api-list=<list>                       - A list of symbol names to preserve
  --intrinsic-cost-strategy=<value>                          - Costing strategy for intrinsic instructions
    =instruction-cost                                        -   Use TargetTransformInfo::getInstructionCost
    =intrinsic-cost                                          -   Use TargetTransformInfo::getIntrinsicInstrCost
    =type-based-intrinsic-cost                               -   Calculate the intrinsic cost based only on argument types
  --irdl-file=<filename>                                     - IRDL file to register before processing the input
  --iterative-counter-promotion                              - Allow counter promotion across the whole loop nest.
  --list-passes                                              - Print the list of registered passes and exit
  --load-dialect-plugin=<string>                             - Load dialects from plugin library
  --load-pass-plugin=<string>                                - Load passes from plugin library
  --log-actions-to=<string>                                  - Log action execution to a file, or stderr if '-' is passed
  --log-mlir-actions-filter=<string>                         - Comma separated list of locations to filter actions from logging
  --lower-allow-check-percentile-cutoff-hot=<int>            - Hot percentile cutoff.
  --lower-allow-check-random-rate=<number>                   - Probability value in the range [0.0, 1.0] of unconditional pseudo-random checks.
  --matrix-default-layout=<value>                            - Sets the default matrix layout
    =column-major                                            -   Use column-major layout
    =row-major                                               -   Use row-major layout
  --matrix-print-after-transpose-opt                         - 
  --max-counter-promotions=<int>                             - Max number of allowed counter promotions
  --max-counter-promotions-per-loop=<uint>                   - Max number counter promotions per loop to avoid increasing register pressure too much
  --mir-strip-debugify-only                                  - Should mir-strip-debug only strip debug info from debugified modules by default
  --misexpect-tolerance=<uint>                               - Prevents emitting diagnostics when profile counts are within N% of the threshold.
  --mlir-debug-counter=<string>                              - Comma separated list of debug counter skip and count arguments
  --mlir-diagnostic-verbosity-level=<value>                  - Choose level of diagnostic information
    =errors                                                  -   Errors only
    =warnings                                                -   Errors and warnings
    =remarks                                                 -   Errors, warnings and remarks
  --mlir-disable-diagnostic-notes                            - Disable diagnostic notes.
  --mlir-disable-threading                                   - Disable multi-threading within MLIR, overrides any further call to MLIRContext::enableMultiThreading()
  --mlir-elide-elementsattrs-if-larger=<uint>                - Elide ElementsAttrs with "..." that have more elements than the given upper limit
  --mlir-elide-resource-strings-if-larger=<uint>             - Elide printing value of resources if string is too long in chars.
  --mlir-enable-debugger-hook                                - Enable Debugger hook for debugging MLIR Actions
  --mlir-generate-reproducer=<filename>                      - Generate an mlir reproducer at the provided filename (no crash required)
  --mlir-output-format=<value>                               - Output format for timing data
    =text                                                    -   display the results in text format
    =json                                                    -   display the results in JSON format
  --mlir-pass-pipeline-crash-reproducer=<string>             - Generate a .mlir reproducer file at the given output path if the pass manager crashes or fails
  --mlir-pass-pipeline-local-reproducer                      - When generating a crash reproducer, attempt to generate a reproducer with the smallest pipeline.
  --mlir-pass-statistics                                     - Display the statistics of each pass
  --mlir-pass-statistics-display=<value>                     - Display method for pass statistics
    =list                                                    -   display the results in a merged list sorted by pass name
    =pipeline                                                -   display the results with a nested pipeline view
  --mlir-pretty-debuginfo                                    - Print pretty debug info in MLIR output
  --mlir-print-debug-counter                                 - Print out debug counter information after all counters have been accumulated
  --mlir-print-debuginfo                                     - Print debug info in MLIR output
  --mlir-print-elementsattrs-with-hex-if-larger=<long>       - Print DenseElementsAttrs with a hex string that have more elements than the given upper limit (use -1 to disable)
  --mlir-print-ir-after=<pass-arg>                           - Print IR after specified passes
  --mlir-print-ir-after-all                                  - Print IR after each pass
  --mlir-print-ir-after-change                               - When printing the IR after a pass, only print if the IR changed
  --mlir-print-ir-after-failure                              - When printing the IR after a pass, only print if the pass failed
  --mlir-print-ir-before=<pass-arg>                          - Print IR before specified passes
  --mlir-print-ir-before-all                                 - Print IR before each pass
  --mlir-print-ir-module-scope                               - When printing IR for print-ir-[before|after]{-all} always print the top-level operation
  --mlir-print-ir-tree-dir=<string>                          - When printing the IR before/after a pass, print file tree rooted at this directory. Use in conjunction with mlir-print-ir-* flags
  --mlir-print-local-scope                                   - Print with local scope and inline information (eliding aliases for attributes, types, and locations)
  --mlir-print-op-on-diagnostic                              - When a diagnostic is emitted on an operation, also print the operation as an attached note
  --mlir-print-skip-regions                                  - Skip regions when printing ops.
  --mlir-print-stacktrace-on-diagnostic                      - When a diagnostic is emitted, also print the stack trace as an attached note
  --mlir-print-unique-ssa-ids                                - Print unique SSA ID numbers for values, block arguments and naming conflicts across all regions
  --mlir-print-value-users                                   - Print users of operation results and block arguments as a comment
  --mlir-timing                                              - Display execution times
  --mlir-timing-display=<value>                              - Display method for timing data
    =list                                                    -   display the results in a list sorted by total time
    =tree                                                    -   display the results with a nested tree view
  --mlir-use-nameloc-as-prefix                               - Print SSA IDs using NameLocs as prefixes
  --mlir-very-unsafe-disable-verifier-on-parsing             - Disable the verifier on parsing (very unsafe)
  --no-discriminators                                        - Disable generation of discriminator information.
  --no-implicit-module                                       - Disable implicit addition of a top-level module op during parsing
  -o <filename>                                              - Output filename
  --object-size-offset-visitor-max-visit-instructions=<uint> - Maximum number of instructions for ObjectSizeOffsetVisitor to look at
  --output-split-marker=<string>                             - Split marker to use for merging the output
  --pass-pipeline=<string>                                   - Textual description of the pass pipeline to run
  --pgo-block-coverage                                       - Use this option to enable basic block coverage instrumentation
  --pgo-temporal-instrumentation                             - Use this option to enable temporal instrumentation
  --pgo-view-block-coverage-graph                            - Create a dot file of CFGs with block coverage inference information
  --print-pipeline-passes                                    - Print a '-passes' compatible string describing the pipeline (best-effort only).
  --profile-actions-to=<string>                              - Profile action execution to a file, or stderr if '-' is passed
  --profile-correlate=<value>                                - Use debug info or binary file to correlate profiles.
    =<empty>                                                 -   No profile correlation
    =debug-info                                              -   Use debug info to correlate
    =binary                                                  -   Use binary to correlate
  --run-reproducer                                           - Run the pipeline stored in the reproducer
  --runtime-counter-relocation                               - Enable relocating counters at runtime.
  --safepoint-ir-verifier-print-only                         - 
  --sample-profile-check-record-coverage=<N>                 - Emit a warning if less than N% of records in the input profile are matched to the IR.
  --sample-profile-check-sample-coverage=<N>                 - Emit a warning if less than N% of samples in the input profile are matched to the IR.
  --sample-profile-max-propagate-iterations=<uint>           - Maximum number of iterations to go through when propagating sample block/edge weights through the CFG.
  --sampled-instr-burst-duration=<uint>                      - Set the profile instrumentation burst duration, which can range from 1 to the value of 'sampled-instr-period' (0 is invalid). This number of samples will be recorded for each 'sampled-instr-period' count update. Setting to 1 enables simple sampling, in which case it is recommended to set 'sampled-instr-period' to a prime number.
  --sampled-instr-period=<uint>                              - Set the profile instrumentation sample period. A sample period of 0 is invalid. For each sample period, a fixed number of consecutive samples will be recorded. The number is controlled by 'sampled-instr-burst-duration' flag. The default sample period of 65536 is optimized for generating efficient code that leverages unsigned short integer wrapping in overflow, but this is disabled under simple sampling (burst duration = 1).
  --sampled-instrumentation                                  - Do PGO instrumentation sampling
  --show-dialects                                            - Print the list of registered dialects and exit
  --skip-ret-exit-block                                      - Suppress counter promotion if exit blocks contain ret.
  --speculative-counter-promotion-max-exiting=<uint>         - The max number of exiting blocks of a loop to allow speculative counter promotion
  --speculative-counter-promotion-to-loop                    - When the option is false, if the target block is in a loop, the promotion will be disallowed unless the promoted counter update can be further/iteratively promoted into an acyclic region.
  --split-input-file[=<string>]                              - Split the input file into chunks using the given or default marker and process each chunk independently
  --summary-file=<string>                                    - The summary file to use for function importing.
  Compiler passes to run
    Passes:
      --affine-data-copy-generate                            -   Generate explicit copying for affine memory operations
        --fast-mem-capacity=<ulong>                          - Set fast memory space capacity in KiB (default: unlimited)
        --fast-mem-space=<uint>                              - Fast memory space identifier for copy generation (default: 1)
        --generate-dma                                       - Generate DMA instead of point-wise copy
        --min-dma-transfer=<int>                             - Minimum DMA transfer size supported by the target in bytes
        --skip-non-unit-stride-loops                         - Testing purposes: avoid non-unit stride loop choice depths for copy placement
        --slow-mem-space=<uint>                              - Slow memory space identifier for copy generation (default: 0)
        --tag-mem-space=<uint>                               - Tag memory space identifier for copy generation (default: 0)
      --affine-expand-index-ops                              -   Lower affine operations operating on indices into more fundamental operations
      --affine-expand-index-ops-as-affine                    -   Lower affine operations operating on indices into affine.apply operations
      --affine-loop-coalescing                               -   Coalesce nested loops with independent bounds into a single loop
      --affine-loop-fusion                                   -   Fuse affine loop nests
        --compute-tolerance=<number>                         - Fractional increase in additional computation tolerated while fusing
        --fast-mem-space=<uint>                              - Faster memory space number to promote fusion buffers to
        --local-buf-threshold=<ulong>                        - Threshold size (KiB) for promoting local buffers to fast memory space
        --maximal                                            - Enables maximal loop fusion
        --mode=<value>                                       - fusion mode to attempt
          =greedy                                            -   Perform greedy (both producer-consumer and sibling) fusion
          =producer                                          -   Perform only producer-consumer fusion
          =sibling                                           -   Perform only sibling fusion
      --affine-loop-invariant-code-motion                    -   Hoist loop invariant instructions outside of affine loops
      --affine-loop-normalize                                -   Apply normalization transformations to affine loop-like ops
        --promote-single-iter                                - Promote single iteration loops
      --affine-loop-tile                                     -   Tile affine loop nests
        --cache-size=<ulong>                                 - Set size of cache to tile for in KiB (default: 512)
        --separate                                           - Separate full and partial tiles (default: false)
        --tile-size=<uint>                                   - Use this tile size for all loops
        --tile-sizes=<uint>                                  - List of tile sizes for each perfect nest (overridden by -tile-size)
      --affine-loop-unroll                                   -   Unroll affine loops
        --cleanup-unroll                                     - Fully unroll the cleanup loop when possible.
        --unroll-factor=<uint>                               - Use this unroll factor for all loops being unrolled
        --unroll-full                                        - Fully unroll loops
        --unroll-full-threshold=<uint>                       - Unroll all loops with trip count less than or equal to this
        --unroll-num-reps=<uint>                             - Unroll innermost loops repeatedly this many times
        --unroll-up-to-factor                                - Allow unrolling up to the factor specified
      --affine-loop-unroll-jam                               -   Unroll and jam affine loops
        --unroll-jam-factor=<uint>                           - Use this unroll jam factor for all loops (default 4)
      --affine-parallelize                                   -   Convert affine.for ops into 1-D affine.parallel
        --max-nested=<uint>                                  - Maximum number of nested parallel loops to produce. Defaults to unlimited (UINT_MAX).
        --parallel-reductions                                - Whether to parallelize reduction loops. Defaults to false.
      --affine-pipeline-data-transfer                        -   Pipeline non-blocking data transfers between explicitly managed levels of the memory hierarchy
      --affine-raise-from-memref                             -   Turn some memref operators to affine operators where supported
      --affine-scalrep                                       -   Replace affine memref accesses by scalars by forwarding stores to loads and eliminating redundant loads
      --affine-simplify-structures                           -   Simplify affine expressions in maps/sets and normalize memrefs
      --affine-super-vectorize                               -   Vectorize to a target independent n-D vector abstraction
        --test-fastest-varying=<long>                        - Specify a 1-D, 2-D or 3-D pattern of fastest varying memory dimensions to match. See defaultPatterns in Vectorize.cpp for a description and examples. This is used for testing purposes
        --vectorize-reductions                               - Vectorize known reductions expressed via iter_args. Switched off by default.
        --virtual-vector-size=<long>                         - Specify an n-D virtual vector size for vectorization. This must be greater than zero.
      --affine-super-vectorizer-test                         -   Tests vectorizer standalone functionality.
        --backward-slicing                                   - Enable testing backward static slicing and topological sort functionalities
        --compose-maps                                       - Enable testing the composition of AffineMap where each AffineMap in the composition is specified as the affine_map attribute in a constant op.
        --forward-slicing                                    - Enable testing forward static slicing and topological sort functionalities
        --slicing                                            - Enable testing static slicing and topological sort functionalities
        --vector-shape-ratio=<int>                           - Specify the HW vector size for vectorization
        --vectorize-affine-loop-nest                         - Enable testing for the 'vectorizeAffineLoopNest' utility by vectorizing the outermost loops found
      --amdgpu-emulate-atomics                               -   Emulate atomic operations on chipsets that do not support them
        --chipset=<string>                                   - Chipset that these operations will run on
      --amdgpu-resolve-strided-metadata                      -   Resolve memref.extract_strided_metadata on AMDGPU ops
      --amdgpu-transfer-read-to-load                         -   Lower the operations from the vector transfer_read to vector load
      --arith-emulate-unsupported-floats                     -   Emulate operations on unsupported floats with extf/truncf
        --source-types=<string>                              - MLIR types without arithmetic support on a given target
        --target-type=<string>                               - MLIR type to convert the unsupported source types to
      --arith-emulate-wide-int                               -   Emulate 2*N-bit integer operations using N-bit operations
        --widest-int-supported=<uint>                        - Widest integer type supported by the target
      --arith-expand                                         -   Legalize Arith ops to be convertible to LLVM.
        --include-bf16                                       - Enable the BF16 expansion patterns
        --include-f8e8m0                                     - Enable the F8E8M0 expansion patterns
      --arith-int-range-narrowing                            -   Reduce integer operations bitwidth based on integer range analysis
        --int-bitwidths-supported=<uint>                     - Integer bitwidths supported
      --arith-unsigned-when-equivalent                       -   Replace signed ops with unsigned ones where they are proven equivalent
      --arm-neon-2d-to-intr                                  -   Convert Arm NEON structured ops to intrinsics
      --arm-sme-outer-product-fusion                         -   Fuse 'arm_sme.outerproduct' operations into 2-way or 4-way widening variants
      --arm-sme-vector-legalization                          -   Legalize vectors for ArmSME
      --arm-sve-legalize-vector-storage                      -   Ensures stores of SVE vector types will be legal
      --async-func-to-async-runtime                          -   Lower async.func operations to the explicit async.runtime and async.coro operations
      --async-parallel-for                                   -   Convert scf.parallel operations to multiple async compute ops executed concurrently for non-overlapping iteration ranges
        --async-dispatch                                     - Dispatch async compute tasks using recursive work splitting. If `false` async compute tasks will be launched using simple for loop in the caller thread.
        --min-task-size=<int>                                - The minimum task size for sharding parallel operation.
        --num-workers=<int>                                  - The number of available workers to execute async operations. If `-1` the value will be retrieved from the runtime.
      --async-runtime-policy-based-ref-counting              -   Policy based reference counting for Async runtime operations
      --async-runtime-ref-counting                           -   Automatic reference counting for Async runtime operations
      --async-runtime-ref-counting-opt                       -   Optimize automatic reference counting operations for the Async runtime by removing redundant operations
      --async-to-async-runtime                               -   Lower all high level async operations (e.g. async.execute) to the explicit async.runtime and async.coro operations
      --buffer-deallocation-simplification                   -   Optimizes `bufferization.dealloc` operation for more efficient codegen
      --buffer-hoisting                                      -   Optimizes placement of allocation operations by moving them into common dominators and out of nested regions
      --buffer-loop-hoisting                                 -   Optimizes placement of allocation operations by moving them out of loop nests
      --buffer-results-to-out-params                         -   Converts memref-typed function results to out-params
        --add-result-attr                                    - Add the attribute 'bufferize.result' to all output parameters.
        --hoist-static-allocs                                - Hoist static allocations to call sites.
      --bufferization-lower-deallocations                    -   Lowers `bufferization.dealloc` operations to `memref.dealloc` operations
      --canonicalize                                         -   Canonicalize operations
        --disable-patterns=<string>                          - Labels of patterns that should be filtered out during application
        --enable-patterns=<string>                           - Labels of patterns that should be used during application, all other patterns are filtered out
        --max-iterations=<long>                              - Max. iterations between applying patterns / simplifying regions
        --max-num-rewrites=<long>                            - Max. number of pattern rewrites within an iteration
        --region-simplify=<value>                            - Perform control flow optimizations to the region tree
    =disabled                                          -   Don't run any control-flow simplification.
    =normal                                            -   Perform simple control-flow simplifications (e.g. dead args elimination).
    =aggressive                                        -   Perform aggressive control-flow simplification (e.g. block merging).
        --test-convergence                                   - Test only: Fail pass on non-convergence to detect cyclic pattern
        --top-down                                           - Seed the worklist in general top-down order
      --composite-fixed-point-pass                           -   Composite fixed point pass
        --max-iterations=<int>                               - Maximum number of iterations of the inner pipeline
        --name=<string>                                      - Composite pass display name
        --pipeline=<string>                                  - Composite pass inner pipeline
      --control-flow-sink                                    -   Sink operations into conditional blocks
      --convert-affine-for-to-gpu                            -   Convert top-level AffineFor Ops to GPU kernels
        --gpu-block-dims=<uint>                              - Number of GPU block dimensions for mapping
        --gpu-thread-dims=<uint>                             - Number of GPU thread dimensions for mapping
      --convert-amdgpu-to-rocdl                              -   Convert AMDGPU dialect to ROCDL dialect
        --chipset=<string>                                   - Chipset that these operations will run on
      --convert-arith-to-amdgpu                              -   Convert Arith operations to AMDGPU-specific implementations
        --allow-packed-f16-round-to-zero                     - Whether we should allow f32->f16 packed round-to-zero conversion
        --chipset=<string>                                   - Chipset that these operations will run on
        --saturate-fp8-truncf                                - Use saturating truncation for 8-bit float types
      --convert-arith-to-arm-sme                             -   Convert Arith dialect to ArmSME dialect
      --convert-arith-to-emitc                               -   Convert Arith dialect to EmitC dialect
      --convert-arith-to-llvm                                -   Convert Arith dialect to LLVM dialect
        --index-bitwidth=<uint>                              - Bitwidth of the index type, 0 to use size of machine word
      --convert-arith-to-spirv                               -   Convert Arith dialect to SPIR-V dialect
        --emulate-lt-32-bit-scalar-types                     - Emulate narrower scalar types with 32-bit ones if not supported by the target
      --convert-arm-sme-to-llvm                              -   Lower the operations from the ArmSME dialect into the LLVM dialect
        --dump-tile-live-ranges                              - Dump the live ranges of SME tiles (for debugging)
      --convert-arm-sme-to-scf                               -   Lower the operations from the ArmSME dialect into the SCF dialect
      --convert-async-to-llvm                                -   Convert the operations from the async dialect into the LLVM dialect
      --convert-bufferization-to-memref                      -   Convert operations from the Bufferization dialect to the MemRef dialect
      --convert-cf-to-llvm                                   -   Convert ControlFlow operations to the LLVM dialect
        --index-bitwidth=<uint>                              - Bitwidth of the index type, 0 to use size of machine word
      --convert-cf-to-spirv                                  -   Convert ControlFlow dialect to SPIR-V dialect
        --emulate-lt-32-bit-scalar-types                     - Emulate narrower scalar types with 32-bit ones if not supported by the target
      --convert-complex-to-libm                              -   Convert Complex dialect to libm calls
      --convert-complex-to-llvm                              -   Convert Complex dialect to LLVM dialect
        --complex-range=<value>                              - Control the intermediate calculation of complex number division
    =improved                                          -   improved
    =basic                                             -   basic (default)
    =none                                              -   none
      --convert-complex-to-spirv                             -   Convert Complex dialect to SPIRV dialect
      --convert-complex-to-standard                          -   Convert Complex dialect to standard dialect
        --complex-range=<value>                              - Control the intermediate calculation of complex number division
    =improved                                          -   improved (default)
    =basic                                             -   basic
    =none                                              -   none
      --convert-elementwise-to-linalg                        -   Convert ElementwiseMappable ops to linalg
      --convert-func-to-emitc                                -   Convert Func dialect to EmitC dialect
      --convert-func-to-llvm                                 -   Convert from the Func dialect to the LLVM dialect
        --index-bitwidth=<uint>                              - Bitwidth of the index type, 0 to use size of machine word
        --use-bare-ptr-memref-call-conv                      - Replace FuncOp's MemRef arguments with bare pointers to the MemRef element types
      --convert-func-to-spirv                                -   Convert Func dialect to SPIR-V dialect
        --emulate-lt-32-bit-scalar-types                     - Emulate narrower scalar types with 32-bit ones if not supported by the target
      --convert-gpu-to-llvm-spv                              -   Generate LLVM operations to be ingested by a SPIR-V backend for gpu operations
        --use-64bit-index                                    - Use 64-bit integers to convert index types
      --convert-gpu-to-nvvm                                  -   Generate NVVM operations for gpu operations
        --allowed-dialects=<string>                          - Run conversion patterns of only the specified dialects
        --has-redux                                          - Target gpu supports redux
        --index-bitwidth=<uint>                              - Bitwidth of the index type, 0 to use size of machine word
        --use-bare-ptr-memref-call-conv                      - Replace memref arguments in GPU functions with bare pointers. All memrefs must have static shape.
      --convert-gpu-to-rocdl                                 -   Generate ROCDL operations for gpu operations
        --allowed-dialects=<string>                          - Run conversion patterns of only the specified dialects
        --chipset=<string>                                   - Chipset that these operations will run on
        --index-bitwidth=<uint>                              - Bitwidth of the index type, 0 to use size of machine word
        --runtime=<value>                                    - Runtime code will be run on (default is Unknown, can also use HIP or OpenCL)
    =unknown                                           -   Unknown (default)
    =HIP                                               -   HIP
    =OpenCL                                            -   OpenCL
        --use-bare-ptr-memref-call-conv                      - Replace memref arguments in GPU functions with bare pointers. All memrefs must have static shape.
      --convert-gpu-to-spirv                                 -   Convert GPU dialect to SPIR-V dialect
        --use-64bit-index                                    - Use 64-bit integers to convert index types
      --convert-index-to-llvm                                -   Lower the `index` dialect to the `llvm` dialect.
        --index-bitwidth=<uint>                              - Bitwidth of the index type, 0 to use size of machine word
      --convert-index-to-spirv                               -   Lower the `index` dialect to the `spirv` dialect.
        --use-64bit-index                                    - Use 64-bit integers to convert index types
      --convert-linalg-to-affine-loops                       -   Lower the operations from the linalg dialect into affine loops
      --convert-linalg-to-loops                              -   Lower the operations from the linalg dialect into loops
      --convert-linalg-to-parallel-loops                     -   Lower the operations from the linalg dialect into parallel loops
      --convert-linalg-to-std                                -   Convert the operations from the linalg dialect into the Standard dialect
      --convert-math-to-emitc                                -   Convert some Math operations to EmitC call_opaque operations
        --language-target=<value>                            - Select the language standard target for callees (c99 or cpp11).
    =c99                                               -   c99
    =cpp11                                             -   cpp11
      --convert-math-to-funcs                                -   Convert Math operations to calls of outlined implementations.
        --convert-ctlz                                       - Convert math.ctlz to a software implementation. Enable for targets that do not natively support ctlz.
        --min-width-of-fpowi-exponent=<uint>                 - Convert FPowI only if the width of its exponent's integer type is greater than or equal to this value
      --convert-math-to-libm                                 -   Convert Math dialect to libm calls
      --convert-math-to-llvm                                 -   Convert Math dialect to LLVM dialect
        --approximate-log1p                                  - Enable approximation of Log1p.
      --convert-math-to-rocdl                                -   Convert Math dialect to ROCDL library calls
      --convert-math-to-spirv                                -   Convert Math dialect to SPIR-V dialect
      --convert-memref-to-emitc                              -   Convert MemRef dialect to EmitC dialect
      --convert-memref-to-spirv                              -   Convert MemRef dialect to SPIR-V dialect
        --bool-num-bits=<int>                                - The number of bits to store a boolean value
        --use-64bit-index                                    - Use 64-bit integers to convert index types
      --convert-mesh-to-mpi                                  -   Convert Mesh dialect to MPI dialect.
      --convert-nvgpu-to-nvvm                                -   Convert NVGPU dialect to NVVM dialect
      --convert-nvvm-to-llvm                                 -   Convert NVVM to PTX with Inline Assembly in LLVM dialect
      --convert-openacc-to-scf                               -   Convert the OpenACC ops to OpenACC with SCF dialect
      --convert-openmp-to-llvm                               -   Convert the OpenMP ops to OpenMP ops with LLVM dialect
      --convert-parallel-loops-to-gpu                        -   Convert mapped scf.parallel ops to gpu launch operations
      --convert-pdl-to-pdl-interp                            -   Convert PDL ops to PDL interpreter ops
      --convert-scf-to-cf                                    -   Convert SCF dialect to ControlFlow dialect, replacing structured control flow with a CFG
      --convert-scf-to-emitc                                 -   Convert SCF dialect to EmitC dialect, maintaining structured control flow
      --convert-scf-to-openmp                                -   Convert SCF parallel loop to OpenMP parallel + workshare constructs.
        --num-threads=<uint>                                 - Number of threads to use
      --convert-scf-to-spirv                                 -   Convert SCF dialect to SPIR-V dialect.
      --convert-shape-constraints                            -   Convert shape constraint operations to the standard dialect
      --convert-shape-to-std                                 -   Convert operations from the shape dialect into the standard dialect
      --convert-spirv-to-llvm                                -   Convert SPIR-V dialect to LLVM dialect
        --client-api=<value>                                 - Derive StorageClass to address space mapping from the client API
    =Unknown                                           -   Unknown (default)
    =Metal                                             -   Metal
    =OpenCL                                            -   OpenCL
    =Vulkan                                            -   Vulkan
    =WebGPU                                            -   WebGPU
      --convert-tensor-to-linalg                             -   Convert some Tensor dialect ops to Linalg dialect
      --convert-tensor-to-spirv                              -   Convert Tensor dialect to SPIR-V dialect
        --emulate-lt-32-bit-scalar-types                     - Emulate narrower scalar types with 32-bit ones if not supported by the target
      --convert-to-emitc                                     -   Convert to EmitC dialect via dialect interfaces
        --filter-dialects=<string>                           - Test conversion patterns of only the specified dialects
      --convert-to-llvm                                      -   Convert to LLVM via dialect interfaces found in the input IR
        --dynamic                                            - Use op conversion attributes to configure the conversion
        --filter-dialects=<string>                           - Test conversion patterns of only the specified dialects
      --convert-ub-to-llvm                                   -   Convert UB dialect to LLVM dialect
        --index-bitwidth=<uint>                              - Bitwidth of the index type, 0 to use size of machine word
      --convert-ub-to-spirv                                  -   Convert UB dialect to SPIR-V dialect
      --convert-vector-to-arm-sme                            -   Lower the operations from the vector dialect into the ArmSME dialect
      --convert-vector-to-gpu                                -   Lower the operations from the vector dialect into the GPU dialect
        --use-nvgpu                                          - convert to NvGPU ops instead of GPU dialect ops
      --convert-vector-to-llvm                               -   Lower the operations from the vector dialect into the LLVM dialect
        --enable-amx                                         - Enables the use of AMX dialect while lowering the vector dialect.
        --enable-arm-neon                                    - Enables the use of ArmNeon dialect while lowering the vector dialect.
        --enable-arm-sve                                     - Enables the use of ArmSVE dialect while lowering the vector dialect.
        --enable-x86vector                                   - Enables the use of X86Vector dialect while lowering the vector dialect.
        --force-32bit-vector-indices                         - Allows compiler to assume vector indices fit in 32-bit if that yields faster code
        --reassociate-fp-reductions                          - Allows llvm to reassociate floating-point reductions for speed
        --use-vector-alignment                               - Use the preferred alignment of a vector type in load/store operations instead of the alignment of the element type of the memref. This flag is intended for use with hardware which requires vector alignment, or in application contexts where it is known all vector accesses are naturally aligned.
        --vector-contract-lowering=<value>                   - control the lowering of `vector.contract` operations.
    =dot                                               -   Progressively lower to finer grained `vector.contract` and dot-products. (default)
    =matmul                                            -   Lower to `vector.matrix_multiply`, maps 1-1 to LLVM matrix intrinsics.
    =outerproduct                                      -   Lower to `vector.outerproduct`.
    =parallelarith                                     -   Lower contract with all reduction dimensions unrolled to 1 to vector elementwise operations.
        --vector-transpose-lowering=<value>                  - control the lowering of `vector.transpose` operations.
    =eltwise                                           -   Lower transpose into element-wise extract and inserts (default)
    =flat                                              -   Lower 2-D transpose to `vector.flat_transpose`, maps 1-1 to LLVM matrix intrinsics
    =shuffle1d                                         -   Lower 2-D transpose to `vector.shuffle` on 1-D vector.
    =shuffle16x16                                      -   Lower 2-D transpose to `vector.shuffle` on 16x16 vector.
      --convert-vector-to-scf                                -   Lower the operations from the vector dialect into the SCF dialect
        --full-unroll                                        - Perform full unrolling when converting vector transfers to SCF
        --lower-scalable                                     - Add scalable vector specific lowerings (that introduce loops)
        --lower-tensors                                      - Lower transfer ops that operate on tensors
        --target-rank=<uint>                                 - Target vector rank to which transfer ops should be lowered
      --convert-vector-to-spirv                              -   Convert Vector dialect to SPIR-V dialect
      --convert-vector-to-xegpu                              -   Lower the operations from the vector dialect into the XeGPU dialect
      --cse                                                  -   Eliminate common sub-expressions
      --decorate-spirv-composite-type-layout                 -   Decorate SPIR-V composite type with layout info
      --drop-equivalent-buffer-results                       -   Remove MemRef return values that are equivalent to a bbArg
      --duplicate-function-elimination                       -   Deduplicate functions
      --eliminate-empty-tensors                              -   Try to eliminate all tensor.empty ops.
      --empty-tensor-to-alloc-tensor                         -   Replace all empty ops by alloc_tensor ops.
      --enable-arm-streaming                                 -   Enable Armv9 Streaming SVE mode
        --if-required-by-ops                                 - Only apply the selected streaming/ZA modes if the function contains ops that implement the ArmSMETileOpInterface.
        --if-scalable-and-supported                          - Only apply the selected streaming/ZA modes if the function contains supported scalable vector operations.
        --streaming-mode=<value>                             - Select how streaming-mode is managed at the function-level.
    =disabled                                          -   Streaming mode is disabled.
    =streaming                                         -   Streaming mode is part of the function interface (ABI), caller manages PSTATE.SM on entry/exit.
    =streaming-locally                                 -   Streaming mode is internal to the function, callee manages PSTATE.SM on entry/exit.
    =streaming-compatible                              -   Function supports both streaming and non-streaming modes.
        --za-mode=<value>                                    - Select how ZA-storage is managed at the function-level.
    =disabled                                          -   ZA storage is disabled.
    =new-za                                            -   The function has ZA state. The ZA state is created on entry and destroyed on exit.
    =in-za                                             -   The function uses ZA state. The ZA state may be used for input.
    =out-za                                            -   The function uses ZA state. The ZA state may be used for output.
    =inout-za                                          -   The function uses ZA state. The ZA state may be used for input and/or output.
    =preserves-za                                      -   The function shares ZA state. The ZA state may not be used for input and/or output and the function must return with ZA unchanged
      --ensure-debug-info-scope-on-llvm-func                 -   Materialize LLVM debug info subprogram attribute on every LLVMFuncOp
        --emission-kind=<value>                              - Emission kind to generate debug info.
    =None                                              -   None
    =Full                                              -   Full
    =LineTablesOnly                                    -   LineTablesOnly (default)
    =DebugDirectivesOnly                               -   DebugDirectivesOnly
      --expand-realloc                                       -   Expand memref.realloc operations into their components
        --emit-deallocs                                      - Emit deallocation operations for the original MemRef
      --expand-strided-metadata                              -   Expand memref operations into easier to analyze constructs
      --finalize-memref-to-llvm                              -   Finalize MemRef dialect to LLVM dialect conversion
        --index-bitwidth=<uint>                              - Bitwidth of the index type, 0 to use size of machine word
        --use-aligned-alloc                                  - Use aligned_alloc in place of malloc for heap allocations
        --use-generic-functions                              - Use generic allocation and deallocation functions instead of the classic 'malloc', 'aligned_alloc' and 'free' functions
      --flatten-memref                                       -   Flatten a multi-dimensional memref to a 1-dimensional memref
      --fold-memref-alias-ops                                -   Fold memref alias ops into consumer load/store ops
      --fold-tensor-subset-ops                               -   Fold tensor subset ops into producer/consumer ops
      --form-expressions                                     -   Form C-style expressions from C-operator ops
      --generate-runtime-verification                        -   Generate additional runtime op verification checks
      --gpu-async-region                                     -   Make GPU ops async
      --gpu-decompose-memrefs                                -   Decomposes memref index computation into explicit ops.
      --gpu-eliminate-barriers                               -   Erase unnecessary barriers
      --gpu-kernel-outlining                                 -   Outline gpu.launch bodies to kernel functions
        --data-layout-str=<string>                           - String description of the data layout
      --gpu-launch-sink-index-computations                   -   Sink index computations into gpu.launch body
      --gpu-map-parallel-loops                               -   Greedily maps loops to GPU hardware dimensions.
      --gpu-module-to-binary                                 -   Transforms a GPU module into a GPU binary.
        --format=<string>                                    - The target representation of the compilation process.
        -l <string>                                          - Extra files to link to.
        --opts=<string>                                      - Command line options to pass to the tools.
        --section=<string>                                   - ELF section where binary is to be located.
        --toolkit=<string>                                   - Toolkit path.
      --gpu-to-llvm                                          -   Convert GPU dialect to LLVM dialect with GPU runtime calls
        --intersperse-sizes-for-kernels                      - Inserts a size_t argument following each memref argument, containing the static size in bytes of the buffer. Incompatible arguments are rejected. This is intended for use by the Vulkan runtime with the kernel bare pointer calling convention, to enable dynamic binding of buffers as arguments without static type info.
        --use-bare-pointers-for-host                         - Use bare pointers to pass memref arguments to host functions. All memrefs must have static shape.
        --use-bare-pointers-for-kernels                      - Use bare pointers to pass memref arguments to kernels. The kernel must use the same setting for this option.
      --inline                                               -   Inline function calls
        --default-pipeline=<string>                          - The optimizer pipeline used for callables that do not have a dedicated optimizer pipeline in opPipelineList
        --inlining-threshold=<uint>                          - If the ratio between the number of the operations in the callee and the number of the operations in the caller exceeds this value (in percentage), then the callee is not inlined even if it is legal to inline it
        --max-iterations=<uint>                              - Maximum number of iterations when inlining within an SCC
        --op-pipelines=<pass-manager>                        - Callable operation specific optimizer pipelines (in the form of `dialect.op(pipeline)`)
      --int-range-optimizations                              -   Do optimizations based on integer range analysis
      --lift-cf-to-scf                                       -   Lift ControlFlow dialect to SCF dialect
      --linalg-block-pack-matmul                             -   Convert linalg matmul ops to block layout and back
        --allow-padding                                      - Allow packing padding
        --block-factors=<long>                               - Block factors (mb, nb, kb) for relayout
        --lhs-transpose-inner-blocks                         - Transpose LHS inner block layout [mb][kb] -> [kb][mb]
        --lhs-transpose-outer-blocks                         - Transpose LHS outer block layout [MB][KB] -> [KB][MB]
        --mnk-order=<long>                                   - Permutation of matmul (M, N, K) dimensions order
        --mnk-padded-multiples=<long>                        - Next multiples of the packing sizes
        --rhs-transpose-inner-blocks                         - Transpose RHS inner block layout [kb][nb] -> [nb][kb]
        --rhs-transpose-outer-blocks                         - Transpose RHS outer block layout [KB][NB] -> [NB][KB]
      --linalg-detensorize                                   -   Detensorize linalg ops
        --aggressive-mode                                    - Detensorize all ops that qualify for detensoring along with branch operands and basic-block arguments.
      --linalg-fold-into-elementwise                         -   Fold transform, broadcast and other ops into elementwise
      --linalg-fold-unit-extent-dims                         -   Remove unit-extent dimension in Linalg ops on tensors
        --use-rank-reducing-slices                           - Generate rank-reducing slices instead of reassociative reshapes
      --linalg-fuse-elementwise-ops                          -   Fuse elementwise operations on tensors
      --linalg-generalize-named-ops                          -   Convert named ops into generic ops
      --linalg-inline-scalar-operands                        -   Inline scalar operands into linalg generic ops
      --linalg-named-op-conversion                           -   Convert from one named linalg op to another.
      --linalg-specialize-generic-ops                        -   Convert generic ops back to named ops
      --llvm-add-comdats                                     -   Add comdats to linkonce and linkonce_odr functions
      --llvm-legalize-for-export                             -   Legalize LLVM dialect to be convertible to LLVM IR
      --llvm-optimize-for-nvvm-target                        -   Optimize NVVM IR
      --llvm-request-c-wrappers                              -   Request C wrapper emission for all functions
      --loop-invariant-code-motion                           -   Hoist loop invariant instructions outside of the loop
      --loop-invariant-subset-hoisting                       -   Hoist loop invariant subset ops outside of the loop
      --lower-affine                                         -   Lower Affine operations to a combination of Arith and SCF operations
      --lower-host-to-llvm                                   -   Lowers the host module code and `gpu.launch_func` to LLVM
      --lower-quant-ops                                      -   Lower quant.dcast and quant.qcast ops
      --lower-sparse-foreach-to-scf                          -   Decompose a complex sparse operation into multiple stages
      --lower-sparse-iteration-to-scf                        -   Lower sparse_tensor.iterate/coiterate into scf loops
      --lower-sparse-ops-to-foreach                          -   Applies sparse tensor rewriting rules after sparsification
        --enable-convert                                     - Enable rewriting rules for the convert operator
        --enable-runtime-library                             - Enable runtime library for manipulating sparse tensors
      --lower-vector-mask                                    -   Lower 'vector.mask' operations
      --lower-vector-multi-reduction                         -   Lower 'vector.multi_reduction' operations
        --lowering-strategy=<value>                          - Select the strategy to control how multi_reduction is lowered.
    =inner-parallel                                    -   Lower multi_reduction into outer-reduction and inner-parallel ops.
    =inner-reduction                                   -   Lower multi_reduction into outer-parallel and inner-reduction ops.
      --map-memref-spirv-storage-class                       -   Map numeric MemRef memory spaces to SPIR-V storage classes
        --client-api=<string>                                - The client API to use for populating mappings
      --math-extend-to-supported-types                       -   Legalize floating-point math ops on low-precision floats
        --extra-types=<string>                               - MLIR types with arithmetic support on a given target (f64 and f32 are implicitly supported)
        --target-type=<string>                               - MLIR type to convert the unsupported source types to
      --math-uplift-to-fma                                   -   Uplift arith ops to math.fma.
      --mem2reg                                              -   Promotes memory slots into values.
        --region-simplify                                    - Perform control flow optimizations to the region tree
      --memref-emulate-wide-int                              -   Emulate 2*N-bit integer operations using N-bit operations
        --widest-int-supported=<uint>                        - Widest integer type supported by the target
      --memref-expand                                        -   Legalize memref operations to be convertible to LLVM.
      --mesh-spmdization                                     -   Partition a function into SPMD form.
      --mlprogram-pipeline-globals                           -   Optimize `ml_program` global operations for read and store
      --normalize-memrefs                                    -   Normalize memrefs
      --normalize-quant-types                                -   Normalize generic quantized types to specific quantized types
      --nvgpu-optimize-shared-memory                         -   Optimizes accesses to shared memory memrefs in order to reduce bank conflicts.
      --nvvm-attach-target                                   -   Attaches an NVVM target attribute to a GPU Module.
        -O <uint>                                            - Optimization level.
        --chip=<string>                                      - Target chip.
        --fast                                               - Enable fast math mode.
        --features=<string>                                  - Target features.
        --ftz                                                - Enable flush to zero for denormals.
        -l <string>                                          - Extra bitcode libraries paths to link to.
        --module=<string>                                    - Regex used to identify the modules to attach the target to.
        --ptxas-cmd-options=<string>                         - Command line options passed to downstream compiler
        --triple=<string>                                    - Target triple.
      --one-shot-bufferize                                   -   One-Shot Bufferize
        --allow-return-allocs-from-loops                     - Allows returning/yielding new allocations from a loop.
        --allow-unknown-ops                                  - Allows unknown (not bufferizable) ops in the input IR.
        --analysis-fuzzer-seed=<uint>                        - Test only: Analyze ops in random order with a given seed (fuzzer)
        --analysis-heuristic=<string>                        - Heuristic that controls the IR traversal during analysis
        --buffer-alignment=<ulong>                           - Sets the alignment of newly allocated buffers.
        --bufferize-function-boundaries                      - Bufferize function boundaries (experimental).
        --check-parallel-regions                             - Account for parallel regions in RaW analysis.
        --copy-before-write                                  - Skip the analysis. Make a buffer copy on every write.
        --dialect-filter=<string>                            - Restrict bufferization to ops from these dialects.
        --dump-alias-sets                                    - Test only: Annotate tensor IR with alias sets
        --function-boundary-type-conversion=<value>          - Controls layout maps when bufferizing function signatures.
    =infer-layout-map
    =identity-layout-map
    =fully-dynamic-layout-map
        --must-infer-memory-space                            - The memory space of a memref type must always be inferred. If unset, a default memory space of 0 is used.
        --no-analysis-func-filter=<string>                   - Skip analysis of functions with these symbol names. Set copyBeforeWrite to true when bufferizing them.
        --print-conflicts                                    - Test only: Annotate IR with RaW conflicts. Requires test-analysis-only.
        --test-analysis-only                                 - Test only: Only run inplaceability analysis and annotate IR
        --unknown-type-conversion=<value>                    - Controls layout maps for non-inferrable memref types.
    =infer-layout-map
    =identity-layout-map
    =fully-dynamic-layout-map
        --use-encoding-for-memory-space                      - Use the Tensor encoding attribute for the memory space. Exclusive to the 'must-infer-memory-space' option
      --openacc-legalize-data-values                         -   Legalizes SSA values in compute regions with results from data clause operations
        --apply-to-acc-data-construct                        - Replaces varPtr uses with accPtr for acc compute regions contained within acc.data or acc.declare region.
        --host-to-device                                     - Replace varPtr uses with accPtr if true. Replace accPtr uses with varPtr if false
      --optimize-allocation-liveness                         -   This pass optimizes the liveness of temp allocations in the input function
      --outline-shape-computation                            -   Using shape.func to preserve shape computation
      --ownership-based-buffer-deallocation                  -   Adds all required dealloc operations for all allocations in the input program
        --private-function-dynamic-ownership                 - Allows adding additional arguments to private functions to dynamically pass ownership of memrefs to callees. This can enable earlier deallocations.
      --pre-sparsification-rewrite                           -   Applies sparse tensor rewriting rules prior to sparsification
      --print-ir                                             -   Print IR on the debug stream
        --label=<string>                                     - Label
      --print-op-stats                                       -   Print statistics of operations
        --json                                               - print the stats as JSON
      --promote-buffers-to-stack                             -   Promotes heap-based allocations to automatically managed stack-based allocations
        --max-alloc-size-in-bytes=<uint>                     - Maximal size in bytes to promote allocations to stack.
        --max-rank-of-allocated-memref=<uint>                - Maximal memref rank to promote dynamic buffers.
      --reconcile-unrealized-casts                           -   Simplify and eliminate unrealized conversion casts
      --remove-dead-values                                   -   Remove dead values
      --remove-shape-constraints                             -   Replace all cstr_ ops with a true witness
      --resolve-ranked-shaped-type-result-dims               -   Resolve memref.dim of result values of ranked shape type
      --resolve-shaped-type-result-dims                      -   Resolve memref.dim of result values
      --rocdl-attach-target                                  -   Attaches a ROCDL target attribute to a GPU Module.
        -O <uint>                                            - Optimization level.
        --abi=<string>                                       - ABI version.
        --chip=<string>                                      - Target chip.
        --correct-sqrt                                       - Enable correct rounded sqrt.
        --daz                                                - Enable denormals are zero opt.
        --fast                                               - Enable fast relaxed math opt.
        --features=<string>                                  - Target features.
        --finite-only                                        - Enable finite only opt.
        -l <string>                                          - Extra bitcode libraries paths to link to.
        --module=<string>                                    - Regex used to identify the modules to attach the target to.
        --triple=<string>                                    - Target triple.
        --unsafe-math                                        - Enable unsafe math opt.
        --wave64                                             - Use Wave64 mode.
      --sccp                                                 -   Sparse Conditional Constant Propagation
      --scf-for-loop-canonicalization                        -   Canonicalize operations within scf.for loop bodies
      --scf-for-loop-peeling                                 -   Peel `for` loops at their upper bounds.
        --peel-front                                         - Peel the first iteration out of the loop.
        --skip-partial                                       - Do not peel loops inside of the last, partial iteration of another already peeled loop.
      --scf-for-loop-range-folding                           -   Fold add/mul ops into loop range
      --scf-for-loop-specialization                          -   Specialize `for` loops for vectorization
      --scf-for-to-while                                     -   Convert SCF for loops to SCF while loops
      --scf-forall-to-for                                    -   Convert SCF forall loops to SCF for loops
      --scf-forall-to-parallel                               -   Convert SCF forall loops to SCF parallel loops
      --scf-parallel-loop-fusion                             -   Fuse adjacent parallel loops
      --scf-parallel-loop-specialization                     -   Specialize parallel loops for vectorization
      --scf-parallel-loop-tiling                             -   Tile parallel loops
        --no-min-max-bounds                                  - Perform tiling with fixed upper bound with inbound check inside the internal loops
        --parallel-loop-tile-sizes=<long>                    - Factors to tile parallel loops by
      --set-llvm-module-datalayout                           -   Attach a datalayout string as a module attribute
        --data-layout=<string>                               - String description (LLVM format) of the data layout that is expected on the produced module
      --shape-to-shape-lowering                              -   Legalize Shape dialect to be convertible to Arith
      --sharding-propagation                                 -   sharding propagation
      --slice-analysis-test                                  -   Test Slice analysis functionality.
        --omit-block-arguments                               - Test Slice analysis with multiple blocks but slice omitting block arguments
      --snapshot-op-locations                                -   Generate new locations from the current IR
        --filename=<string>                                  - The filename to print the generated IR
        --pretty-debuginfo                                   - Print pretty debug info in MLIR output
        --print-debuginfo                                    - Print debug info in MLIR output
        --print-local-scope                                  - Print with local scope and inline information (eliding aliases for attributes, types, and locations)
        --print-op-generic                                   - Print the generic op form
        --tag=<string>                                       - A tag to use when fusing the new locations with the original. If unset, the locations are replaced.
      --sparse-assembler                                     -   Add [dis]assemble operations on external sparse tensors
        --direct-out                                         - Directly returns buffers externally
      --sparse-buffer-rewrite                                -   Rewrite sparse primitives on buffers to actual code
        --enable-buffer-initialization                       - Enable zero-initialization of the memory buffers
      --sparse-gpu-codegen                                   -   Generates GPU code during sparsification
        --enable-runtime-library                             - Enable runtime library for manipulating sparse tensors
        --num-threads=<int>                                  - Sets the number of GPU threads
      --sparse-reinterpret-map                               -   Reinterprets sparse tensor type mappings
        --scope=<value>                                      - Set the reinterpretation scope
    =all                                               -   Run on every applicable operation.
    =only-generic                                      -   Run only on linalg.generic operations.
    =except-generic                                    -   Run on operations except linalg.generic (e.g., foreach)
      --sparse-space-collapse                                -   sparse space collapsing pass
      --sparse-storage-specifier-to-llvm                     -   Lower sparse storage specifier to llvm structure
      --sparse-tensor-codegen                                -   Convert sparse tensors and primitives to actual code
        --create-sparse-deallocs                             - Specify if the temporary buffers created by the sparse compiler should be deallocated. For compatibility with core bufferization passes. This option is only used when enable-runtime-library=false. See also create-deallocs for BufferizationOption.
        --enable-buffer-initialization                       - Enable zero-initialization of the memory buffers
      --sparse-tensor-conversion                             -   Convert sparse tensors and primitives to library calls
      --sparse-vectorization                                 -   Vectorizes loops after sparsification
        --enable-simd-index32                                - Enable i32 indexing into vectors (for efficient gather/scatter)
        --enable-vla-vectorization                           - Enable vector length agnostic vectorization
        --vl=<int>                                           - Set the vector length (use 0 to disable vectorization)
      --sparsification                                       -   Automatically generate sparse tensor code from sparse tensor types
        --enable-runtime-library                             - Enable runtime library for manipulating sparse tensors
        --parallelization-strategy=<value>                   - Set the parallelization strategy
    =none                                              -   Turn off sparse parallelization.
    =dense-outer-loop                                  -   Enable dense outer loop sparse parallelization.
    =any-storage-outer-loop                            -   Enable sparse parallelization regardless of storage for the outer loop.
    =dense-any-loop                                    -   Enable dense parallelization for any loop.
    =any-storage-any-loop                              -   Enable sparse parallelization for any storage and loop.
        --sparse-emit-strategy=<value>                       - Emit functional code or interfaces (to debug) for sparse loops
    =functional                                        -   Emit functional code (with scf.for/while).
    =sparse-iterator                                   -   Emit (experimental) loops (with sparse.iterate).
    =debug-interface                                   -   Emit non-functional but easy-to-read interfaces to debug.
      --sparsification-and-bufferization                     -   Mini-pipeline that combines bufferization and sparsification
        --enable-gpu-libgen                                  - Enable GPU acceleration by means of direct library calls
        --enable-simd-index32                                - Enable i32 indexing into vectors (for efficient gather/scatter)
        --enable-vla-vectorization                           - Enable vector length agnostic vectorization
        --parallelization-strategy=<value>                   - Set the parallelization strategy
    =none                                              -   Turn off sparse parallelization.
    =dense-outer-loop                                  -   Enable dense outer loop sparse parallelization.
    =any-storage-outer-loop                            -   Enable sparse parallelization regardless of storage for the outer loop.
    =dense-any-loop                                    -   Enable dense parallelization for any loop.
    =any-storage-any-loop                              -   Enable sparse parallelization for any storage and loop.
        --sparse-emit-strategy=<value>                       - Emit functional code or interfaces (to debug) for sparse loops
    =functional                                        -   Emit functional code (with scf.for/while).
    =sparse-iterator                                   -   Emit (experimental) loops (with sparse.iterate).
    =debug-interface                                   -   Emit non-functional but easy-to-read interfaces to debug.
        --vl=<int>                                           - Set the vector length (use 0 to disable vectorization)
      --spirv-attach-target                                  -   Attaches a SPIR-V target attribute to a GPU Module.
        --caps=<string>                                      - List of supported SPIR-V Capabilities
        --client_api=<string>                                - Client API
        --device_id=<uint>                                   - Device ID
        --device_type=<string>                               - Device Type
        --exts=<string>                                      - List of supported SPIR-V Extensions
        --module=<string>                                    - Regex used to identify the modules to attach the target to.
        --vendor=<string>                                    - Device Vendor
        --ver=<string>                                       - SPIR-V Version.
      --spirv-canonicalize-gl                                -   Canonicalize GLSL ops
      --spirv-lower-abi-attrs                                -   Decorate SPIR-V composite type with layout info
      --spirv-rewrite-inserts                                -   Rewrite sequential chains of `spirv.CompositeInsert` operations into `spirv.CompositeConstruct` operations
      --spirv-unify-aliased-resource                         -   Unify access of multiple aliased resources into access of one single resource
      --spirv-update-vce                                     -   Deduce and attach minimal (version, capabilities, extensions) requirements to spirv.module ops
      --spirv-webgpu-prepare                                 -   Prepare SPIR-V to target WebGPU by expanding unsupported ops and replacing with supported ones
      --sroa                                                 -   Scalar Replacement of Aggregates
      --stage-sparse-ops                                     -   Decompose a complex sparse operation into multiple stages
      --strip-debuginfo                                      -   Strip debug info from all operations
      --strip-func-quant-types                               -   Strip quantized types from function headers
      --symbol-dce                                           -   Eliminate dead symbols
      --symbol-privatize                                     -   Mark symbols private
        --exclude=<string>                                   - Comma separated list of symbols that should not be marked private
      --test-affine-access-analysis                          -   Tests affine memory access analysis utility
      --test-affine-data-copy                                -   Tests affine data copy utility functions.
        --capacity-kib=<ulong>                               - Test copy generation enforcing a limit of capacity (default: unlimited)
        --for-memref-region                                  - Test copy generation for a single memref region
        --memref-filter                                      - Enable memref filter testing in affine data copy optimization
      --test-affine-loop-unswitch                            -   Tests affine loop unswitching / if/else hoisting
      --test-affine-parametric-tile                          -   Tile affine loops using SSA values as tile sizes
      --test-affine-reify-value-bounds                       -   Tests ValueBoundsOpInterface with affine dialect reification
        --reify-to-func-args                                 - Reify in terms of function args
        --use-arith-ops                                      - Reify with arith dialect ops
      --test-affine-walk                                     -   Test affine walk method.
      --test-alias-analysis                                  -   Test alias analysis results.
      --test-alias-analysis-extending                        -   Test alias analysis extending.
      --test-alias-analysis-modref                           -   Test alias analysis ModRef results.
      --test-arith-emulate-wide-int                          -   Function pass to test Wide Integer Emulation
        --function-prefix=<string>                           - Prefix of functions to run the emulation pass on
        --widest-int-supported=<uint>                        - Maximum integer bit width supported by the target
      --test-arm-sme-tile-allocation                         -   Tests SME 'virtual tile' allocation
        --dump-tile-live-ranges                              - Dump the live ranges of SME tiles (for debugging)
        --preprocess-only                                    - Only preprocess IR so it is ready for tile allocation (but do not allocate any tiles)
      --test-bit-width-constrained-vector-linearize          -   Linearizes ND vectors for N >= 2 into 1D vectors, with constraints in inner-most dimension's bit width.
        --target-vector-bitwidth=<uint>                      - Minimum vector bitwidth to enable the flattening transformation
      --test-block-is-in-loop                                -   Test mlir::blockIsInLoop()
      --test-bytecode-roundtrip                              -   Test pass to implement bytecode roundtrip tests.
        --test-dialect-version=<value>                       - Specifies the test dialect version to emit and parse
        --test-kind=<int>                                    - Specifies the test kind to execute
      --test-cf-assert                                       -   Function pass to test cf.assert lowering to LLVM without abort
      --test-cfg-loop-info                                   -   Test the loop info analysis.
      --test-clone                                           -   Test clone of op
      --test-commutativity-utils                             -   Test the functionality of the commutativity utility
      --test-compose-subview                                 -   Test combining composed subviews
      --test-constant-fold                                   -   Test operation constant folding
      --test-control-flow-sink                               -   Test control-flow sink pass
      --test-convert-call-op                                 -   Tests conversion of `func.call` to `llvm.call` in presence of custom types
      --test-convert-func-op                                 -   Tests conversion of `func.func` to `llvm.func` for different attributes
      --test-convert-to-spirv                                -   Conversion to SPIR-V pass only used for internal tests.
        --convert-gpu-modules                                - Clone and convert GPU modules
        --nest-in-gpu-module                                 - Put converted SPIR-V module inside the gpu.module instead of alongside it.
        --run-signature-conversion                           - Run function signature conversion to convert vector types
        --run-vector-unrolling                               - Run vector unrolling to convert vector types in function bodies
      --test-create-vector-broadcast                         -   Test optimization transformations for transfer ops
      --test-data-layout-query                               -   Test data layout queries
      --test-dead-code-analysis                              -   
      --test-decompose-affine-ops                            -   Tests affine ops decomposition utility functions.
      --test-decompose-call-graph-types                      -   Decomposes types at call graph boundaries.
      --test-derived-attr                                    -   Run test derived attributes
      --test-diagnostic-filter                               -   Test diagnostic filtering support.
        --filters=<string>                                   - Specifies the diagnostic file name filters.
      --test-diagnostic-metadata                             -   Test diagnostic metadata.
      --test-dialect-conversion-pdll                         -   Test DialectConversion PDLL functionality
      --test-distinct-attrs                                  -   Test parallel creation of distinct attributes
      --test-dynamic-pipeline                                -   Tests the dynamic pipeline feature by applying a pipeline on a selected set of functions
        --dynamic-pipeline=<string>                          - The pipeline description that will run on the filtered function.
        --op-name=<string>                                   - List of function names to apply the pipeline to
        --run-on-nested-operations                           - This will apply the pipeline on nested operations under the visited operation.
        --run-on-parent                                      - This will apply the pipeline on the parent operation if it exists; this is expected to fail.
      --test-elements-attr-interface                         -   Test ElementsAttr interface support.
      --test-eliminate-vector-masks                          -   Test eliminating vector masks
        --vscale-max=<uint>                                  - Maximum possible value of vscale.
        --vscale-min=<uint>                                  - Minimum possible value of vscale.
      --test-emulate-narrow-int                              -   Function pass to test Narrow Integer Emulation
        --arith-compute-bitwidth=<uint>                      - arith computation bit width
        --disable-atomic-rmw                                 - disable atomic read-modify-write and prefer generating normal sequence
        --memref-load-bitwidth=<uint>                        - memref load/store emulation bit width
        --skip-memref-type-conversion                        - disable memref type conversion (to test failures)
      --test-expand-math                                     -   Test expanding math
      --test-extract-fixed-outer-loops                       -   test application of parametric tiling to the outer loops so that the ranges of outer loops become static
        --test-outer-loop-sizes=<long>                       - fixed number of iterations that the outer loops should have
      --test-fold-arith-extf-into-vector-contract-patterns   -   Test patterns that fold arithmetic extension ops into vector contract ops
      --test-foo-analysis                                    -   
      --test-func-erase-arg                                  -   Test erasing func args.
      --test-func-erase-result                               -   Test erasing func results.
      --test-func-insert-arg                                 -   Test inserting func args.
      --test-func-insert-result                              -   Test inserting func results.
      --test-func-set-type                                   -   Test FunctionOpInterface::setType.
      --test-function-pass                                   -   Test a function pass in the pass manager
      --test-generic-ir-block-visitors-interrupt             -   Test generic IR visitors with interrupts, starting with Blocks.
      --test-generic-ir-region-visitors-interrupt            -   Test generic IR visitors with interrupts, starting with Regions.
      --test-generic-ir-visitors                             -   Test generic IR visitors.
      --test-generic-ir-visitors-interrupt                   -   Test generic IR visitors with interrupts.
      --test-gpu-memory-promotion                            -   Promotes the annotated arguments of gpu.func to workgroup memory.
      --test-gpu-rewrite                                     -   Applies all rewrite patterns within the GPU dialect.
      --test-gpu-subgroup-reduce-lowering                    -   Applies gpu.subgroup_reduce lowering patterns.
        --expand-to-shuffles                                 - Expand subgroup_reduce ops to shuffle ops.
        --target=<string>                                    - Target backend name which will be used to provide compatible lowerings of subgroup reduce.
      --test-greedy-patterns                                 -   Run test dialect patterns
        --cse-constants                                      - Whether to CSE constants
        --fold                                               - Whether to fold
        --max-iterations=<int>                               - Max. iterations in the GreedyRewriteConfig
        --top-down                                           - Seed the worklist in general top-down order
      --test-inline                                          -   Test inlining region calls
      --test-inline-callback                                 -   Test inlining region calls with call back functions
      --test-interface-pass                                  -   Test an interface pass (running on FunctionOpInterface) in the pass manager
      --test-ir-visitors                                     -   Test various visitors.
      --test-irdl-conversion-check                           -   Checks the convertibility of an IRDL dialect
      --test-last-modified                                   -   
        --assume-func-writes                                 - assume external functions have write effect on all arguments
        --interprocedural                                    - perform interprocedural analysis
      --test-lazy-loading                                    -   Test LazyLoading of op
        --bytecode-version=<int>                             - Specifies the bytecode version to use.
      --test-legalize-patterns                               -   Run test dialect legalization patterns
      --test-legalize-type-conversion                        -   Test various type conversion functionalities in DialectConversion
      --test-legalize-unknown-root-patterns                  -   Test public remapped value mechanism in ConversionPatternRewriter
      --test-linalg-data-layout-propagation                  -   Test data layout propagation
      --test-linalg-decompose-ops                            -   Test Linalg decomposition patterns
        --remove-dead-args-and-results                       - Test patterns to erase unused operands and results
      --test-linalg-drop-unit-dims                           -   
      --test-linalg-elementwise-fusion-patterns              -   Test Linalg element wise operation fusion patterns
        --collapse-dimensions-control=<long>                 - Test controlling dimension collapse pattern
        --control-fusion-by-expansion                        - Test controlling fusion of reshape with generic op by expansion
        --fuse-generic-ops                                   - Test fusion of generic operations.
        --fuse-generic-ops-control                           - Test fusion of generic operations with a control function.
        --fuse-multiuse-producer                             - Test fusion of producer ops with multiple uses
        --fuse-with-reshape-by-collapsing                    - Test linalg expand_shape -> generic fusion patterns that collapse the iteration space of the consumer
        --fuse-with-reshape-by-collapsing-control            - Test controlling the linalg expand_shape -> generic fusion patterns that collapse the iteration space of the consumer
        --fuse-with-reshape-by-expansion                     - Test fusion of generic operations with reshape by expansion
      --test-linalg-greedy-fusion                            -   Test Linalg fusion by applying a greedy test transformation.
      --test-linalg-pad-fusion                               -   Test PadOp fusion
      --test-linalg-rank-reduce-contraction-ops              -   Test Linalg rank reduce contraction ops with unit dims
      --test-linalg-transform-patterns                       -   Test Linalg transformation patterns by applying them greedily.
        --loop-type=<string>                                 - Specify the type of loops to generate: for, parallel or tiled_loop
        --peeled-loops=<long>                                - Loops to be peeled when test-tile-pattern
        --skip-partial                                       - Skip loops inside partial iterations during peeling
        --test-bubble-up-extract-slice-op-pattern            - Test rewrite of linalgOp + extract_slice into extract_slice + linalgOp
        --test-decompose-linalg-pack                         - Test transform that generalizes pack ops into a sequence of tensor and Linalg ops
        --test-decompose-pad-tensor                          - Test transform pad tensor by copying with generic ops
        --test-decompose-tensor-unpack                       - Test transform that generalizes unpack ops into a sequence of tensor and Linalg ops
        --test-decompose-winograd-ops                        - Test decompose Winograd ops
        --test-erase-unnecessary-inputs                      - Test patterns to erase unnecessary inputs
        --test-erase-unused-operands-and-results             - Test patterns to erase unused operands and results
        --test-fold-into-pack-and-unpack                     - Test folding ops into linalg.pack and linalg.unpack
        --test-linalg-to-vector-patterns                     - Test a set of patterns that rewrite a linalg contraction in vector.contract form
        --test-patterns                                      - Test a mixed set of patterns
        --test-simplify-pack-unpack-patterns                 - Test patterns to simplify linalg.pack and linalg.unpack
        --test-swap-extract-slice-with-fill-pattern          - Test patterns to swap tensor.extract_slice(linalg.fill())
        --test-swap-subtensor-padtensor                      - Test rewrite of subtensor(tensor.pad) into tensor.pad(subtensor)
        --test-vector-transfer-forwarding-patterns           - Test a fused pass that forwards memref.copy to vector.transfer
        --test-winograd-conv2d                               - Test transform conv2d by Winograd conv2d algorithm
        --tile-sizes=<long>                                  - Linalg tile sizes for test-tile-pattern
      --test-liveness-analysis                               -   
      --test-llvm-legalize-patterns                          -   Run LLVM dialect legalization patterns
      --test-loop-fusion                                     -   Tests loop fusion utility functions.
        --test-loop-fusion-dependence-check                  - Enable testing of loop fusion dependence check
        --test-loop-fusion-slice-computation                 - Enable testing of loop fusion slice computation
        --test-loop-fusion-transformation                    - Enable testing of loop fusion transformation
      --test-loop-permutation                                -   Tests affine loop permutation utility
        --permutation-map=<uint>                             - Specify the loop permutation
      --test-loop-unrolling                                  -   Tests loop unrolling transformation
        --annotate                                           - Annotate unrolled iterations.
        --loop-depth=<uint>                                  - Loop depth.
        --unroll-factor=<ulong>                              - Loop unroll factor.
        --unroll-full                                        - Full unroll loops.
        --unroll-up-to-factor                                - Loop unroll up to factor.
      --test-lower-to-arm-neon                               -   Tests lower to arm Neon.
      --test-make-isolated-from-above                        -   Test making a region isolated from above
        --clone-ops-with-no-operands                         - Test case with cloning of operations with no operands
        --clone-ops-with-operands                            - Test case with cloning of operations with operands
        --simple                                             - Test simple case with no cloning of operations
      --test-mapping-to-processing-elements                  -   test mapping a single loop on a virtual processor grid
      --test-match-reduction                                 -   Test the match reduction utility.
      --test-matchers                                        -   Test C++ pattern matchers.
      --test-math-algebraic-simplification                   -   Test math algebraic simplification
      --test-math-polynomial-approximation                   -   Test math polynomial approximations
        --enable-avx2                                        - Enable approximations that emit AVX2 intrinsics via the X86Vector dialect
      --test-math-to-vcix                                    -   Test lowering patterns that convert some vector operations to VCIX. Since DLA can implement VCIX instructions in a completely different way, the conversions for this test pass live only here.
      --test-memref-bound-check                              -   Check memref access bounds
      --test-memref-dependence-check                         -   Checks dependences between all pairs of memref accesses.
      --test-memref-stride-calculation                       -   Test operation constant folding
      --test-merge-blocks                                    -   Test Merging operation in ConversionPatternRewriter
      --test-mesh-all-slice-op-lowering                      -   Test lowering of all-slice.
      --test-mesh-process-multi-index-op-lowering            -   Test lowering of mesh.process_multi_index op.
      --test-mesh-resharding-spmdization                     -   Test Mesh dialect resharding spmdization.
      --test-mesh-simplifications                            -   Test mesh simplifications
      --test-mlir-reducer                                    -   Tests MLIR Reduce tool by generating failures
      --test-module-pass                                     -   Test a module pass in the pass manager
      --test-multi-buffering                                 -   Test multi buffering transformation
        --multiplier=<uint>                                  - Decide how many versions of the buffer should be created,
      --test-next-access                                     -   
        --assume-func-reads                                  - assume external functions have read effect on all arguments
        --interprocedural                                    - perform interprocedural analysis
      --test-nvgpu-mmasync-f32-to-tf32-patterns              -   Test patterns to convert mma.sync on f32 with tf32 precision
        --precision=<string>                                 - Target nvgpu.mma.sync on f32 input with tf32 or tf32x3 precision
      --test-opaque-loc                                      -   Changes all leaf locations to opaque locations
      --test-operations-equality                             -   Test operations equality.
      --test-options-pass                                    -   Test options parsing capabilities
        --enum=<value>                                       - Example enum option
    =zero                                              -   Example zero value
    =one                                               -   Example one value
    =two                                               -   Example two value
        --list=<int>                                         - Example list option
        --string=<string>                                    - Example string option
        --string-list=<string>                               - Example string list option
      --test-options-super-pass                              -   Test options of options parsing capabilities
        --list=<value>                                       - Example list of PassPipelineOptions option
      --test-pass-crash                                      -   Test a pass in the pass manager that always crashes
      --test-pass-create-invalid-ir                          -   Test pass that adds an invalid operation in a function body
        --emit-invalid-ir                                    - Emit invalid IR
        --signal-pass-failure                                - Trigger a pass failure
      --test-pass-failure                                    -   Test a pass in the pass manager that always fails
        --gen-diagnostics                                    - Generate a diagnostic message
      --test-pass-invalid-parent                             -   Test a pass in the pass manager that makes the parent operation invalid
      --test-pass-state-extension-communication              -   test state communication between an MLIR pass and transform ops
      --test-pattern-selective-replacement                   -   Test selective replacement in the PatternRewriter
      --test-pdl-bytecode-pass                               -   Test PDL ByteCode functionality
      --test-pdll-pass                                       -   Test PDLL functionality
      --test-print-callgraph                                 -   Print the contents of a constructed callgraph.
      --test-print-defuse                                    -   Test various printing.
      --test-print-dominance                                 -   Print the dominance information for multiple regions.
      --test-print-invalid                                   -   Test printing invalid ops.
      --test-print-liveness                                  -   Print the contents of a constructed liveness information.
      --test-print-nesting                                   -   Test various printing.
      --test-print-shape-mapping                             -   Print the contents of a constructed shape mapping information.
      --test-print-topological-sort                          -   Sorts operations topologically and attaches attributes with their corresponding index in the ordering to them
      --test-recursive-types                                 -   Test support for recursive types
      --test-remapped-value                                  -   Test public remapped value mechanism in ConversionPatternRewriter
      --test-return-type                                     -   Run return type functions
      --test-rewrite-dynamic-op                              -   Test rewriting on dynamic operations
      --test-scalar-vector-transfer-lowering                 -   Test lowering of scalar vector transfers to memref loads/stores.
        --allow-multiple-uses                                - Fold transfer operations with multiple uses
      --test-scf-for-utils                                   -   test scf.for utils
        --test-replace-with-new-yields                       - Test replacing a loop with a new loop that returns new additional yield values
      --test-scf-if-utils                                    -   test scf.if utils
      --test-scf-parallel-loop-collapsing                    -   Test parallel loops collapsing transformation
        --collapsed-indices-0=<uint>                         - Which loop indices to combine into the position 0 loop index
        --collapsed-indices-1=<uint>                         - Which loop indices to combine into the position 1 loop index
        --collapsed-indices-2=<uint>                         - Which loop indices to combine into the position 2 loop index
      --test-scf-pipelining                                  -   test scf.forOp pipelining
        --annotate                                           - Annotate operations during loop pipelining transformation
        --no-epilogue-peeling                                - Use predicates instead of peeling the epilogue.
      --test-scf-uplift-while-to-for                         -   test scf while to for uplifting
      --test-scf-while-op-builder                            -   test build functions of scf.while
      --test-shape-function-report                           -   Test pass to report associated shape functions
      --test-side-effects                                    -   Test side effects interfaces
      --test-spirv-entry-point-abi                           -   Set the spirv.entry_point_abi attribute on GPU kernel function within the module, intended for testing only
        --subgroup-size=<int>                                - Subgroup size to use for all gpu.func kernels in the module
        --target-width=<int>                                 - Specify the component width of floating-point instructions
        --workgroup-size=<int>                               - Workgroup size to use for all gpu.func kernels in the module, specified with x-dimension first, y-dimension next and z-dimension last. Unspecified dimensions will be set to 1
      --test-spirv-func-signature-conversion                 -   Test patterns that convert vector inputs and results in function signatures
      --test-spirv-module-combiner                           -   Tests SPIR-V module combiner library
      --test-spirv-op-availability                           -   Test SPIR-V op availability
      --test-spirv-target-env                                -   Test SPIR-V target environment
      --test-spirv-vector-unrolling                          -   Test patterns that unroll vectors to types supported by SPIR-V
      --test-stats-pass                                      -   Test pass statistics
      --test-strict-pattern-driver                           -   Test strict mode of pattern driver
        --strictness=<string>                                - Can be {AnyOp, ExistingAndNewOps, ExistingOps}
      --test-symbol-rauw                                     -   Test replacement of symbol uses
      --test-symbol-uses                                     -   Test detection of symbol uses
      --test-take-body                                       -   Test Region's takeBody
      --test-target-materialization-with-no-uses             -   Test a special case of target materialization in DialectConversion
      --test-tensor-copy-insertion                           -   Module pass to test Tensor Copy Insertion
        --allow-return-allocs-from-loops                     - Allows returning/yielding new allocations from a loop.
        --bufferize-function-boundaries                      - Bufferize function boundaries.
        --must-infer-memory-space                            - The memory space of memref types must always be inferred. If unset, a default memory space of 0 is used.
      --test-tensor-transform-patterns                       -   Test Tensor transformation patterns by applying them greedily.
        --test-drop-redundant-insert-slice-rank-expansion    - Test dropping redundant insert_slice rank expansions
        --test-expand-shape-bubbling                         - Test folding of expand_shape/collapse_shape
        --test-fold-consecutive-insert-extract-slice         - Test folding consecutive tensor.insert_slice/tensor.extract_slice
        --test-fold-constant-extract-slice                   - Test folding arith.constant and tensor.extract_slice
        --test-reassociative-reshape-folding                 - Test folding of expand_shape/collapse_shape
        --test-rewrite-extract-slice-from-collapse-shape     - Test swapping tensor.extract_slice of a collapse_shape with loop nest
        --test-tracking-listener                             - Test tensor TrackingListener for the transform dialect
        --use-foreach                                        - Use the scf.forall operation when generating loop nests for the extract_slice of collapse_shape pattern
      --test-tensorlike-bufferlike                           -   Module pass to test custom types that implement TensorLike / BufferLike interfaces
      --test-topological-sort-analysis                       -   Test topological sorting of ops
      --test-tosa-op-availability                            -   Test Tosa op availability
      --test-trait-folder                                    -   Run trait folding
      --test-transform-dialect-erase-schedule                -   erase transform dialect schedule from the IR
      --test-type-interfaces                                 -   Test type interface support.
      --test-vector-break-down-bitcast                       -   Test pattern that breaks down vector.bitcast ops 
      --test-vector-break-down-reduction-patterns            -   Test patterns to break down vector reductions into arith reductions
      --test-vector-chained-reduction-folding-patterns       -   Test patterns to fold chained vector reductions
      --test-vector-contraction-prepare-for-mmt-lowering     -   Test vector.contraction matmul canonicalization for MMT lowering.
      --test-vector-emulate-masked-load-store                -   Test patterns that emulate the maskedload/maskedstore op by memref.load/store and scf.if
      --test-vector-extract-strided-slice-lowering           -   Test lowering patterns that convert vector.extract_strided_slice into a chain of vector.extract and vector.insert ops
      --test-vector-gather-lowering                          -   Test patterns that lower the gather op in the vector dialect to conditional loads
      --test-vector-linearize                                -   Linearizes ND vectors for N >= 2 into 1D vectors
      --test-vector-reduction-to-contract-patterns           -   Test patterns to convert multireduce op to contract and combine broadcast/transpose to contract
      --test-vector-reduction-to-spirv-dot-prod              -   Test lowering patterns that convert vector.reduction to SPIR-V integer dot product ops
      --test-vector-scan-lowering                            -   Test lowering patterns that lower the scan op in the vector dialect
      --test-vector-sink-patterns                            -   Test lowering patterns that eliminate redundant broadcast and transpose operations.
      --test-vector-to-vector-lowering                       -   Test lowering patterns between ops in the vector dialect
        --unroll                                             - Include unrolling
      --test-vector-transfer-collapse-inner-most-dims        -   Test lowering patterns that reduce the rank of the vector transfer memory and vector operands.
      --test-vector-transfer-flatten-patterns                -   Test patterns to rewrite contiguous row-major N-dimensional vector.transfer_{read,write} ops into 1D transfers
        --target-vector-bitwidth=<uint>                      - Minimum vector bitwidth to enable the flattening transformation. For scalable vectors this is the base size, i.e. the size corresponding to vscale=1.
      --test-vector-transfer-unrolling-patterns              -   Test lowering patterns to unroll transfer ops in the vector dialect
        --reverse-unroll-order                               - reverse the order of unrolling of vector transfer operations
      --test-vector-transferop-opt                           -   Test optimization transformations for transfer ops
      --test-vector-unrolling-patterns                       -   Test lowering patterns to unroll contract ops in the vector dialect
        --unroll-based-on-type                               - Set the unroll factor based on type of the operation
        --unroll-order=<long>                                - set the unroll order
      --test-vector-warp-distribute                          -   Test vector warp distribute transformation and lowering patterns
        --distribute-transfer-write                          - Test distribution of transfer write
        --hoist-uniform                                      - Test hoist uniform
        --max-transfer-write-elements=<uint>                 - Maximum number of transfer write elements to distribute
        --propagate-distribution                             - Test distribution propagation
        --rewrite-warp-ops-to-scf-if                         - Lower vector.warp_execute_on_lane0 to scf.if op
      --test-verify-uselistorder                             -   Verify that roundtripping the IR to bytecode preserves the order of the uselists
        --rng-seed=<uint>                                    - Specify an input random seed
      --test-walk-pattern-rewrite-driver                     -   Run test walk pattern rewrite driver
        --dump-notifications                                 - Print rewrite listener notifications
      --test-wrap-scf-while-loop-in-zero-trip-check          -   test scf::wrapWhileLoopInZeroTripCheck
        --force-create-check                                 - Force to create zero-trip-check.
      --test-written-to                                      -   
        --assume-func-writes                                 - assume external functions have write effect on all arguments
        --interprocedural                                    - perform interprocedural analysis
      --test-xegpu-unrolling-patterns                        -   Test lowering patterns to unroll ops in the xegpu dialect
      --topological-sort                                     -   Sort regions without SSA dominance in topological order
      --tosa-infer-shapes                                    -   Propagate shapes across TOSA operations
      --tosa-layerwise-constant-fold                         -   Fold layerwise operations on constant tensors
        --aggressive-reduce-constant                         - Always perform the reduce constant optimization. May add more tosa.const but would reduce runtime calculations
      --tosa-make-broadcastable                              -   TOSA rank Reshape to enable Broadcasting
      --tosa-optional-decompositions                         -   Applies Tosa operations optional decompositions
      --tosa-reduce-transposes                               -   Reduce transposes through other operators
      --tosa-test-quant-utils                                -   TOSA Test: Exercise the APIs in QuantUtils.cpp.
      --tosa-to-arith                                        -   Lower TOSA to the Arith dialect
        --include-apply-rescale                              - Whether to include the lowering for tosa.apply_rescale to arith
        --use-32-bit                                         - Whether to prioritize lowering to 32-bit operations
      --tosa-to-linalg                                       -   Lower TOSA to LinAlg on tensors
        --aggressive-reduce-constant                         - Always perform the reduce constant optimization
        --disable-tosa-decompositions                        - Disable tosa decompositions pass
      --tosa-to-linalg-named                                 -   Lower TOSA to LinAlg named operations
        --prefer-conv2d-kernel-layout-hwcf                   - Prefer generating linalg.conv_2d_nhwc_hwcf over linalg.conv_2d_nhwc_fhwc
      --tosa-to-mlprogram                                    -   Lower TOSA to the MLProgram dialect
      --tosa-to-scf                                          -   Lower TOSA to the SCF dialect
      --tosa-to-tensor                                       -   Lower TOSA to the Tensor dialect
      --tosa-validate                                        -   Validates TOSA dialect
        --allow-invalid-op-datatype-combinations             - Disable checks for operations that are determined to be invalid due to their operand/result datatypes not aligning with the 'Supported Data Types' sections of the specification
        --extension=<string>                                 - Validate if operations match for the given extension set
        --level=<value>                                      - Validate if operator parameters are within specification for the given level
    =8k                                                -   Ranges are expected to be sufficient for applications with frame sizes up to 8K.
    =none                                              -   Allows the full range of arguments specified by the operations according to the operation data types.
        --profile=<string>                                   - Validate if operations match for the given profile set
        --strict-op-spec-alignment                           - Verify if the properties of certain operations align with the spec requirement
      --transform-dialect-check-uses                         -   warn about potential use-after-free in the transform dialect
      --transform-infer-effects                              -   infer transform side effects for symbols
      --transform-interpreter                                -   transform dialect interpreter
        --debug-bind-trailing-args=<string>                  - Binds trailing arguments of the entry point to the payload operations with specified names.
        --debug-payload-root-tag=<string>                    - Select the operation with 'transform.target_tag' attribute having the given value as payload IR root. If empty select the pass anchor operation as the payload IR root.
        --disable-expensive-checks                           - Disable expensive checks in the interpreter for a faster run.
        --entry-point=<string>                               - Entry point of the pass pipeline.
      --transform-preload-library                            -   preload transform dialect library
        --transform-library-paths=<string>                   - Optional paths to files with modules that should be merged into the transform module to provide the definitions of external named sequences.
      --view-op-graph                                        -   Print Graphviz visualization of an operation
        --max-label-len=<uint>                               - Limit attribute/type length to number of chars
        --print-attrs                                        - Print attributes of operations
        --print-control-flow-edges                           - Print control flow edges
        --print-data-flow-edges                              - Print data flow edges
        --print-result-types                                 - Print result types of operations
      --xegpu-fold-alias-ops                                 -   Fold alias ops into XeGPU ops
      --xegpu-subgroup-distribute                            -   Distribute XeGPU ops to work items
        --print-analysis-only                                - Print the result of the subgroup map propagation analysis and exit.
      --xegpu-wg-to-sg-distribute                            -   Transform WorkGroup level XeGPU code to SubGroup level
    Pass Pipelines:
      --buffer-deallocation-pipeline                         -   The default pipeline for automatically inserting deallocation operations after one-shot bufferization. Deallocation operations (except `memref.realloc`) may not be present already.
        --private-function-dynamic-ownership                 - Allows to add additional results to private functions to return ownership of returned memrefs to callers. This can avoid spurious buffer clones in the callee.
      --gpu-lower-to-nvvm-pipeline                           -   The default pipeline lowers main dialects (arith, memref, scf, vector, gpu, and nvgpu) to NVVM. It starts by lowering GPU code to the specified compilation target (default is fatbin) then lowers the host code.
        --cubin-chip=<string>                                - Chip to use to serialize to cubin.
        --cubin-features=<string>                            - Features to use to serialize to cubin.
        --cubin-format=<string>                              - Compilation format to use to serialize to cubin.
        --cubin-triple=<string>                              - Triple to use to serialize to cubin.
        --host-bare-ptr-calling-convention                   - Whether to use the bareptr calling convention on the host (warning this should be false until the GPU layering is fixed)
        --index-bitwidth=<long>                              - Bitwidth of the index type for the host (warning this should be 64 until the GPU layering is fixed)
        --kernel-bare-ptr-calling-convention                 - Whether to use the bareptr calling convention on the kernel (warning this should be false until the GPU layering is fixed)
        --opt-level=<int>                                    - Optimization level for NVVM compilation
        --ptxas-cmd-options=<string>                         - Command line options to pass to the downstream compiler.
      --sparsifier                                           -   The standard pipeline for taking sparsity-agnostic IR using the sparse-tensor type, and lowering it to LLVM IR with concrete representations and algorithms for sparse tensors.
        --create-sparse-deallocs                             - Specify if the temporary buffers created by the sparse compiler should be deallocated. For compatibility with core bufferization passes. This option is only used when enable-runtime-library=false.
        --enable-amx                                         - Enables the use of AMX dialect while lowering the vector dialect
        --enable-arm-neon                                    - Enables the use of ArmNeon dialect while lowering the vector dialect
        --enable-arm-sve                                     - Enables the use of ArmSVE dialect while lowering the vector dialect
        --enable-buffer-initialization                       - Enable zero-initialization of memory buffers
        --enable-gpu-libgen                                  - Enables GPU acceleration by means of direct library calls (like cuSPARSE)
        --enable-index-optimizations                         - Allows compiler to assume indices fit in 32-bit if that yields faster code
        --enable-runtime-library                             - Enable runtime library for manipulating sparse tensors
        --enable-x86vector                                   - Enables the use of X86Vector dialect while lowering the vector dialect
        --gpu-chip=<string>                                  - GPU target architecture
        --gpu-features=<string>                              - GPU target features
        --gpu-format=<string>                                - GPU compilation format
        --gpu-triple=<string>                                - GPU target triple
        --parallelization-strategy=<value>                   - Set the parallelization strategy
    =none                                              -   Turn off sparse parallelization.
    =dense-outer-loop                                  -   Enable dense outer loop sparse parallelization.
    =any-storage-outer-loop                            -   Enable sparse parallelization regardless of storage for the outer loop.
    =dense-any-loop                                    -   Enable dense parallelization for any loop.
    =any-storage-any-loop                              -   Enable sparse parallelization for any storage and loop.
        --reassociate-fp-reductions                          - Allows llvm to reassociate floating-point reductions for speed
        --sparse-emit-strategy=<value>                       - Emit functional code or interfaces (to debug) for sparse loops
    =functional                                        -   Emit functional code (with scf.for/while).
    =sparse-iterator                                   -   Emit (experimental) loops (with sparse.iterate).
    =debug-interface                                   -   Emit non-functional but easy-to-read interfaces to debug.
        --test-bufferization-analysis-only                   - Run only the inplacability analysis
        --vl=<int>                                           - Set the vector length (0 disables vectorization)
      --test-composite-fixed-point-pass                      -   Test composite pass
      --test-lower-to-arm-sme                                -   An example pipeline to lower operations on vectors (arith, vector) to LLVM via ArmSME.
        --dump-tile-live-ranges                              - Dump the live ranges of SME tiles (for debugging)
        --fuse-outer-products                                - Fuse outer product operations via '-arm-sme-outer-product-fusion' pass
      --test-lower-to-llvm                                   -   An example of pipeline to lower the main dialects (arith, linalg, memref, scf, vector) down to LLVM.
        --reassociate-fp-reductions                          - Allow reassociation of FP reductions
      --test-options-pass-pipeline                           -   Parses options using pass pipeline registration
        --enum=<value>                                       - Example enum option
    =zero                                              -   Example zero value
    =one                                               -   Example one value
    =two                                               -   Example two value
        --list=<int>                                         - Example list option
        --string=<string>                                    - Example string option
        --string-list=<string>                               - Example string list option
      --test-options-super-pass-pipeline                     -   Parses options of PassPipelineOptions using pass pipeline registration
        --super-list=<value>                                 - Example list of PassPipelineOptions option
      --test-pm-nested-pipeline                              -   Test a nested pipeline in the pass manager
      --test-spirv-cpu-runner-pipeline                       -   Runs a series of passes for lowering SPIR-V-dialect MLIR to LLVM-dialect MLIR intended for SPIR-V CPU Runner tests.
      --test-textual-pm-nested-pipeline                      -   Test a nested pipeline in the pass manager
      --test-vulkan-runner-pipeline                          -   Runs a series of passes intended for Vulkan runner tests. Lowers GPU dialect to LLVM dialect for the host and to serialized Vulkan SPIR-V for the device.
        --spirv-webgpu-prepare                               - Run MLIR transforms used when targeting WebGPU
      --tosa-to-linalg-pipeline                              -   The default pipeline for converting TOSA operators to the equivalent operations using the tensor operations in LinAlg as well as LinAlg named operations.
  --test-legalize-mode=<value>                               - The legalization mode to use with the test driver
    =analysis                                                -   Perform an analysis conversion
    =full                                                    -   Perform a full conversion
    =partial                                                 -   Perform a partial conversion
  --verify-diagnostics                                       - Check that emitted diagnostics match expected-* lines on the corresponding line
  --verify-diagnostics=<value>                               - Check that emitted diagnostics match expected-* lines on the corresponding line
    =all                                                     -   Check all diagnostics (expected, unexpected, near-misses)
    =<empty>                                                 -   Check all diagnostics (expected, unexpected, near-misses)
    =only-expected                                           -   Check only expected diagnostics
  --verify-each                                              - Run the verifier after each transformation pass
  --verify-region-info                                       - Verify region info (time consuming)
  --verify-roundtrip                                         - Round-trip the IR after parsing and ensure it succeeds
  --vp-counters-per-site=<number>                            - The average number of profile counters allocated per value profiling site.
  --vp-static-alloc                                          - Do static counter allocation for value profiler
  --wholeprogramdevirt-cutoff=<uint>                         - Max number of devirtualizations for devirt module pass

Generic Options:

  --help                                                     - Display available options (--help-hidden for more)
  --help-list                                                - Display list of available options (--help-list-hidden for more)
  --version                                                  - Display the version of this program

IR2Vec Options:

  --ir2vec-arg-weight=<number>                               - Weight for argument embeddings
  --ir2vec-opc-weight=<number>                               - Weight for opcode embeddings
  --ir2vec-type-weight=<number>                              - Weight for type embeddings
  --ir2vec-vocab-path=<string>                               - Path to the vocabulary file for IR2Vec